Sunday, March 30, 2014

Tatoeba update (March 30, 2014): 15 new languages

We have just updated the website again. Tatoeba now has 15 new languages, for a total of 146. The new languages are:

- Amharic
- Awadhi
- Bhojpuri
- Chavacano
- Middle English
- Middle French
- Haitian Creole
- Juhuri (Judeo-Tat)
- Greenlandic
- Meadow Mari
- Nahuatl
- Pennsylvania German
- Sinhala
- Turkmen
- Walloon

Thank you to those who gave us the sentences and information to fulfill these requests. Note that the procedure for requesting a new language (which involves supplying at least five sentences in that language) can be found via the Tatoeba menu under "More"/"Tatoeba Wiki"/"How to Request a New Language", or at this link.

Sunday, March 23, 2014

Tatoeba update (March 23, 2014)

We are pleased to announce a set of updates to the site. In addition to the differences that you'll see when you visit the site, we have some major changes behind the scenes that make it easier for us to attract and work with developers around the world.

Functionality
  • Contributors can now edit their comments on sentences or the Wall.

User Interface
  • Added link to friendlier search instructions.
  • Improved UI text (fixed misspellings, etc.) in English.
  • Incorporated updates to UI translations from the past year or longer, most notably in Japanese and German (which is now 100% translated!).
  • Internationalized several strings so that they can now be translated.
  • Changed remaining references to "tatoeba.fr" into references to "tatoeba.org".
  • Renamed "Modern Greek" to "Greek".

Security
  • Empty passwords are no longer accepted.

Usability
  • Now accepts profile photos with uppercase file extensions as well as lowercase.
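
For the curious, the general idea behind a fix like this is to normalize the file extension before comparing it with the list of accepted formats. Here is a minimal sketch in Python. It is an illustration only, not the site's actual code (which is written in PHP), and the function name and the list of extensions are assumptions made up for the example.

```python
import os

# Hypothetical whitelist for illustration; the formats the site really
# accepts are not listed in this post.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def has_allowed_extension(filename: str) -> bool:
    """Return True if the file's extension is accepted, ignoring letter case."""
    # splitext returns (root, extension); the extension keeps its leading dot.
    _, extension = os.path.splitext(filename)
    # Lowercasing first makes "photo.JPG" and "photo.jpg" compare the same.
    return extension.lower() in ALLOWED_EXTENSIONS
```

With a check like this, `has_allowed_extension("me.JPG")` and `has_allowed_extension("me.jpg")` both pass, while a name with no extension at all is rejected.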

Development
  • Moved repository from Subversion on Assembla to Git on GitHub.
  • Added scripts for adding languages and incorporating updated translations.
  • Fixed various issues that appeared on developers' machines.

Indexing
  • All sentences have been indexed, so they will appear in the search results.

Even more important than the changes to the code is the fact that the team behind it is stronger and more responsive than it has been in a long time. We are especially looking forward to working with our Google Summer of Code participants, once we know who they will be.

Whether you are interested in contributing sentences, translating the user interface, developing code, testing the site, or all of the above, we hope you will join the team!

Saturday, March 1, 2014

Why We Need You to Help Beyond Adding Sentences

al_ex_an_der wrote: "I'd find it helpful if you could explain if possible in plain English why a newly added sentence can be found by Google already one minute later but by Tatoeba only one month later." I thought this was worth some discussion in a thread of its own.

First of all, I did a little experiment to determine whether a Google search for a word contained in a sentence that I had added a minute earlier really would succeed. Answer: no, though in one case it took, remarkably, only about fifteen minutes before a search ("incontrovertible site:tatoeba.org") found the sentence. But searches for words from sentences that I had added seventeen hours earlier and one hour earlier came up empty.

To address Alexander's larger point: Why is it that Google indexes words so quickly, and Tatoeba takes so long? It comes down to differences in the hardware, software, human resources, and project management available to Google (a corporation with US$59 billion of revenue in 2013) and Tatoeba (a nonprofit whose budget is somewhat smaller). Google has vast "farms" of machines. Tatoeba has one. Even two machines would be a big improvement because one could index while the other was still actively handling requests and adding sentences. Getting from one to two, however, requires more funding, which demands organization, not just in terms of assembling a proposal for a grant or plans for fundraising, but for putting the money to use if and when it actually comes through. It also requires someone to write the code that can handle interaction between two computers operating in parallel. Software can accomplish what seems like magic, but it's not written by magic.

Tatoeba.org can never hope to replicate the money or machines that Google.com has at its disposal, but we can do a far better job (even beyond the impressive things we already do) if we get a lot more participation in everything that makes the site run, beyond the operations of adding, commenting on, modifying, and deleting sentences. Many of the people essential to Google are not software developers, and much of what we're missing at Tatoeba can be provided by people who are not developers, either.

In my last long post, I called for volunteers for testing, either at a high level (putting together a test plan and coordinating other volunteers), or simply working through some screens and determining whether they work. I also asked for someone to coordinate the translators who work on the code at Launchpad. Of course, I would have been glad if someone proposed to help in some way that I didn't even mention. But I was disappointed that no one responded at all. I want you to understand why people stepping up to help are not just nice to have, but essential.

We've undergone some changes in the way we store code, and we need to undergo some changes in the way we put it on the server. If we don't test before and after we make these changes, we could easily break something without knowing that we've broken it. But testing takes time. If I am responsible for doing every level of test planning and testing, as well as planning how to move the code without losing anything, it will take weeks longer to get to the point where we can move it. It will also become likely that something else will change in the interim, so we'll have to begin the cycle again without making any progress.

People who work at Google are motivated by some combination of enjoyment of the tasks involved in their jobs, satisfaction from accomplishing the assignments that they're not initially able to do, and financial incentives for doing their work. Their jobs require them to learn new skills and to do what has to be done, not just what they know they already enjoy doing.

Tatoeba can't provide financial incentives, but we can give you everything else, including the chance to move beyond what you already know you can do to tackle what has to be done (write up a test plan, collect bug reports and enhancement requests from the Wall, fix code written in PHP even if your favorite programming language is Python), and feel proud of what you've accomplished. You can also feel pleased that you're keeping Tatoeba going so that you can continue to add sentences to the corpus.

There is one more reason why we need a coordinated team to bridge the gaps: You don't want anyone to burn out because they're asked to do too much. We all have commitments, and we are limited in how much time we can contribute. If people sense that they don't have enough time to do a job right, they'll drop out entirely. Let's make sure that we take full advantage of the incredibly talented people who've gotten us this far, and those who have yet to join us, by making sure that all the pieces fit together.

Please send me a note telling me how you'd like to help. Many thanks!