Saturday, March 1, 2014

Why We Need You to Help Beyond Adding Sentences

al_ex_an_der wrote: "I'd find it helpful if you could explain if possible in plain English why a newly added sentence can be found by Google already one minute later but by Tatoeba only one month later." I thought this was worth some discussion in a thread of its own.

First of all, I did a little experiment to determine whether a Google search for a word contained in a sentence that I had added a minute earlier really would succeed. Answer: no, though in one case, it remarkably took only about fifteen minutes before a search ("incontrovertible site:tatoeba.org") found it. But searches for words in sentences that I had added seventeen hours earlier and one hour earlier came up empty.

To address Alexander's larger point: Why is it that Google indexes words so quickly, and Tatoeba takes so long? It comes down to differences in the hardware, software, human resources, and project management available to Google (a corporation with US$59 billion of revenue in 2013) and Tatoeba (a nonprofit whose budget is somewhat smaller). Google has vast "farms" of machines. Tatoeba has one. Even two machines would be a big improvement because one could index while the other was still actively handling requests and adding sentences. Getting from one to two, however, requires more funding, which demands organization, not just for assembling a grant proposal or fundraising plans, but for putting the money to use if and when it actually comes through. It also requires someone to write the code that can handle interaction between two computers operating in parallel. Software can accomplish what seems like magic, but it's not written by magic.

Tatoeba.org can never hope to replicate the money or machines that Google.com has at its disposal, but we can do a far better job (even beyond the impressive things we already do) if we get a lot more participation in everything that makes the site run, beyond the operations of adding, commenting on, modifying, and deleting sentences. Many of the people essential to Google are not software developers, and much of what we're missing at Tatoeba can be provided by people who are not developers, either.

In my last long post, I called for volunteers for testing, either at a high level (putting together a test plan and coordinating other volunteers), or simply working through some screens and determining whether they work. I also asked for someone to coordinate the translators who work on the code at Launchpad. Of course, I would have been glad if someone had proposed to help in some way that I didn't even mention. But I was disappointed that no one responded at all. I want you to understand why people stepping up to help are not just nice to have, but essential.

We've undergone some changes in the way we store code, and we need to undergo some changes in the way we put it on the server. If we don't test before and after we make these changes, we could easily break something without knowing that we've broken it. But testing takes time. If I am responsible for doing every level of test planning and testing, as well as planning how to move the code without losing anything, it will take weeks longer to get to the point where we can move it. It will also become likely that something else will change in the interim, so we'll have to begin the cycle again without making any progress.

People who work at Google are motivated by some combination of enjoyment of the tasks involved in their jobs, satisfaction from accomplishing assignments that they weren't initially able to do, and financial incentives for doing their work. Their jobs require them to learn new skills and to do what has to be done, not just what they already know they enjoy doing.

Tatoeba can't provide financial incentives, but we can give you everything else, including the chance to move beyond what you already know you can do to tackle what has to be done (write up a test plan, collect bug reports and enhancement requests from the Wall, fix code written in PHP even if your favorite programming language is Python), and feel proud of what you've accomplished. You can also feel pleased that you're keeping Tatoeba going so that you can continue to add sentences to the corpus.

There is one more reason why we need a coordinated team to bridge the gaps: We don't want anyone to burn out because they're asked to do too much. We all have commitments, and are limited in how much time we can contribute. People who sense that they don't have enough time to do a job right will drop out entirely. Let's make sure that we take full advantage of the incredibly talented people who've gotten us this far, and those who have yet to join us, by making sure that all the pieces fit together.

Please send me a note telling me how you'd like to help. Many thanks!
