Saturday, July 17, 2010

Tatoeba update (Jul 17th, 2010)

First of all, I'd like to mention that we've had a lot of traffic lately. Allan published an article about Tatoeba, and it sure brought in a lot of new people :D
Google Analytics says 1,172 unique visitors on July 17th, while we usually have around 400-450. We're glad to see the server is still doing well despite the significant increase in activity!

What's new

We can now import sentences. Since July 4th, actually, but I didn't have much time to write about it. The feature is currently only available to moderators, because we cannot safely let everyone import huge amounts of data. So the way it works is that you send us your sentences in a simple text file by email, and we import it.

We accept two formats:
  1. Single sentences: each line has one sentence. All the sentences have to be in the same language.
  2. Sentences + translations: each line has a sentence and its translation, separated by a tab (sentence [tab] translation). All the sentences have to be in the same language, and all the translations in the same language. For instance, only French-Spanish, and not French-Spanish on one line and Swedish-Spanish on the next.
IMPORTANT: We release our data under the Creative Commons Attribution (CC-BY) license. We will not import your content if it raises copyright issues or license incompatibilities. For instance, don't send us sentences taken from textbooks, or sentences that are under the CC-BY-SA license (it's not compatible with CC-BY).
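
If you want to check your file before sending it, here is a minimal Python sketch of the two formats described above. It is only an illustration, not the script we actually use for imports: the function name `parse_import_file` and its error messages are made up for this example. It also cannot verify that all sentences are in the same language, so that part is still up to you.

```python
def parse_import_file(text):
    """Parse import text into a list of (sentence, translation) tuples.

    Hypothetical helper for illustration only. For the "single sentences"
    format, the translation is None.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = [part.strip() for part in line.split("\t")]
        if len(parts) == 1:
            # Format 1: one sentence per line
            rows.append((parts[0], None))
        elif len(parts) == 2 and parts[0] and parts[1]:
            # Format 2: sentence [tab] translation
            rows.append((parts[0], parts[1]))
        else:
            raise ValueError(f"line {line_no}: expected 'sentence' or 'sentence<TAB>translation'")

    # A file should use one format throughout, not a mix of both.
    has_single = any(translation is None for _, translation in rows)
    has_pair = any(translation is not None for _, translation in rows)
    if has_single and has_pair:
        raise ValueError("file mixes single sentences and sentence pairs")
    return rows


if __name__ == "__main__":
    sample = "Bonjour.\tHola.\nMerci.\tGracias.\n"
    for sentence, translation in parse_import_file(sample):
        print(sentence, "|", translation)
```

A file that passes this check has either one sentence per line, or exactly one tab per line with text on both sides of it.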

So far we imported:
  • ~700 pairs of sentences in Chinese-Shanghainese. In total we have ~900 pairs of sentences; the first 200 were added by hand.
  • 200+ proverbs in Dutch.
  • 250+ proverbs in Ukrainian.
That's the major thing for the last couple of weeks.

What next?

We still have to import 2500+ pairs of English-Spanish sentences, provided by one of our registered users, Łukasz. And probably thousands and thousands of other sentences, as more and more people discover Tatoeba, and have their own private (or not so private) collections of sentences to share with everyone :)

In terms of features, there will not be much going on in the next couple of weeks. Actually it will depend on the rest of the team, but as far as I'm concerned, I will have other priorities.

There are still a lot of things that can be improved about the current features, and we will keep improving them, but in August we will also start discussing the next new stuff. I will write more about it when we get there.

Right now I'd just like to say thank you to everyone who gave this project a little bit - or a lot - of their time, of their knowledge, of their encouragement... Because Tatoeba has become an awesome place for language lovers and learners, and for that, the credit really goes to the community :)