Sunday, November 21, 2010

Tatoeba update (Nov 21st, 2010)

Alright, it's been a long time since we last updated Tatoeba :) This is just a small update.

What's new

"Members" page. This is probably the main modification. We redesigned the "Members" page a little so that it looks better and loads faster. We removed the information about the last login, because some people don't like being spied on :P We also removed the top 20 ranking, because that's what made the page so slow. Instead, we're displaying the members who are currently active (those who took part in the last few hundred contributions).
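
For the curious, here is a rough sketch in Python of how "currently active" members could be derived from recent contributions. It is only an illustration, not the code running on Tatoeba: the function name, the window of 300 contributions and the input format (a list of usernames, oldest first) are all assumptions.

```python
def active_members(contributors, window=300):
    """Return the distinct users behind the last `window` contributions.

    `contributors` is a list of usernames, one per contribution,
    ordered from oldest to newest (an assumed format).
    The result is ordered from most recently active to least.
    """
    recent = contributors[-window:]
    # dict.fromkeys keeps the first occurrence of each name in order,
    # so walking the list backwards puts the latest contributor first.
    return list(dict.fromkeys(reversed(recent)))
```

Looking only at a fixed number of recent contributions keeps the work bounded no matter how many members exist, which is the point of dropping the full ranking.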

Tags info. If you hover your mouse over a tag, you will see the id of the user who added it and the date when it was added. This is mostly useful for sentence owners, who may wonder why someone has tagged a sentence a certain way. You can figure out which user is behind a certain id with the following URL:[id].

Set language to "unknown". We get requests for new languages quite frequently, and we ask people to add a few sentences in the language they request. However, the language is sometimes misdetected, and until now there was no way to set the language to "unknown" (to indicate that it's a language that is not in the list). Now it's possible: there is an option called "other language", which sets the language icon to "unknown".

Sentence owner's name in comments. It was requested a long time ago, and it's finally here. The name of the sentence owner is now indicated in the comments, next to the sentence itself. This way, when you look at a comment on the homepage, you will not only know which sentence it is associated with, but also which user added that sentence.

What's next
  • We'll be working on a page that lists all sentences that were tagged @change or @delete more than 2 weeks ago. This way moderators will have a simple way to know which sentences they can/should take care of.
  • We'll be adding a page that lists all the Wall messages of a user.
  • And perhaps other random things...
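
To give an idea of what the planned @change/@delete page would need to compute, here is a minimal sketch in Python. Nothing here is the actual implementation: the record format (sentence id, tag name, date the tag was added) and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Tags that ask a moderator to act on a sentence.
REVIEW_TAGS = {"@change", "@delete"}


def stale_tagged_sentences(tag_records, now, min_age=timedelta(weeks=2)):
    """Return ids of sentences tagged @change or @delete more than `min_age` ago.

    `tag_records` is an iterable of (sentence_id, tag, added_at) tuples,
    which is an assumed format for this sketch.
    """
    cutoff = now - min_age
    ids = {
        sentence_id
        for sentence_id, tag, added_at in tag_records
        if tag in REVIEW_TAGS and added_at < cutoff
    }
    return sorted(ids)
```

For example, with `now` set to November 21st, 2010, a sentence tagged @delete on November 1st would be listed, while one tagged on November 15th would not be listed yet.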

Sunday, November 14, 2010

Tatoeba day & stats

Yesterday was our first Tatoeba day, so today I'm publishing stats about what was achieved that day, as well as more general stats.

Stats by language

The chart below shows the number of sentences added on Nov 13th for each language.

The gold medal goes to Arabic! Silver goes to Esperanto and bronze goes to German :)
  1. Arabic (573)
  2. Esperanto (354)
  3. German (247)
  4. Egyptian Arabic (230)
  5. Spanish (207)
  6. Italian (183)
  7. Chinese Mandarin (162)
  8. Hebrew (125)
  9. French (113)
  10. Ukrainian (105)
  11. Danish (100)
  12. Hungarian (78)
  13. Cantonese (78)
  14. English (73)
  15. Russian (70)
  16. Polish (45)
  17. Dutch (36)
  18. Old East Slavic (33)
  19. Lithuanian (18)
  20. Persian (17)
  21. Unknown language (10)
  22. Portuguese (8)
  23. Finnish (7)
  24. Latvian (4)
  25. Vietnamese (4)
  26. Czech (3)
  27. Swedish (3)
  28. Norwegian Bokmål (2)
  29. Shanghainese (2)
  30. Breton (1)
  31. Bulgarian (1)
  32. Catalan (1)
  33. Estonian (1)
  34. Japanese (1)
  35. Quechua (1)
  36. Slovak (1)
  37. Turkish (1)
  38. Uzbek (1)
Sadly, the record of 3,465 sentences added, set on August 18th, was not broken. We only made it to 2,899. That's still not bad, though, since it's the second-best day in terms of sentences added (and by "sentences added" I mean "new sentences + translations").

We were missing a few of our devoted members that day, so I guess it's normal. Let's hope more people will be available for the next Tatoeba day :)

Stats by users

The chart below shows the number of sentences added (in green) and the number of sentences modified (in yellow) on Nov 13th, for the top 20 users. You'll excuse my laziness but I only used the number of sentences added for the rank.

Saeb wins the day, by far, with 802 sentences added! Congrats :D Second place goes to nickyeow, and third place goes to Eldad.

At any rate, everyone deserves a big thank you for their contributions! THANK YOU :)

  1. saeb (802/20)
  2. nickyeow (214/20)
  3. Eldad (166/17)
  4. aandrusiak (140/7)
  5. MUIRIEL (138/41)
  6. Guybrush88 (135/2)
  7. danepo (100/12)
  8. GrizaLeono (94/21)
  9. Shishir (94/12)
  10. Dejo (56/11)
  11. Archibald (54/32)
  12. darinmex (53/5)
  13. rado (52/2)
  14. Leono (51/10)
  15. esocom (51/4)
  16. Esperantostern (48/5)
  17. Muelisto (43/1)
  18. kroko (42/4)
  19. Dorenda (41/0)
  20. qdii (40/11)
  21. zipangu (37/2)
  22. wondersz1 (33/4)
  23. Manfredo (27/1)
  24. samueldora (24/2)
  25. sysko (23/7)
  26. szaby78 (22/5)
  27. Zifre (22/7)
  28. cost (21/2)
  29. sencay (20/2)
  30. shanghainese (19/0)
  31. fanty (18/0)
  32. pliiganto (16/13)
  33. BraveSentry (15/1)
  34. pjer (14/5)
  35. U2FS (14/3)
  36. debian2007 (13/1)
  37. Gyuri (12/3)
  38. jxan (12/0)
  39. virgil (12/4)
  40. TRANG (11/32)
  41. slavneui (11/0)
  42. sarah (11/0)
  43. kebukebu (10/2)
  44. Wimmer (10/1)
  45. ae5s (10/0)
  46. Tonari (9/0)
  47. arashi_29 (9/5)
  48. Aleksej (7/0)
  49. CK (5/14)
  50. Shoyren (4/1)
  51. Holyspirit (3/0)
  52. JimBreen (2/0)
  53. luwenzhuo (2/0)
  54. CLARET (2/1)
  55. lajauge (1/0)
  56. ozma29 (1/0)
  57. sschlumberger (1/0)
  58. mr5 (1/0)
  59. Tenshi (1/0)

Language ranks

Tatoeba day is a good occasion to see how each language has progressed. You can see how each language with more than 1000 sentences was positioned one month ago, in this previous post. Let's see how it is now...

Top 5

The top 5 hasn't changed.
  1. English - 158,000+. It looks like English has been growing a little bit.
  2. Japanese - 153,000+. Japanese is standing still. You can tell we don't have a very strong Japanese community.
  3. French - 53,000+. French keeps moving at a steady pace.
  4. Esperanto - 47,000+. Esperanto is catching up with French quickly...
  5. German - 32,000+. German is progressing better than French, but still not quite as well as Esperanto.
Other languages with 10,000+ sentences
  • Polish - 20,000+
  • Spanish - almost 19,000. Spanish gained one rank! :D
  • Russian - almost 18,000
  • Chinese Mandarin - almost 15,000
  • Ukrainian - 14,000+
Other languages with 1,000+ sentences
  • Italian - 8,500+
  • Arabic - 6,500+. Great boost for Arabic!
  • Dutch - almost 6,500
  • Portuguese - 6,000+
  • Hebrew - 4,500+. Great boost for Hebrew as well!
  • Icelandic - 4,000+
  • Hindi - almost 3,500
  • Hungarian - 3,000+. Hungarian joined the 1,000+ sentences club! Very good progress.
  • Turkish - 2,500+
  • Shanghainese - 2,500+
  • Uyghur - almost 2,500
  • Danish - 2,000+. Danish is new to the club with very good progress as well!
  • Vietnamese - 2,000+
  • Belarusian - almost 2,000
  • Norwegian Bokmål - 1,500+
  • Cantonese - 1,500+

Other numbers
  • 55,735 sentences added in October.
  • About 25,000 sentences added since the beginning of November.
  • We've reached 600,000 sentences in total today!
  • But there are probably thousands of duplicates, so it's not really 600,000 yet...
  • We will soon have 76 languages. 5 are waiting to be added: Galician, Irish, Interlingua, Lojban, Toki Pona. Note that the last 3 languages are constructed languages.

Next Tatoeba day

A potential date for the next Tatoeba day would be December 11th, although it could be December 18th as well. We'll see what suits everyone best.

The main objective of the first Tatoeba day was to break the record of the highest number of sentences added in one day. We didn't break it, but it's okay because we still had fun :D

The main objective will be different for the second Tatoeba day. We haven't decided what it will be yet, but I think it would be nice to emphasize adoption next time, because unfortunately I didn't really have time to look at adoptions for this first Tatoeba day :(

Anyway, we'll keep you informed. Thanks again to everyone who participated and who came to our IRC channel :)

Sunday, November 7, 2010

Tags guidelines

We introduced the "tags" feature several months ago and we've let trusted users experiment with it pretty much freely. A profusion of tags has been created, but they are quite a mess, so we decided to try tidying up.

From now on, if you are going to tag a sentence, please take into consideration the following things.

1. Use tags for objective and official information

We would like to keep the tags for "objective" and "official" information. If you want to categorize sentences for personal purposes, you should use lists.

For instance, you cannot tag a sentence "French exam" to mark it as one of those you will use to practice before your French exam. You should create a list for that. We know lists are not as practical as tags, but we'll be improving the lists feature as soon as we have time.

2. Avoid creating new tags

Avoid creating new tags because it can make the cleaning process harder. If the tag you want to add doesn't appear in the autocompletion list, then it's a new tag, so don't add it unless you are really convinced it's a valid tag.

3. Ask before you create a new tag

We don't have clear rules yet for what is a valid tag and what is not, but one of our moderators (Swift) volunteered to take care of the tags. If you feel the need to create a new tag, it would be wise to ask Swift first. He will be officially in charge of tidying up the tags, and he will be the one deciding which tags to keep and which to rename. Also, don't hesitate to contact him if you would like to help out. It's not easy to decide on these things.

4. Use English for tags, unless you really can't

We have decided to use English as the default language for tags. We will rename all non-English tags to their English equivalents where possible. We can still accept non-English tags, but only if there is no English equivalent.

The point of having one common language is uniformity. It would be inefficient to have a bunch of sentences tagged "proverb" (English) and another bunch tagged "proverbe" (French). There is also no point in having a sentence tagged with both "proverb" and "proverbe", since they are the same notion. Having several tags that designate the same notion can even make things confusing, which is why we have decided to have one default language. Later, we will implement the possibility to translate the tags and to display them in languages other than English.
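
As an illustration of the renaming idea, here is a small Python sketch that maps non-English tags to an English equivalent and drops the resulting duplicates. The mapping table and the function name are made up for the example; the real list of equivalents will be whatever gets decided and announced on the Wall.

```python
# Hypothetical table of non-English tags and their English equivalents.
# Only the "proverbe" entry comes from the example in this post.
TAG_EQUIVALENTS = {
    "proverbe": "proverb",
}


def normalize_tags(tags):
    """Rename known non-English tags to English and remove duplicates.

    Tags without a known English equivalent are kept unchanged,
    and the original order of the tags is preserved.
    """
    normalized = []
    for tag in tags:
        english = TAG_EQUIVALENTS.get(tag.lower(), tag)
        if english not in normalized:
            normalized.append(english)
    return normalized
```

With this, a sentence tagged with both "proverb" and "proverbe" ends up with the single tag "proverb".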

5. How things are going to work
  • We'll try to keep the process as transparent as possible.
  • Swift will publish on the Wall the modifications that will be applied to the tags (i.e. renaming and deletions).
  • There will be a few days until these modifications are actually applied, in case people strongly disagree with a decision.
  • Swift will also add on his profile and his personal web page the links to every Wall post mentioning the modifications, for people to be able to trace back all the decisions about the tags.
  • If you need to protest against a decision, please contact Swift.

Tatoeba day

Tatoeba has seen its community grow quite significantly in the past 6 months, and it's really encouraging. There was a suggestion about having a "Tatoeba day", a day where (passionate) members would try to contribute more passionately than ever. It's a very good idea so we'll be organizing one every month (we'll try to).

The first one will happen on Saturday November 13th, from 0:00 to 23:59 (France time).

Well, this is a virtual event, so it happens on the internet... BUT if you want to experience this event to the fullest, come to our IRC channel on Nov 13th: #tatoeba, on freenode. Don't be shy! And even if you are shy, you can just drop by to read what's going on.

For the first Tatoeba day, we will start with something very basic. The goal of the day will be to translate, correct and adopt a lot of sentences. Not that it's different from what's already happening every day, but I will publish detailed stats the following day, to give an idea of what has been achieved during those 24 hours.
  • How many sentences were added for each language and each user
  • How many corrections were made for each language and each user
  • How many sentences were adopted for each language and each user

This event is of course an occasion to be more productive than we usually are, but it's mostly an occasion for members to feel more connected with each other and to have fun! You may also learn a few things about Tatoeba that you didn't know :)