Friday, December 10, 2010

Tatoeba update (Dec 10th, 2010)

What's new

Sentences stats. There's now a specific page for the sentences stats, to make them a bit more readable. The total number of sentences is also now indicated (it's quite an important number, but for some reason we never displayed it anywhere).

Wall messages of a user. You can browse the messages that were posted by a specific user, from the user profile. Click on "See this user's contribution", then scroll to the bottom of the page. You will see the latest messages posted by the user, and a link to view them all (if the user has posted any messages).

Sentences tagged more than 2 weeks ago. That's useful mostly for moderators :)

New languages. We've added Ainu, Malayalam, Low German and Sicilian.

FAQ. In case you haven't noticed, the procedure to request a new language was updated (several weeks ago), and we added a new question, regarding audio.


What next
  • Improvement of the profile page. The way you edit your profile at the moment is neither very intuitive nor very practical.
  • Very certainly other things but I can't tell what yet because it will depend on my inspiration...

11 comments:

  1. I would very much like you to solve the problem that some translated sentences are not visible (when they are linked through more than one sentence in between), so people translate them again and again, and we get more and more duplicates.

    I made two suggestions - one being that you show the number of sentences linked to a given sentence. See http://tatoeba.org/epo/wall/show_message/4427#message_4427

    The other suggestion is to identify the chain and the language: http://tatoeba.org/epo/wall/show_message/4433#message_4433

    Ludoviko
    http://tatoeba.org/epo/user/profile/ludoviko

  2. Do you need some explanation of my proposal?

    Ludoviko
    http://tatoeba.org/epo/user/profile/ludoviko

  3. No, it's okay, I understood perfectly what you meant :)

    But like sysko said in his reply, these are not things we can easily do in the current system :( I mean, it's doable, but not without taking a lot of resources. This is why these features are not something you will see before we release the next version, where we will switch to a new kind of database that can handle (in real time) the kind of query you are asking for without making our server crash ^^

    But what I can do for now (until the new version is out) is try to make it easier to link sentences. If more sentences are linked, it will reduce the number of "hidden translations".

  4. I understand that the server and the system are not sufficient to solve the problem in real time. That's why I proposed two solutions working independently of the actual moment:

    - Calculate the number of sentences linked in a chain/graph and show it with every sentence in the graph.

    - Give a second identification to every sentence which denotes the translation chain (graph) and the language.

    Both of these solutions can be done at any moment.
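
A minimal sketch of how both values could be precomputed offline, assuming the links are available as plain (sentence id, translation id) pairs. The function name `chain_info` and the choice of the smallest sentence id as the chain identifier are illustrative assumptions, not part of the Tatoeba code base.

```python
from collections import Counter


def chain_info(links):
    """Map each sentence id to (chain_id, chain_size).

    `links` is an iterable of (sentence_id, translation_id) pairs.
    The chain id is, by assumption, the smallest sentence id in the chain.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so the chain id stays stable.
            if root_a < root_b:
                parent[root_b] = root_a
            else:
                parent[root_a] = root_b

    chain_of = {s: find(s) for s in list(parent)}
    sizes = Counter(chain_of.values())
    return {s: (c, sizes[c]) for s, c in chain_of.items()}


if __name__ == "__main__":
    # Hypothetical ids: 2, 5 and 9 form one chain, 7 and 8 another.
    for sentence, (chain, size) in sorted(chain_info([(5, 2), (2, 9), (7, 8)]).items()):
        print(sentence, chain, size)
```

Combining the chain id with a language code would give the second identifier described above; storing either value would still require the database change discussed in this thread.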

  5. Well, even if it can be done at any moment, it wouldn't be easy to implement and we would have to make changes in the database to store this new data. Considering that we have a new system in progress, it wouldn't be worth having sysko or myself spending time on this and it wouldn't be worth changing the database for this purpose =/

    However, I've thought of something else.

    We provide download files that are updated weekly:
    http://tatoeba.org/download_tatoeba_example_sentences

    Someone who has programming skills can download the links.csv file and calculate the number of 'hidden translations' for each sentence. They would only keep the sentences that have at least one hidden translation. They could publish the results somewhere in the following format:

    sentence nº23455-Polish (3 hidden translations): 12567-Finnish, 14536-Russian, 18999-Thai

    It would give us information on how many sentences have hidden translations. Maybe it's a lot, or maybe not that many. Right now I personally have no idea how many there could be. In any case, once we have this information, we can regularly work on linking hidden translations to reduce the chances of people adding translations that already exist.
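
The calculation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the links are already loaded as (sentence id, translation id) pairs, `languages` is a hypothetical mapping from sentence id to language name, and a translation counts as "hidden" when it is more than two links away (direct translations and translations of translations being the visible ones). Parsing the downloaded files is left out.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Build an undirected adjacency map from (sentence_id, translation_id) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hidden_translations(graph, start, visible_depth=2):
    """Return the sorted ids reachable from `start` but more than `visible_depth` links away."""
    dist = {start: 0}
    queue = deque([start])
    while queue:  # plain breadth-first search
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sorted(n for n, d in dist.items() if d > visible_depth)


def report(links, languages):
    """Return one line per sentence that has at least one hidden translation."""
    graph = build_graph(links)
    lines = []
    for sid in sorted(graph):
        hidden = hidden_translations(graph, sid)
        if not hidden:
            continue
        listed = ", ".join(f"{h}-{languages.get(h, 'unknown')}" for h in hidden)
        lines.append(
            f"sentence nº{sid}-{languages.get(sid, 'unknown')} "
            f"({len(hidden)} hidden translations): {listed}"
        )
    return lines


if __name__ == "__main__":
    # Made-up ids forming a chain 1 - 2 - 3 - 4.
    links = [(1, 2), (2, 3), (3, 4)]
    languages = {1: "Polish", 2: "English", 3: "French", 4: "Finnish"}
    print("\n".join(report(links, languages)))
```

Running one search per sentence is simple but slow on a large file; a real script would probably process each group of linked sentences once. The sketch also does not filter out hidden sentences written in the same language as the starting sentence.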

  6. Three days ago sysko wrote that the new release will show more or less all translations - so it seems the problem will be solved some day.

    I tried to find out how many sentences there are with hidden translations. So I went to http://tatoeba.org/epo/sentences/show_all_in/eng/none/none/indifferent/page:10000 and tried the ten sentences listed there. Out of the ten, there are hidden sentences (which cannot be viewed from the English version) for
    http://tatoeba.org/epo/sentences/show/249696
    http://tatoeba.org/epo/sentences/show/249692

    Hidden sentences for the
    - German version (and others) of http://tatoeba.org/epo/sentences/show/249690
    - Polish versions of http://tatoeba.org/epo/sentences/show/249689
    - Persian version of http://tatoeba.org/epo/sentences/show/249684


    No:
    http://tatoeba.org/epo/sentences/show/249694
    http://tatoeba.org/epo/sentences/show/249691
    http://tatoeba.org/epo/sentences/show/249687
    http://tatoeba.org/epo/sentences/show/249686
    http://tatoeba.org/epo/sentences/show/249685

    This may mean that half of the graphs with English sentences have hidden sentences. The ones without were more or less those with only two or three translations.

    But we have to consider that there are about 160,000 sentences in English and only 60,000 or fewer in the other languages. So I tried the same with French, with http://tatoeba.org/epo/sentences/show_all_in/fra/none/none/indifferent/page:1000

    Hidden sentences are in the graphs of
    http://tatoeba.org/epo/sentences/show/542849
    http://tatoeba.org/epo/sentences/show/542839
    http://tatoeba.org/epo/sentences/show/542836
    http://tatoeba.org/epo/sentences/show/542835
    http://tatoeba.org/epo/sentences/show/542833
    http://tatoeba.org/epo/sentences/show/542830
    http://tatoeba.org/epo/sentences/show/542828
    http://tatoeba.org/epo/sentences/show/542827
    http://tatoeba.org/epo/sentences/show/542825

    No:
    http://tatoeba.org/epo/sentences/show/542832

    Which means that nine out of ten graphs with a French sentence have hidden translations which cannot be seen in some languages...

    Do you understand now why I think there is a problem?

  7. Yes, the problem will definitely be solved someday :) We're coding the new version largely to solve that specific problem ^^

    But I think one other reason why there is this problem (or why it's becoming "large") is because the link feature does not scale to the growth we've had. I mean, only trusted users and moderators can link, and even then, trusted users cannot easily link ANY sentence. So this is clearly not enough. We need to make this feature less restricted and more usable, and that's something we can start improving on the current version.

    If you browse Esperanto sentences that are not directly translated into English or into Japanese, you will notice that many have an indirect translation.
    => http://tatoeba.org/sentences/show_all_in/epo/eng/eng/indifferent
    => http://tatoeba.org/eng/sentences/show_all_in/epo/jpn/jpn/indifferent
    If more people were working on linking them, it would make hidden translations visible...

    Perhaps next Tatoeba day will be about linking :)

  8. I agree about making linking easier. It will be good to have an easier linking procedure and to encourage linking.

    You may try to convince people to put more links. But to solve the problem of hidden sentences for Esperanto you need at least 47 000 links between Esperanto and English alone. (This is the number of Esperanto sentences not linked directly but via one translation to English, from http://tatoeba.org/epo/sentences/show_all_in/epo/eng/eng/indifferent/page:4708 .) This will take a lot of time - and this is only one language link. So probably programming will be quicker to solve the hidden translations problem.

  9. How about making linking easier?

  10. Here you go :)

    http://blog.tatoeba.org/2011/05/some-tips-for-those-who-want-to-link.html

  11. Thank you very much!

    If one day you really have no idea what to do, maybe indicating the number of links (per language and per user) would be nice... (But we can live without that, quite sure.)

