Thursday, April 1, 2010

Audio for Tatoeba sentences, in partnership with Shtooka


We have started adding audio to Tatoeba, and it will be available on April 3rd. Great, isn't it? :D

Yes, but (there is a but) you will probably be disappointed to see that most sentences will show "audio unavailable". So far, only a few hundred sentences have audio, which is barely 0.1% of the whole corpus. This is not inevitable, however! If you are interested in helping us add more audio, keep reading.


First of all, about Shtooka

Shtooka is a small non-profit organization based in Paris whose goal is to gather collections of audio for words, expressions, proverbs, sentences, etc. You can browse their collections here.

We met them at an event they organized on February 13th, and thanks to them, we are now starting to integrate audio into Tatoeba.


Audio for Shanghainese

The audio we have so far is in Shanghainese. Yes, we do have such an exotic language. Now, you may be wondering why on Earth we picked Shanghainese. Well, for a few reasons.
  • Allan (aka sysko), one of the most active developers in the team, is very interested in Chinese, and more particularly in Shanghainese. He was provided with 900 Shanghainese sentences from shanghaining.com.
  • Congcong (aka fucongcong), one of the most important contributors to Tatoeba, speaks Shanghainese.
  • They were both able to meet regularly with Nicolas (aka zmoo), president of Shtooka, in order to record these sentences in Paris.

Want more?

Needless to say, we will be very happy to add audio for any other language. But it's not going to be easy, and it's not going to be possible without your help! So if you are interested...
  1. First of all, send us an email at team@tatoeba.fr, with the title "Audio for Tatoeba in [insert-language-here]".
  2. You should know that Shtooka insists strongly on quality, so recording from your laptop's microphone is not an option. We will explain things in more detail when we get back to you.
  3. Then if you are still motivated, start gathering sentences for which you would like to record audio, by creating lists. Limit each list to 100 sentences max.
  4. Note that you can also create lists just to gather sentences for which you want audio, even if you are not going to record them. Just make sure that all the sentences in a list are in the same language.
Anyway, having audio in Tatoeba is really exciting for us, and we hope that many of you will join us in this quest!

5 comments:

  1. Chinese audio is actually quite articulate.

  2. but getting audio for Chinese sentences is critical. If anyone is willing to record the 20,000 most common Chinese sentences (see HSK list) it would be super helpful.

  3. How do I record on a Mac? It says the recording software is only Windows-compatible. I also speak Shanghainese and would like to contribute to the project.

  4. How can I obtain a list of, or search within, phrases in a particular language with audio?

    On the audio page I see a list on the right of languages, clicking on them gives a link to what I'm talking about. For instance:

    http://tatoeba.org/eng/sentences/with_audio/wuu

    However, it is not clear if this list is comprehensive (meaning that as of yet you only have 8 languages with any audio). I can try specific languages, like Japanese here:

    http://tatoeba.org/eng/sentences/with_audio/jpn

    And as there are no results, I believe this means that Tatoeba has yet to add any Japanese audio. Is this a correct reading? Are there any languages outside of those 8 that have audio recordings? It would be nice if the answer to this question were immediately clear so people don't spend their time searching around for something that doesn't exist.

  5. This site (http://shtooka.net/) is unavailable :/
