Friday, April 2, 2010

Japanese romanization in Tatoeba, now using MeCab

We used to display romaji in Tatoeba... We don't anymore. Well, at least not directly. We now display the reading in hiragana. You can, however, still get the romaji version by hovering your mouse over the hiragana and waiting for the little tooltip to appear.
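One common way to get this kind of hover behaviour is a plain HTML `title` attribute, which browsers show as a small tooltip after a short delay. The snippet below is only an illustration under that assumption: the markup, the `reading` class name and the `reading_span` helper are hypothetical and not taken from Tatoeba's code.

```python
from html import escape


def reading_span(hiragana: str, romaji: str) -> str:
    """Wrap a hiragana reading in a span whose title attribute holds the romaji.

    Hypothetical helper: browsers render the title attribute as a tooltip
    when the mouse hovers over the element.
    """
    return '<span class="reading" title="{}">{}</span>'.format(
        escape(romaji, quote=True), escape(hiragana)
    )


print(reading_span("ねこがすき", "neko ga suki"))
# prints: <span class="reading" title="neko ga suki">ねこがすき</span>
```

Escaping both strings keeps quotes or ampersands in the romaji from breaking the attribute.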

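Before:

[Screenshot: a sentence's reading displayed in romaji.]

After:

[Screenshot: a sentence's reading displayed in hiragana.]
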
There has been some discussion about it (like here, here or here), and I think this solution will make everyone happy.

Now, of course, the output is not perfect. So if anyone out there is interested in improving the generated hiragana, please let us know! As much as I agree that the reading is vital information for Japanese learners, I will NOT have time to make it any better. I'd really, really like it if someone could take on this task.

For your information, we were using KAKASI in order to convert Japanese text into romaji. We have now switched to MeCab. Our romaji/furigana converter is still based on KAKASI though.
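MeCab, with the usual IPADIC dictionary, gives each token's reading in katakana, so producing hiragana is mostly a matter of shifting each katakana character down by 0x60 code points. The following is a minimal Python sketch of that idea, not Tatoeba's actual converter. It assumes MeCab's default output format with IPADIC's feature layout (reading as the 8th comma-separated field), uses a hand-written sample instead of real MeCab output, and the names `kata_to_hira` and `hiragana_reading` are hypothetical.

```python
# Minimal sketch (assumption): MeCab's default output with an IPADIC-style
# dictionary, one "surface<TAB>features" line per token and a final "EOS".
# The katakana reading is assumed to be the 8th comma-separated feature.

KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6  # ァ .. ヶ
KANA_OFFSET = 0x60  # code point gap between a katakana and its hiragana twin


def kata_to_hira(text: str) -> str:
    """Map katakana to hiragana; other characters (e.g. ー) pass through."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


def hiragana_reading(mecab_output: str) -> str:
    """Return the space-separated hiragana reading of each token."""
    readings = []
    for line in mecab_output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        # Unknown words can lack a reading field; fall back to the surface form.
        if len(fields) > 7 and fields[7] != "*":
            reading = fields[7]
        else:
            reading = surface
        readings.append(kata_to_hira(reading))
    return " ".join(readings)


# Hand-written sample in IPADIC's format (illustrative, not captured output).
SAMPLE = (
    "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n"
    "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ\n"
    "好き\t名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ\n"
    "EOS\n"
)

print(hiragana_reading(SAMPLE))  # prints: ねこ が すき
```

Because every token goes through its katakana reading, hiragana in the input comes back unchanged, while katakana words also end up in hiragana. Separating tokens with spaces is a presentation choice of this sketch.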

4 comments:

  1. I am simply trying to find a program whose output I can cut and paste into MS Word or other documents for email.

    There are many of these converters that work well for displaying a few sentences with furigana, but they don't work outside this site.

  2. Learn Katakana says: Awesome! Great solution to the kana/roumaji reading issue.

    I'll be actively using Tatoeba in the near future, when I'm learning Japanese again. Thanks for your work! :D

  3. The following command:

        mecab -Owakati foo.txt | mecab -Oyomi

    will segment the text and then convert the kanji and all hiragana to katakana. The final text is completely in katakana.

    How do you convert the kanji into hiragana and leave the hiragana alone?

    Thanks

  4. Hello,

    Nice work. I'm attempting to use MeCab to convert kanji, hiragana, and katakana to romaji. However, it doesn't seem to return romaji.

    I'll be honest: I'm super new to this. I just started researching it yesterday, came across MeCab, and implemented it in C#.

    It seems to "work" but doesn't output romaji. I posted a question about it on Stackoverflow: http://stackoverflow.com/questions/23657885/how-to-get-nmecab-to-output-romaji

    For example, when trying to convert "ども", instead of romaji I get "ども助詞,接続助詞,,,,,ども,ドモ,ドモ EOS".

    I'd appreciate any insight you can provide.

    Thanks for your time.

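On the last question: MeCab with an IPADIC-style dictionary does not produce romaji at all. Its output gives the reading in katakana (the ドモ fields in the output quoted above), so romaji needs a separate kana-to-romaji step; as the post mentions, Tatoeba's own romaji converter is still based on KAKASI. The following is a minimal Python sketch of such a step, not KAKASI's algorithm. Its Hepburn-style table is deliberately partial (an assumption you would need to complete), small っ doubles the next consonant, ー repeats the previous vowel, and `kana_to_romaji` is a hypothetical name.

```python
# Minimal kana-to-romaji sketch. The table is deliberately partial
# (an assumption for illustration): extend it before relying on it.

KANA = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "が": "ga", "ぎ": "gi", "ぐ": "gu", "げ": "ge", "ご": "go",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "だ": "da", "で": "de", "ど": "do",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "わ": "wa", "を": "o", "ん": "n",
}
DIGRAPHS = {
    "きゃ": "kya", "きゅ": "kyu", "きょ": "kyo",
    "しゃ": "sha", "しゅ": "shu", "しょ": "sho",
    "ちゃ": "cha", "ちゅ": "chu", "ちょ": "cho",
}


def kana_to_romaji(text: str) -> str:
    """Romanize hiragana/katakana; characters not in the table are kept as-is."""
    # Katakana -> hiragana by code point shift, so one table covers both.
    hira = "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch for ch in text
    )
    out = []
    geminate = False  # set by small っ: double the next consonant
    i = 0
    while i < len(hira):
        if hira[i] == "っ":
            geminate = True
            i += 1
            continue
        if hira[i] == "ー":
            # Long-vowel mark: repeat the last letter written (normally a vowel).
            if out:
                out.append(out[-1][-1])
            i += 1
            continue
        pair = hira[i:i + 2]
        if pair in DIGRAPHS:
            roma = DIGRAPHS[pair]
            i += 2
        else:
            roma = KANA.get(hira[i], hira[i])
            i += 1
        if geminate:
            # Hepburn writes っち as "tchi" rather than "cchi".
            roma = ("t" if roma.startswith("ch") else roma[0]) + roma
            geminate = False
        out.append(roma)
    return "".join(out)


print(kana_to_romaji("ドモ"))  # prints: domo
```

Long vowels come out as doubled letters ("raamen", "kyouto") rather than with macrons, which is only one of several romanization conventions. Kanji must first be turned into a kana reading (for example with MeCab) before a step like this can romanize it.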