Wednesday, December 25, 2013

Tatoeba is back up, with some issues still

So we managed to bring back Tatoeba, and you have to thank mostly sysko and liori for this. Since the recovery process involved moving to a new server, we of course still have some issues to deal with. I will list here the issues that people have reported and that I have taken note of.

1. Language detection is not working. This is normal. It's one of the things sysko didn't have time to reinstall yet.

2. Apparently some people cannot add new sentences. Right now I have no idea why...

3. The furigana for Japanese sentences isn't displaying properly. It looks like some encoding issue, but I don't know how easy it will be to figure out how to fix it.

Thursday, December 19, 2013

Tatoeba down

If you were trying to browse Tatoeba but ended up here on the blog, this is not a mistake. Tatoeba is currently down and will remain unavailable for a few more days. We hope to have it back some time this weekend (21st-22nd Dec). Very sorry for the inconvenience!

UPDATE (23 Dec, 00:50 GMT+1)
We currently have a new server up and the necessary data was transferred to this new server. There shouldn't be much left to do to have Tatoeba back, but we cannot work 24/7 on this and we don't know if we may run into some more unexpected problems, so we cannot guarantee the website will be available on Monday. We will keep you up to date if it should take much longer than expected though. Sorry again for the inconvenience and thank you for the supportive comments!

UPDATE (25 Dec, 4:30 GMT+1)
So as expected we had some unexpected issues which delayed us from bringing Tatoeba back up. And as you know this is Christmas and New Year's time, which means we are busier than usual with families and friends. As much as I hate it, all I can say right now is that Tatoeba still won't be back today. We still wish you a Merry Christmas though :)

UPDATE (25 Dec 20:00 GMT+1)
Tatoeba is now back! I changed the DNS only a few minutes ago, so you will have to give it a bit of time before http://tatoeba.org actually brings you to the website instead of the blog. You can however go to http://93.20.168.172 in the meantime.

Wednesday, July 24, 2013

Tatoeba update (July 24th, 2013)

This is only a small update, but I would also like to mention some useful information for people who want to get more involved. But first, about the update.

Restriction on private messages for new users
Users who have been registered for less than two weeks can now only send 5 private messages per day. This is meant to prevent spam.
Just in case you are wondering, we don't have a "report spam" feature for now, so if you happen to receive spam, the best way to report it is to send a private message to an admin.
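
For the curious, the rule above can be sketched in a few lines of Python. This is only an illustration, not the actual Tatoeba code: the function name and its parameters are made up, and it assumes the caller already knows how many messages the user has sent today (how a "day" is counted is not specified here).

```python
from datetime import datetime, timedelta

# Values taken from the rule described above.
NEW_USER_PERIOD = timedelta(weeks=2)
NEW_USER_DAILY_LIMIT = 5


def can_send_private_message(registered_at, sent_today, now):
    """Return True if the user may send one more private message.

    Hypothetical helper: registered_at and now are datetimes, and
    sent_today is the number of messages already sent today.
    """
    is_new_user = (now - registered_at) < NEW_USER_PERIOD
    # Only new users are limited; older accounts are never blocked here.
    return (not is_new_user) or sent_today < NEW_USER_DAILY_LIMIT


now = datetime(2013, 7, 24)
print(can_send_private_message(datetime(2013, 7, 20), 4, now))
print(can_send_private_message(datetime(2013, 7, 20), 5, now))
```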

A simpler way to refer to sentences in comments
When writing a comment on a sentence, if you would like to refer to another sentence, you can now use the syntax [#id], where id is the sentence id. It will automatically create a link to the corresponding sentence. For instance, instead of writing:
This sentence is similar to http://tatoeba.org/eng/sentences/show/1860660
You can write:
This sentence is similar to [#1860660]
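
If you are wondering how such a syntax can be turned into a link, here is a minimal sketch in Python. It is not the code running on Tatoeba: the function name is made up, the URL pattern is simply copied from the example above, and it assumes an id is a run of digits. A real implementation would also need to escape the rest of the comment before output.

```python
import re

# Assumed URL pattern, copied from the example link in this post.
SENTENCE_URL = "http://tatoeba.org/eng/sentences/show/{id}"

# Matches [#123], where 123 is one or more digits (an assumption).
SENTENCE_REF = re.compile(r"\[#(\d+)\]")


def linkify_sentence_refs(comment):
    """Replace every [#id] in a comment with an HTML link (hypothetical helper)."""
    def to_link(match):
        sentence_id = match.group(1)
        url = SENTENCE_URL.format(id=sentence_id)
        return '<a href="{}">#{}</a>'.format(url, sentence_id)

    return SENTENCE_REF.sub(to_link, comment)


print(linkify_sentence_refs("This sentence is similar to [#1860660]"))
```

Text that doesn't match the pattern, such as [#abc] or a plain #123, is left untouched.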


And now for people who want to get more involved.

1. We're going to try to keep this tickets page as up-to-date as possible. It contains the various feature requests, bug reports, and other todo's that we would have to work on. If you would like to request or report something, you may want to check the page first to see if it hasn't been requested or reported already.

2. There is an article in the wiki for those who would like to help out with the development of Tatoeba. I wrote some months ago that we need people to help us maintain and improve Tatoeba, and we still do. We always will, actually. But back when I wrote the post, I guess most people didn't have a clear idea of what they could do to help. Hopefully the guide will help solve this issue.

Friday, May 31, 2013

Tatoeba update (May 31st, 2013)

After a long time without many updates, we're finally starting to have some changes in the code. Hopefully we'll have even more updates in the next few months, as people have been contacting us to let us know they would like to help with the development and maintenance of Tatoeba.

So what's new?

Inappropriate comments
The main change in this update is that admins can now hide comments that have been considered inappropriate. Such comments will only be displayed to the author and to the admins. Other people will only see a message informing them that the comment was hidden because it didn't comply with our rules. This may not be our definitive way of handling inappropriate comments, but we're at least going to give it a try and see how it works.

Downloads files
For those who use the files we provide on our downloads page, there was a change in the way the list data is exported. There is more information now, and the export was split into two files rather than just one. Cf. "Lists" and "Sentences in lists" on the downloads page.

Light display
We have made a light version of the sentence's page, in which only the sentence and its direct translations are displayed. Here's what sentence #1 looks like in the light version. This is useful for those who would like to include our content on their website, like it's done for example here.

Friday, May 17, 2013

The story of Tatoeba

Someone sent us an email to ask more about the story behind Tatoeba. It's true that there isn't so much information on that matter so I figured I would take some time to write about it.


It all started when I was traveling to Germany at the end of January 2006. At that time I was really fond of learning languages, and I was especially in love with Japanese. But there I was visiting a good friend of mine in Germany and couldn't speak German, so I was wondering if there were any good German-French or German-English dictionaries.

With Japanese, I had found what I considered back then to be the most awesome dictionary of all time, http://www.alc.co.jp/. I loved it so much simply because it wasn't limited to words. I could search for things like "hello" or "table", but I could also search for expressions or partial sentences like "out of the blue" or "sometimes I think that". And it would return results. Some of the results were regular dictionary results, but the others were actually sentences containing the searched word(s), along with the translations of these sentences. That helped me a lot.

After searching for good German-French/German-English dictionaries and not finding anything satisfying, I started to search for such a dictionary for other pairs of languages. That led me to ask myself what my ideal dictionary would be, and for some reason, I just couldn't stop thinking about it. So over the following days I ended up writing everything I had in mind in a short document that I'm publishing here for the first time: Trang's ideal dictionary (just want to mention I was 19 years old at the time).

When I went back to France, I sent this document to several of my penpals. I tried to find people who would be interested in working on it with me, to either code it for me or to teach me what I would need to know to code it myself (mostly to code it for me because I didn't believe I would be able to do it myself). The best help I found was someone suggesting that I take a look at PHP. I had no idea what PHP was but I googled it, went to the PHP website, downloaded something. Then in the files and folders of PHP, I searched for and clicked on any .exe file I could find, hoping that it would open a program in which I could type something, then click a button and make it display whatever I asked it to display... But nothing I expected happened, so I gave up.

A couple of months later my little sister was trying to make a website, her online diary or something. She was following a tutorial about HTML, PHP and MySQL written for complete beginners. When I saw there was something about PHP I took a look at the tutorial as well, and then things started to make a lot more sense. I spent a whole week experimenting, trying to make a small website with two pages: one page where I could save sentences and translations, and another where I could search these sentences. I found out it wasn't that difficult, that I didn't need to find someone with years of experience in programming, that I could actually do it myself.

The very first version of Tatoeba wasn't called Tatoeba. I wouldn't even call it a first version, it was more of a prototype. It was hosted on Sourceforge under the codename multilangdict. I called it a "dictionary" but I knew it wasn't really going to be a dictionary. It was already clear to me that the focus of the project would be to collect sentences and their translations, since it was the type of data that I couldn't find easily.

As soon as I had a somewhat functional website, I asked everyone I knew who might be interested to come and add or translate sentences. To my surprise, some people told me they found translating addictive. I think the addiction came from the fact that a lot of the sentences that were added weren't the "textbook" kind of sentences. The database was empty, so we had to fill it with whatever we could. Most of the sentences were just a part of the life of whoever added them. It could be something they said, something they heard, something they thought of. That gave them a touch of authenticity, I guess, and made them more interesting.

A year later, during summer 2007, a new version... or rather the first version of the project was coded (the previous version was rather an experiment). It was around that time that I decided to call the project "Tatoeba". I chose this name because the goal of the project was to give example sentences, and "tatoeba" means "for example" in Japanese.

After the first version was coded, I imported the sentences from the Tanaka Corpus into the project, a collection of around 150,000 pairs of Japanese-English sentences. The database then grew from ~5000 sentences to ~300,000 sentences. The community was still non-existent but at least there was more data that could potentially attract more people. Although one of my friends confessed to me that she liked the project much more before there were all these boring sentences from this corpus.

I released the second version of Tatoeba (which is the one in use at the moment) in December 2008. Sysko joined me in the project during summer 2009 and helped me a lot. I was very happy about this because up until that point I was pretty much "alone" on the project. I had people helping me occasionally, but no one who would really get involved as much as Sysko did. And he has really done sooooo much for the project.

Anyway, it's been a long road and a lot of things have happened, but today we're at 2.3 million sentences and growing, with thousands of people using it every day. The main problem now is quality (which is a topic that can bring, and has brought, very heated debates). But that's something I will talk about in another post.
This post is just for those who were wondering about how Tatoeba started :)

Wednesday, April 10, 2013

Looking for people to help us maintain and improve the current version


Well, it's been a while since I've posted something here. You'll have to excuse me but I'm gonna skip the part where I trace back all the things that happened in the past two years. That will be for another day. Right now, we need to get some things done.

For those who don't know, a few years back we decided that Tatoeba needed a new version, and we started working on it. And by "we", I mean mostly Sysko. He's still working on it because it's not an easy task and he doesn't have much time. I will not talk about it here because he can tell you more about it than I can. But until we finally release the new version, we will need people to help us maintain and improve the current Tatoeba.

There are certain things that we will simply not be able to do in the current version due to technical limitations, and that is why we started a new version, but there are still many things we can do to improve the user experience, there are still many things we can do to improve and optimize the current code. And we need people for this.

Ideally, we would prefer people who can (and want to) stay with us in the long term (and I mean several years... at least), and really become part of the team. But of course, if you would like to help this project only occasionally, that's okay too. We're not going to refuse any kind of help :)

It doesn't matter if you don't know much about programming, or more specifically, Web development. What matters more is that you have a deep interest in the project and feel passionate enough about it to learn whatever is necessary to make it better.
When I decided to start Tatoeba 7 years ago, I knew next to nothing about programming (seriously). All I knew was that I wanted something like Tatoeba, and I wasn't able to find it. So I figured I would just make it myself, whether it would take 5, 10 or 50 years. So this kind of mindset is much more important to me than any knowledge that you may or may not have.

If you do have some knowledge about Web development and would like to know more about the technical aspects of the current Tatoeba, you can check out this page. It will guide you on how to install Tatoeba on your own machine and you can explore the code. To be honest our documentation is still in a quite miserable state at the present time... but we're working on it.

Last but not least, if you know people who could be interested in helping us with the code, please let them know we're looking for people to join our dev team!

Thanks for reading, and I hope to hear from some of you!