Tatoeba Project

Wednesday, December 17, 2014

Tatoeba update (December 17th, 2014)

Small update
  • We fixed the problem of the languages not being displayed on the translate page, in the list for random sentences.
  • We fixed an issue where a sentence was missing from the search results even though it had been indexed previously. This happened when the sentence had recently been translated, or when its owner or correctness had changed.

UI translations

I'd like to mention that the website interface is now 100% translated into 7 languages: Arabic, Esperanto, Finnish, French, German, Italian and Russian.
Marathi (97%), Japanese (92%) and Polish (90%) are also not far from being complete.

Sentences deduplication

We're delaying the sentence deduplication once more. There are still some details we'd like to fix. Even though they are not critical and the deduplication itself is working properly (as far as we know), it's better to fix them sooner rather than later.

When everything is fixed, there will be another round of deduplication on the dev website. We will again leave a few days for everyone to check that there is indeed no major issue. Then we will finally run the script on the real website.

Thank you for your patience.

Saturday, December 6, 2014

Tatoeba update (December 6th, 2014)

Bug fix for login redirection

The problem with being redirected to a random sentence's page when logging in is now fixed. When you log in, you will now stay on the page you were on.

UI translations

Gillux implemented various improvements for the UI translations. You can read more details about them in his post on the Wall.

Sentences deduplication

We're not forgetting about the deduplication script. We still have a few issues to fix before we can run it. There are no big issues, though, so we can probably start deduplicating sentences next weekend.

Friday, November 28, 2014

Tatoeba update (November 29th, 2014)


Tatoeba will be under maintenance for at least 3 hours.
Scheduled time: 04:00 to 11:00 UTC (initially 04:00 to 07:00 UTC).

As mentioned in the previous blog post, we need to shut down Tatoeba for a few hours in order to make some changes in the database. These changes are needed so that we can later run the sentence deduplication script.

While Tatoeba is down, if you feel like translating, we always need people to help us translate the website interface.

Edit: The maintenance is over now. It took more time than planned because MySQL logging was still enabled during the changes, but everything went okay.

More frequent indexation

We do not have a lot of new things for this update, but there is nonetheless one piece of good news. Thanks to some optimization that gillux did a couple of weeks ago, we can afford to index new sentences more often. The interval was previously set to 1 hour, and we reduced it to 15 minutes. In other words, you will never have to wait more than 15 minutes to be able to find your sentences via the search function.

Donations and thanks

We recently received a small donation from Stanislav. So thank you, Stanislav! And thanks again to everyone before him who donated to make sure that Tatoeba will be hosted on a stable and fast server for the next few years :)

Sunday, November 23, 2014

Tatoeba update (November 23rd, 2014)

  • Regarding the link feature for advanced contributors: it is now possible to drag-and-drop the icons (instead of the sentence text) into the link icon in the menu.
  • Our asset files (images, CSS, JavaScript) now have a timestamp, so that the browser knows whether or not it needs to update them. This means you should no longer have to worry about clearing your browser's cache.
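
As a rough illustration of the timestamp idea, here is a minimal Python sketch. It is not Tatoeba's actual code: the helper name `asset_url` and the `?v=` query parameter are assumptions made for this example. It appends the file's modification time to the asset URL, so the URL changes whenever the file changes and the browser has to fetch the new version.

```python
import os
import tempfile


def asset_url(path, url_prefix="/"):
    """Return a URL for `path` with the file's modification time appended.

    Hypothetical helper: the name and the `?v=` parameter are illustrative,
    not taken from the Tatoeba code base.
    """
    mtime = int(os.path.getmtime(path))
    return f"{url_prefix}{os.path.basename(path)}?v={mtime}"


if __name__ == "__main__":
    # Create a throwaway CSS file so the example is self-contained.
    with tempfile.NamedTemporaryFile(suffix=".css", delete=False) as f:
        f.write(b"body { color: black; }")
    print(asset_url(f.name, "/css/"))
    os.remove(f.name)
```

As long as the file is untouched, the URL stays the same and the cached copy is reused; once the file is modified, the timestamp (and therefore the URL) changes.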

Development website

Gillux recently set up a development (dev) website. The purpose of the dev website is to let members test new features and check the interface translations BEFORE they are released to the production (prod) website, that is, the actual Tatoeba website.


We are planning to disable Tatoeba temporarily next weekend (November 29) for maintenance.
The maintenance consists of changing the storage engine of our MySQL database from MyISAM to InnoDB. For this operation we need to stop all access to the database, which is why we need to shut down Tatoeba. It should take around 3 hours.
We need to make this change in order to run the sentence deduplication script. More about this below.
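
For readers curious what an engine change looks like in practice, here is a small Python sketch that only builds the SQL statements and does not connect to any database. `ALTER TABLE ... ENGINE=InnoDB` is standard MySQL syntax; the table names and the helper name `engine_conversion_statements` are hypothetical and not taken from our actual schema or scripts.

```python
def engine_conversion_statements(tables, engine="InnoDB"):
    """Build one ALTER TABLE statement per table name.

    The statements are returned as strings; nothing is executed here.
    """
    return [f"ALTER TABLE `{name}` ENGINE={engine};" for name in tables]


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    for statement in engine_conversion_statements(["sentences", "users"]):
        print(statement)
```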

Sentences deduplication

First of all, note that the deduplication script will not run during the maintenance, but after it. The script can run while Tatoeba remains available. It is still uncertain whether we will run the script next weekend or later. We are still debugging the script.

There was a first test of the script on the dev website. It took 9.5 hours to complete. You can help us make sure that the script works well by checking the dev website. Duplicates that were removed can be identified because they were deleted by Horus (the current name of the deduplication bot).
If you notice any issue, such as sentences that were deleted when they shouldn't have been, or information that was not re-linked properly, report the problem to us on the Wall of the real website (not on the dev website, please) or on our Google group.
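
To give an idea of what "removing duplicates and re-linking" means, here is a simplified Python sketch. It is not the actual script: the data model (a dict of sentences and a set of translation links), the function name `deduplicate` and the rule that the lowest sentence number survives are all assumptions made for illustration. The real script also has to handle other information attached to sentences, which this sketch ignores.

```python
from collections import defaultdict


def deduplicate(sentences, links):
    """Merge sentences that have the same (language, text).

    sentences: dict mapping sentence id -> (lang, text)
    links: set of (id_a, id_b) translation links
    Returns (kept_sentences, relinked_links, removed_ids).
    """
    groups = defaultdict(list)
    for sid, key in sentences.items():
        groups[key].append(sid)

    # Map every id to the id that survives (assumption: the lowest id wins).
    survivor = {}
    for ids in groups.values():
        keep = min(ids)
        for sid in ids:
            survivor[sid] = keep

    kept = {sid: key for sid, key in sentences.items() if survivor[sid] == sid}
    removed = sorted(sid for sid in sentences if survivor[sid] != sid)

    # Re-point links at the survivors and drop self-links created by merging.
    relinked = set()
    for a, b in links:
        a, b = survivor[a], survivor[b]
        if a != b:
            relinked.add((a, b))
    return kept, relinked, removed


if __name__ == "__main__":
    sentences = {
        1: ("eng", "Hello."),
        2: ("fra", "Bonjour."),
        3: ("eng", "Hello."),  # duplicate of sentence 1
        4: ("deu", "Hallo."),
    }
    links = {(1, 2), (2, 1), (3, 4), (4, 3)}
    kept, relinked, removed = deduplicate(sentences, links)
    print(sorted(kept), sorted(relinked), removed)
```

In this example sentence 3 is removed and its link to sentence 4 is moved to sentence 1, which is the kind of re-linking worth double-checking on the dev website.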

Sunday, November 16, 2014

Tatoeba update (November 16th, 2014)

Link to any sentence

This new feature affects only advanced contributors and corpus maintainers. It is now possible to link a sentence to any other sentence, and not just to its indirect translations. You will find an additional icon in the sentence's menu, which looks the same as the "link" icon next to the translations. Clicking on the icon opens a text input where you can indicate the target sentence.
You can enter either the sentence number or copy-paste the sentence URL.
You can also drag-and-drop a sentence's URL into the icon.

Linking and unlinking refreshes all the translations

There were some inconsistencies with the list of indirect translations displayed after linking or unlinking a sentence. This is now fixed. Whenever you link or unlink, you will see the correct list of indirect translations without having to refresh the page.

Contributions logs

The design of the logs has been revised to take the feedback we received into account. If you do not see any change, try emptying your cache and/or refreshing again.
Note that the date is now clickable and will redirect you to the sentence's page. The sentence itself is left as plain text so that people can copy-paste it - or part of it - more easily.

We won't implement any option to choose between the new and old design, but for those who are very attached to the old design, here's some CSS code that you can use with the Stylish extension.
Our member CK also has a page about using Stylish with Tatoeba, with some code snippets that you can reuse.
I encourage you to learn about CSS and customize the looks to your own taste, not only for the logs but for any other part of Tatoeba. And if you do come up with something that looks a lot better, don't hesitate to share with the rest of the community!

Search fix for sentences translated into the same language

If you ever tried to search from and into the same language (for instance, searching "fish" from English to English), you may have noticed that the results included many sentences that do not have any translation - in case you're wondering, yes, it's possible in Tatoeba to have two sentences of the same language linked to each other.
This kind of search now only returns sentences that do have translations. So searching "fish" from English to English will only return sentences that have at least one direct or indirect translation in English.
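
The filtering rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the actual search code: the data structures (`lang_of`, `links`) and the function names are assumptions, and an "indirect translation" is modelled here as a translation of a direct translation.

```python
def translations_in(sentence_id, target_lang, lang_of, links):
    """Return ids of direct and indirect translations in target_lang.

    lang_of: dict mapping sentence id -> language code
    links: dict mapping sentence id -> set of directly linked sentence ids
    """
    direct = links.get(sentence_id, set())
    indirect = set()
    for d in direct:
        indirect |= links.get(d, set())
    # A sentence is never its own translation.
    candidates = (direct | indirect) - {sentence_id}
    return {c for c in candidates if lang_of[c] == target_lang}


def filter_results(result_ids, target_lang, lang_of, links):
    """Keep only results with at least one translation in target_lang."""
    return [s for s in result_ids
            if translations_in(s, target_lang, lang_of, links)]


if __name__ == "__main__":
    lang_of = {1: "eng", 2: "eng", 3: "fra", 4: "eng",
               5: "eng", 6: "eng", 7: "eng", 8: "fra"}
    links = {1: {2}, 2: {1}, 4: {3}, 3: {4, 6}, 6: {3}, 7: {8}, 8: {7}}
    # 1 has a direct English translation, 4 has an indirect one,
    # 5 has no translation at all, 7 only has a French translation.
    print(filter_results([1, 4, 5, 7], "eng", lang_of, links))
```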

(Edit: forgot to mention one thing)
Fixed message not submitted after changing UI language

This update also fixes an annoying bug that prevented people from sending comments, wall posts, translations, private messages etc. whenever the interface language had been changed from a different page than the one they were submitting from. The symptom was a never-ending loading icon that replaced the text you wanted to submit, while nothing was actually submitted.

Tuesday, November 11, 2014

Tatoeba update (November 11th, 2014)

Contributions logs
  • The contributions logs have been redesigned. 
  • There is a small additional visual feature: log entries that are obsolete are displayed a bit differently (with a dotted line and grey text), to indicate that the sentence was modified again afterwards.
  • The latest contributions page now also includes the list of users who participated in the latest contributions. It is the same list that you would find in the Members page.

New platform for UI translations

We moved to a platform called Transifex to manage our interface translations. Hopefully this will help us build a more cohesive translators team.
For those who were previously translating on Launchpad: we do not use Launchpad anymore. Don't worry, the translations that were made in Launchpad were exported to Transifex, so no translation was lost.

If you would like to join the translators team, simply go to this page, click on "Help translate Tatoeba website", create an account, log in and apply for the language(s) into which you'd like to translate. If the language is not listed, you can request that it be added. Once your application is validated, you will be able to submit translations.

Sunday, October 19, 2014

Tatoeba update (October 19, 2014)

Search results sorted by sentence length

Shorter sentences now have priority over longer ones in the search results. Even though a shorter sentence is not necessarily a better example sentence, this should make the results more relevant overall.
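
As a simple illustration of this ordering (a sketch only, not how the search engine itself implements it; the function name `rank_by_length` is made up for this example), sorting results by length in Python looks like this:

```python
def rank_by_length(sentences):
    """Return the sentences ordered from shortest to longest.

    sorted() is stable, so sentences of equal length keep their
    original relative order.
    """
    return sorted(sentences, key=len)


if __name__ == "__main__":
    results = [
        "I like to eat fish on Fridays.",
        "Fish swim.",
        "I eat fish.",
    ]
    for sentence in rank_by_length(results):
        print(sentence)
```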

Possibility to comment on deleted sentences

The comment form was displayed on deleted sentences, but the comment was not saved after submission. This has been fixed and it is now possible to post comments on deleted sentences.

Script to remove duplicate sentences

This is just a little note that there has been good progress on the deduplication script. We'll hopefully be able to clean up the corpus soon :)

Other fixes
  • Fixed truncation of long URLs containing non-Latin characters.
  • Long words or links that exceed their container box are now wrapped onto new lines instead.
  • Fixed a bug where part of a URL would be converted into a sentence's link.
  • Fixed a bug where some Wall message previews were displayed as empty on the homepage.