Sunday, October 13, 2019

About the Kabyle language icon

Two weeks ago we decided that our Kabyle language icon should be changed. The new language icon is now live and is based on the Berber flag.

This change was probably the most contested decision we ever had to make. It took us more than a year to actually decide what to do. You can trace back all the details starting from this Wall thread:

To summarize:
  • We received many complaints telling us that the flag that we used to craft the icon is a political flag and is not at all a flag that represents the Kabyle people. 
  • However, the opposition to changing the flag was also very strong, with the active Kabyle contributors telling us that people who try to remove this "Kabyle flag" have a hidden agenda to eradicate the Kabyle culture and language.
There was an unprecedented level of polarization within the community of native Kabyle speakers.

What helped us come to a decision was:
  • Finding out that the flag is a very recent flag, adopted only in March 2015.
  • Finding out that the Kabyle people, regardless of their political opinions, already acknowledge another flag as a symbol of identity: the Berber flag.
We will see in 10, 15, or 20 years how opinion about the newer Kabyle flag has evolved, and we can reconsider using this flag when it is not so new anymore. Until then, we hope that our choice to use the Berber flag as a graphical representation of the Kabyle language will bring more peace and collaboration among native Kabyle speakers on Tatoeba.

Thursday, August 22, 2019

A second MOSS award

Great news everyone! We have been granted another award from the Mozilla Open Source Support program (MOSS).

A second award, you say. What was the first one about?

You can read about what we did with the first award in this blog post.

What about this second award?

This time we are receiving $15,000 to help us improve the website's interface. More specifically, this award will enable us to (finally) complete the Responsive UI project that was initiated a couple of years ago.

"Responsive UI" means that the content of the website will adapt to the size of the screen. The end goal is to make Tatoeba easier to use from a mobile device.

Changes will be introduced gradually, over 8-9 months, so you won't just wake up one day and notice a completely new Tatoeba. Instead, every week you will see a little something new.

  • The look and feel of the website will slowly become more consistent throughout all pages.
  • The new sentence design will slowly get features from the old design, until everything is migrated and we can finally fully switch to the new sentence design.
  • The layout and content of each page will be reviewed, steering away from the two-column layout and towards a single-column layout.

Week after week, we will put the pieces together so that, after 8-9 months, we can make the final switch: make the UI responsive.

Getting involved

This won't be an easy project and we will need all the help we can get.

You are therefore all invited to get involved to ensure that we achieve a new interface that is more pleasant to use on mobile devices, while also being more pleasant to use on desktop.

If you are interested in helping with UI/UX design, contact me. I also encourage you to read the discussion we had on the Wall not so long ago about user experience.

If you are interested in helping with the code, have a look at our guide on how to contribute as a developer.

There are always plenty of things to do, and you can help no matter your skill level!

Saturday, August 10, 2019

Decision making and governance in Tatoeba

I would like to take you behind the scenes of the Tom and Mary discussion and share with you my story on this controversial topic and what it actually took to come up with a decision.

The most important message I hope you’ll take from this is that anyone has the power to make a change in Tatoeba. If something bothers you and you think we should take measures for or against it, there’s a way to have an influence, beyond posting your thoughts on the Wall. I’ll explain to you how, through this story.

First, let me set up the context.

We have a lot of sentences with “Tom” and “Mary” in our corpus. This was not the result of any official decision. This was the result of our most prolific contributor. His goal was to avoid redundancy. If everyone used the same names in their sentences, we would be less likely to end up with many sentences that follow the same pattern. For instance, you don’t want everyone to be adding sentences with “My name is”: “My name is Trang”, “My name is Tom”, “My name is Mary”, “My name is Bond” (“... James-Tom-Mary Bond”).

So the chosen names were Tom and Mary, but those names were chosen arbitrarily (to my knowledge). After some time, the idea started being contested. For good reasons. Among other things, Tom and Mary were quite inconvenient names for languages where names are subject to declensions. People translating from English into such languages felt that Tom and Mary sentences were not allowing them to express all the linguistic properties of their language.

This didn’t stop Tom and Mary from proliferating. I had no particular plan to make an intervention on this topic but it became a recurring topic. Every now and then, I would see people posting about it on the Wall, but I would still choose to let the community handle it on their own.

Lately, I’ve been cleaning up issues on GitHub. My main goal is to clean up our issues as much as possible before starting a small campaign to find new developers. We now have a much easier way to set up Tatoeba locally and I would like to put it to the test.

So as I was cleaning up GitHub issues last weekend, I stumbled upon one issue that was pointing to a post in a thread that was proposing a solution to diversify names. It reminded me about another thread, that was posted more recently and that I only briefly saw, on the topic of using a small set of names. I decided to actually read it entirely.

And at that point, I just felt annoyed. I felt annoyed that after all this time, there was still no closure on this topic. I wanted to carry on cleaning up the GitHub issues, but I felt annoyed. Kind of like when you’re trying to focus on something, or worse, when you’re trying to sleep, and there’s this freaking mosquito/fly bzzz’ing around your head. That’s how annoyed I was.

This is a topic that already started a couple of years ago, and I understand that some topics are not easy. But by now, it was pretty clear to me that the wildcard idea was not a good idea. People gave good arguments against it. Still, it didn’t seem to be clear for everyone.

So I asked myself, what can I do? What can I do to put a stop to this madness?

Technically, I can do a lot of things. I can change the source code to reject every sentence containing “Tom” or “Mary”. Or I can automatically convert “Tom” or “Mary” into a random name upon saving. Or I could block everyone from contributing until I receive a private message saying “I understand I don’t have to use Tom or Mary in my sentences and I will not try to influence other people to do so”. Obviously, that’s not my type of approach.

I decided I would initiate a discussion with a radical question: should we stop Tom and Mary sentences? I already had the answer to “Should we enforce Tom and Mary sentences?”, but what about the opposite? I had no clear answer to that. But I gave it a try.

I tried to gather my best arguments on why we should stop Tom and Mary’s expansion, I drafted a proposal on what to do to stop Tom and Mary sentences, and I posted it on the Wall. I was quite convinced by my arguments and I really felt it would be a good thing to do. But knowing that this whole idea was way too recent to be a final decision, I made sure to not close my mind on it: to leave a door open for opposite points of view and not be carried away by my newfound convictions.

Now comes the tough part. Because while it may look like I’m trying to convince the world that we have to stop Tom and Mary, in my own head, I’m still battling against it. If all I had to provide was just a “yes” or a “no”, I could just flip a coin; it’s much less trouble. But the decision itself doesn’t matter. The “yes” or the “no” doesn’t matter. The reasons behind it are what matter. I needed to find the strongest reasons, reasons that no one in their right mind could argue against.

Eventually, I found these reasons and they led me to decide against my own proposal. I did not find them directly from the conversation, that is, no one really said anything that was so well-argued and made so much sense that I changed my mind. But the fact that there was opposition to my proposal was a sign that I had to look further, to look deeper into the core values of Tatoeba. Even knowing that I can never make everyone happy, I still had to look.

In all of this, the real goal was not to find out what is the “best decision”. The real goal was to find common ground. And the common ground here was that diversity and redundancy are not mutually exclusive. Diversity can and will still grow in a corpus that looks very repetitive. It will take a lot of time, but patience is our most important virtue.

It all makes sense now but it wasn’t easy, I can tell you. It took several days of intense reflection, which probably totaled 40 hours, if not more. 40 hours for thinking, for replying to people on the Wall and for formalizing the official decision. It took as much time as a full-time job and it heavily felt like one.

I will not do this again anytime soon, and hopefully, you understand why.

But that conversation we had on Tom and Mary was an important one. And there are many, many other important conversations that we need to have and will need to have. Who will be leading them, if not me?

Well, anyone else.

Find a problem, make a proposal on the Wall, discuss with the community, challenge everyone’s opinions (even your own), gather enough information and points of view until you can make a decision backed up with a rationale that no one can contest, then formalize the decision in a document.

Up to this point, it could be done by any member of Tatoeba. It doesn’t have to come from me. Anyone who has access to the Wall and access to a brain can do this.

The only steps where I would need to step in are making it official and helping to enforce the decision. You send me the document, I read it, it makes sense, I publish it on the blog, it becomes official and we follow through.

This is it.

It takes time, it takes skills, it takes a lot of empathy. I won’t deny it’s difficult, my story shows it. But all of that you can learn and practice. Please, don’t be afraid to give it a try. As long as you know the process, which I just described, you too can become a decision maker.

My dream is to see, one day, proposals being formalized without my intervention at all, and all I have to do (aside from being convinced) is to make them official. In the much longer term, my dream is to be able to transfer my authority to other people and let them officialize decisions instead of me, to no longer be the one and only almighty governor of Tatoeba.

I know it will take time but with the case of Tom and Mary, I hope I showed you the way, at least one way. I hope you spread the word. I hope you spread the love too. I think I did my part. The rest is up to you.


Should we stop sentences with Tom and Mary?


A discussion was initiated on the Wall regarding the overwhelming number of sentences containing “Tom” and “Mary”. The initial proposal was to ask the community to stop creating new sentences with “Tom” and “Mary”.

Wall thread:


To the question “Should we stop sentences with Tom and Mary?”, the official decision is: no, we should not. Contributors may continue creating sentences with “Tom” and “Mary”. No action will be taken against them.

As a general rule, no action will be taken against a contributor based on the sole fact that they are creating new sentences with a name that has been overused.

We will still take measures with regard to the underlying issue of diversity in our corpus:
  • We will make it clear in our documentation that people are free to use other names than Tom and Mary.
  • We will add on Tatoeba's contribution page a short text to encourage people to keep the corpus diverse.
  • We will create guidelines on how to contribute diverse sentences. These guidelines will be published in the wiki.


We recognize that the content of our corpus has become unfulfilling for many of our users and we recognize that we need to make an effort to make it more diverse. However, after a thorough discussion with the community, I can conclude that attempting to make Tom and Mary "illegal" is not an adequate response to the problem.

In the same way that it has been said:
Whatever issues or inconveniences arise because of people using a more diverse set of names, we will solve them, but with another solution than enforcing wildcard names.
The same thing could be said from the other side:
Whatever issues or inconveniences arise because of people overusing "Tom" and "Mary", we will solve them, but with another solution than restricting these names.
Some of our contributors feel a certain attachment to Tom and Mary. It has become a comfort zone for them and if they do not wish to step out of their comfort zone, we should not force them to. Doing so would only generate a sense of loss of freedom and under these conditions, it is easy to develop uncooperative behavior or even try to cause more problems as a sign of protest.

We can obviously have the same issue on the other side: people might be leaving or causing problems out of disappointment that Tom and Mary sentences will continue to expand. But then we are just trading one bad situation for another, and there is no way to evaluate which of the two is really worse.

Restrictive measures may help us achieve our diversity goal faster, but such measures would be motivated by impatience. As long as Tatoeba welcomes people from all backgrounds and gives them the chance to express themselves in their most authentic ways, we will achieve this goal. The abundance and growth of Tom and Mary sentences does not eliminate the possibility for a diverse corpus. We will get there, that is inevitable. Whether it takes five years or fifty years, there is no rush.

Additional points

Tom and Mary, a.k.a. wildcards, started out as an idea to reduce redundancy in the corpus. It has been demonstrated that this idea is inefficient. If you have been creating sentences with wildcards under the belief that it helps to prevent near-duplicate sentences, know that it can actually have the opposite effect. You may continue to create sentences with wildcards if you wish to, but you cannot claim that it is for the sake of reducing redundancy. It is misinformation at this point, and it is spreading an unnecessary fear of near-duplicates.

Near-duplicates are not a big deal. We need to make this clear. They are in fact necessary. They help to identify patterns. We encourage everyone to simply not worry about them and focus on being creative instead. Avoiding near-duplicates will come naturally: the more creative your sentence is, the less likely there will be a near-duplicate of it. This will also help with diversity.
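
To make the notion of a near-duplicate a bit more concrete, here is a minimal Python sketch. It assumes, for illustration only, that two sentences count as near-duplicates when they differ in nothing but a personal name. The name list and the function names are hypothetical and are not part of Tatoeba's code.

```python
import re
from collections import defaultdict

# Hypothetical list of names, chosen only for this illustration.
NAMES = ("Tom", "Mary", "Trang", "Bond")
NAME_PATTERN = re.compile(r"\b(?:" + "|".join(NAMES) + r")\b")


def pattern_of(sentence):
    """Replace every known name in the sentence with a placeholder."""
    return NAME_PATTERN.sub("<NAME>", sentence)


def group_by_pattern(sentences):
    """Group sentences that share the same pattern once names are masked."""
    groups = defaultdict(list)
    for sentence in sentences:
        groups[pattern_of(sentence)].append(sentence)
    return dict(groups)


sentences = [
    "My name is Trang.",
    "My name is Tom.",
    "My name is Mary.",
    "Tom likes apples.",
]
groups = group_by_pattern(sentences)

# Print only the patterns shared by more than one sentence.
for pattern, members in groups.items():
    if len(members) > 1:
        print(pattern, "<-", members)
```

In this sketch the three "My name is" sentences fall under one pattern, which is the kind of group the paragraph above describes as helping to identify patterns.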

Sunday, January 6, 2019

New Year, New Tatoeba

Happy New Year everyone :)

In a couple of weeks we will be releasing a new version of Tatoeba! The deployment is currently scheduled for January 19th. On the surface, you won’t notice any difference. Same look, same features (kind of). But there will actually be some major changes.

We’re handling a new license: Creative Commons Zero

It will be possible for Tatoeba contributors to choose between Creative Commons Attribution (CC-BY) and Creative Commons Zero (CC0) when submitting new sentences.

The difference between CC-BY and CC0:
  • With CC-BY anyone can reuse the data for any purpose, but is required to mention where they got the data from.
  • With CC0 there is no requirement at all, no need to say where the data comes from.
As a contributor, if you do not wish to use CC0, you will not have to. You can continue contributing as you used to, nothing will change for you. Your sentences will keep being released under CC-BY.

If, however, you want your contributions to be reused in other projects without any strings attached, then you’ll have the possibility to contribute new sentences under CC0, as well as switch the license of your existing sentences to CC0 under some conditions.

All of this will be detailed further once we deploy the new Tatoeba.

We’re migrating to CakePHP 3

Tatoeba is built on top of a framework called CakePHP. We’ve always been lagging behind, using much older versions than the latest available. The current website is still based on version 2.9, while version 3 was released almost four years ago. But we’ve finally been able to catch up and migrated our code to work on CakePHP 3.6.

There are still a few features to migrate, but we should be ready to deploy in two weeks!

For our non-tech-savvy users, this migration will perhaps feel like we went backwards. There will be nothing new, but there may be some features broken and there may be some features working slower than they used to. We will be fixing all of that within the following weeks, so please bear with us.

This migration was an important task for the longer term, for the same reasons as when we migrated from CakePHP 1 to CakePHP 2 a couple of years ago: there are various technical benefits and Tatoeba can now hopefully look more attractive for the developers out there who want to contribute to an open source project.

If you are one of these developers, we will be more than happy to welcome you onboard. Don’t be afraid to contact us.

We’re growing as an organisation

Looking back at when we had our “big crash” in 2017 and people were a bit worried about the state of Tatoeba, and looking at where we are now, Tatoeba has made a big step forward as an organisation.

Back then, Tatoeba was funded only with donations. These donations helped us pay for the server but we never made big campaigns and could not do much more with our money. Hiring staff was completely out of reach.

Thanks to Mozilla Open Source Support (MOSS), this has changed. We heard of the MOSS program after Mozilla Common Voice approached us to explore ways of collaboration. We applied for it and got accepted. We were awarded $25,000 and were able to hire our first employee.

This made a huge difference for us. Not only were the integration of the CC0 license and the migration to CakePHP 3 possible thanks to this award, but we were also able to fix many bugs and implement many improvements.

We will undoubtedly apply for MOSS again, but we will also look into other ways to get funding. The next big goal would be to find a sustainable flow of income for the decades to come.

2018 was a pretty good year for us. Let's hope the trend continues in 2019 :)