Thursday, August 22, 2019

A second MOSS award

Great news, everyone! We have been granted another award from the Mozilla Open Source Support program (MOSS).

A second award, you say. What was the first one about?

You can read about what we did with the first award in this blog post.

What about this second award?

This time we are receiving $15,000 to help us improve the website's interface. More specifically, this award will enable us to (finally) complete the Responsive UI project that was initiated a couple of years ago.

"Responsive UI" means that the content of the website will adapt to the size of the screen. The end goal is to make Tatoeba easier to use from a mobile device.

Changes will be introduced gradually, over 8-9 months, so you won't just wake up one day and notice a completely new Tatoeba. Instead, every week you will see a little something new.

  • The look and feel of the website will slowly become more consistent throughout all pages.
  • The new sentence design will slowly get features from the old design, until everything is migrated and we can finally fully switch to the new sentence design.
  • The layout and content of each page will be reviewed, steering away from the two-column layout and towards a single-column layout.

Week after week, we will put the pieces together so that, after 8-9 months, we can make the final switch: make the UI responsive.

Getting involved

This won't be an easy project and we will need all the help we can get.

You are therefore all invited to get involved to ensure that we achieve a new interface that is more pleasant to use on mobile devices, while also being more pleasant to use on desktop.

If you are interested in helping with UI/UX design, contact me. I also encourage you to read the discussion we had on the Wall not so long ago about user experience.

If you are interested in helping with the code, have a look at our guide on how to contribute as a developer.

There are always plenty of things to do, and you can help no matter your skill level!

Saturday, August 10, 2019

Decision making and governance in Tatoeba

I would like to take you behind the scenes of the Tom and Mary discussion and share with you my story on this controversial topic and what it actually took to come up with a decision.

The most important message I hope you’ll take from this is that anyone has the power to make a change in Tatoeba. If something bothers you and you think we should take measures for or against it, there’s a way to have an influence, beyond posting your thoughts on the Wall. I’ll explain to you how, through this story.

First, let me set up the context.

We have a lot of sentences with “Tom” and “Mary” in our corpus. This was not the result of any official decision. It was the work of our most prolific contributor, whose goal was to avoid redundancy. If everyone used the same names in their sentences, we would be less likely to end up with many sentences that follow the same pattern. For instance, you don’t want everyone to be adding sentences with “My name is”: “My name is Trang”, “My name is Tom”, “My name is Mary”, “My name is Bond” (“... James-Tom-Mary Bond”).

So the chosen names were Tom and Mary, but those names were chosen arbitrarily (to my knowledge). After some time, the idea started being contested, and for good reasons. Among other things, Tom and Mary were quite inconvenient names for languages where names are subject to declension. People translating from English into such languages felt that Tom and Mary sentences did not allow them to express all the linguistic properties of their language.

This didn’t stop Tom and Mary from proliferating. I had no particular plan to intervene, but the topic kept coming back. Every now and then, I would see people posting about it on the Wall, but I would still choose to let the community handle it on their own.

Lately, I’ve been cleaning up issues on GitHub. My main goal is to clean up our issues as much as possible before starting a small campaign to find new developers. We now have a much easier way to set up Tatoeba locally and I would like to put it to the test.

So as I was cleaning up GitHub issues last weekend, I stumbled upon one issue that pointed to a post in a thread proposing a solution to diversify names. It reminded me of another thread, posted more recently, on the topic of using a small set of names. I had only briefly seen it, so I decided to actually read it in its entirety.

And at that point, I just felt annoyed. I felt annoyed that after all this time, there was still no closure on this topic. I wanted to carry on cleaning up the GitHub issues, but I felt annoyed. Kind of like when you’re trying to focus on something, or worse, when you’re trying to sleep, and there’s this freaking mosquito/fly bzzz’ing around your head. That’s how annoyed I was.

This is a topic that started a couple of years ago, and I understand that some topics are not easy. But by now, it was pretty clear to me that the wildcard idea was not a good idea. People gave good arguments against it. Still, it didn’t seem to be clear to everyone.

So I asked myself, what can I do? What can I do to put a stop to this madness?

Technically, I can do a lot of things. I can change the source code to reject every sentence containing “Tom” or “Mary”. Or I can automatically convert “Tom” or “Mary” into a random name upon saving. Or I could block everyone from contributing until I receive a private message saying “I understand I don’t have to use Tom or Mary in my sentences and I will not try to influence other people to do so”. Obviously, that’s not my type of approach.

I decided I would initiate a discussion with a radical question: should we stop Tom and Mary sentences? I already had the answer to “Should we enforce Tom and Mary sentences?”, but what about the opposite? I had no clear answer to that. But I gave it a try.

I tried to gather my best arguments on why we should stop Tom and Mary’s expansion, I drafted a proposal on what to do to stop Tom and Mary sentences, and I posted it on the Wall. I was quite convinced by my arguments and I really felt it would be a good thing to do. But knowing that this whole idea was way too recent to be a final decision, I made sure to not close my mind on it: to leave a door open for opposite points of view and not be carried away by my newfound convictions.

Now comes the tough part. Because while it may look like I’m trying to convince the world that we have to stop Tom and Mary, in my own head, I’m still battling against it. If all I had to provide was just a “yes” or a “no”, I could just flip a coin; it’s much less trouble. But the decision itself doesn’t matter. The “yes” or the “no” doesn’t matter. The reasons behind it are what matter. I needed to find the strongest reasons, reasons that no one in their right mind could argue against.

Eventually, I found these reasons and they led me to decide against my own proposal. I did not find them directly from the conversation, that is, no one really said anything that was so well-argued and made so much sense that I changed my mind. But the fact that there was opposition to my proposal was a sign that I had to look further, to look deeper into the core values of Tatoeba. Even knowing that I can never make everyone happy, I still had to look.

In all of this, the real goal was not to find out what the “best decision” is. The real goal was to find common ground. And the common ground here was that diversity and redundancy are not mutually exclusive. Diversity can and will still grow in a corpus that looks very repetitive. It will take a lot of time, but patience is our most important virtue.

It all makes sense now, but it wasn’t easy, I can tell you. It took several days of intense reflection, which probably totaled 40 hours, if not more. 40 hours for thinking, for replying to people on the Wall and for formalizing the official decision. It took as much time as a full-time job and it very much felt like one.

I will not do this again anytime soon, and hopefully, you understand why.

But that conversation we had on Tom and Mary was an important one. And there are many, many other important conversations that we need to have and will need to have. Who will be leading them, if not me?

Well, anyone else.

Find a problem, make a proposal on the Wall, discuss with the community, challenge everyone’s opinions (even your own), gather enough information and points of view until you can make a decision backed up with a rationale that no one can contest, then formalize the decision in a document.

Up to this point, it could be done by any member of Tatoeba. It doesn’t have to come from me. Anyone who has access to the Wall and access to a brain can do this.

The only steps where I would need to step in are making it official and helping to enforce the decision. You send me the document, I read it, it makes sense, I publish it on the blog, it becomes official and we follow through.

This is it.

It takes time, it takes skills, it takes a lot of empathy. I won’t deny it’s difficult, my story shows it. But all of that you can learn and practice. Please, don’t be afraid to give it a try. As long as you know the process, which I just described, you too can become a decision maker.

My dream is to see, one day, proposals being formalized without my intervention at all, and all I have to do (aside from being convinced) is to make them official. In the much longer term, my dream is to be able to transfer my authority to other people and let them officialize decisions instead of me, to no longer be the one and only almighty governor of Tatoeba.

I know it will take time but with the case of Tom and Mary, I hope I showed you the way, at least one way. I hope you spread the word. I hope you spread the love too. I think I did my part. The rest is up to you.


Should we stop sentences with Tom and Mary?


A discussion was initiated on the Wall regarding the overwhelming number of sentences containing “Tom” and “Mary”. The initial proposal was to ask the community to stop creating new sentences with “Tom” and “Mary”.

Wall thread:


To the question “Should we stop sentences with Tom and Mary?”, the official decision is: no, we should not. Contributors may continue creating sentences with “Tom” and “Mary”. No action will be taken against them.

As a general rule, no action will be taken against a contributor based on the sole fact that they are creating new sentences with a name that has been overused.

We will still take measures with regard to the underlying issue of diversity in our corpus:
  • We will make it clear in our documentation that people are free to use other names than Tom and Mary.
  • We will add on Tatoeba's contribution page a short text to encourage people to keep the corpus diverse.
  • We will create guidelines on how to contribute diverse sentences. These guidelines will be published in the wiki.


We recognize that the content of our corpus has become unfulfilling for many of our users and we recognize that we need to make an effort to make it more diverse. However, after a thorough discussion with the community, I can conclude that attempting to make Tom and Mary "illegal" is not an adequate response to the problem.

In the same way that it has been said:

Whatever issues or inconveniences arise because of people using a more diverse set of names, we will solve them, but with another solution than enforcing wildcard names.

The same thing could be said from the other side:

Whatever issues or inconveniences arise because of people overusing "Tom" and "Mary", we will solve them, but with another solution than restricting these names.

Some of our contributors feel a certain attachment to Tom and Mary. It has become a comfort zone for them and if they do not wish to step out of their comfort zone, we should not force them to. Doing so would only generate a sense of loss of freedom and under these conditions, it is easy to develop uncooperative behavior or even try to cause more problems as a sign of protest.

We can obviously have the same issue on the other side: people might leave or cause problems out of disappointment that Tom and Mary sentences will continue to expand. But then we are just trading one bad situation for another, and there is no way to evaluate which one really is worse.

Restrictive measures may help us achieve our diversity goal faster, but such measures would be motivated by impatience. As long as Tatoeba welcomes people from all backgrounds and gives them the chance to express themselves in their most authentic ways, we will achieve this goal. The abundance and growth of Tom and Mary sentences does not eliminate the possibility for a diverse corpus. We will get there, that is inevitable. Whether it takes five years or fifty years, there is no rush.

Additional points

Tom and Mary, a.k.a. wildcards, started out as an idea to reduce redundancy in the corpus. It has been demonstrated that this idea is ineffective. If you have been creating sentences with wildcards under the belief that it helps to prevent near-duplicate sentences, know that it can actually have the opposite effect. You may continue to create sentences with wildcards if you wish to, but you cannot claim that it is for the sake of reducing redundancy. It is misinformation at this point, and it is spreading an unnecessary fear of near-duplicates.

Near-duplicates are not a big deal. We need to make this clear. They are in fact necessary. They help to identify patterns. We encourage everyone to simply not worry about them and focus on being creative instead. Avoiding near-duplicates will come naturally: the more creative your sentence is, the less likely there will be a near-duplicate of it. This will also help with diversity.