Thursday, August 22, 2019

A second MOSS award

Great news everyone! We have been granted another award from the Mozilla Open Source Support program (MOSS).

A second award, you say. What was the first one about?

You can read about what we did with the first award in this blog post.

What about this second award?

This time we are receiving $15,000 to help us improve the website's interface. More specifically, this award will enable us to (finally) complete the Responsive UI project that was initiated a couple of years ago.

"Responsive UI" means that the content of the website will adapt to the size of the screen. The end goal is to make Tatoeba easier to use from a mobile device.

Changes will be introduced gradually, over 8-9 months, so you won't just wake up one day and notice a completely new Tatoeba. Instead, every week you will see a little something new:

  • The look and feel of the website will slowly become more consistent throughout all pages.
  • The new sentence design will slowly get features from the old design, until everything is migrated and we can finally fully switch to the new sentence design.
  • The layout and content of each page will be reviewed, steering away from the two-column layout and towards a single-column layout.

Week after week, we will put the pieces together so that, after 8-9 months, we can make the final switch: make the UI responsive.

Getting involved

This won't be an easy project and we will need all the help we can get.

You are therefore all invited to get involved to ensure that we achieve a new interface that is more pleasant to use on mobile devices, while also being more pleasant to use on desktop.

If you are interested in helping with UI/UX design, contact me at trang@tatoeba.org. I also encourage you to read the discussion we had on the Wall not so long ago about user experience.

If you are interested in helping with the code, have a look at our guide on how to contribute as a developer.

There are always plenty of things to do, and you can help no matter your skill level!

Saturday, August 10, 2019

Decision making and governance in Tatoeba

I would like to take you behind the scenes of the Tom and Mary discussion and share with you my story on this controversial topic and what it actually took to come up with a decision.

The most important message I hope you’ll take from this is that anyone has the power to make a change in Tatoeba. If something bothers you and you think we should take measures for or against it, there’s a way to have an influence, beyond posting your thoughts on the Wall. I’ll explain to you how, through this story.

First, let me set up the context.

We have a lot of sentences with “Tom” and “Mary” in our corpus. This was not the result of any official decision. It was the work of our most prolific contributor. His goal was to avoid redundancy. If everyone used the same names in their sentences, we would be less likely to end up with many sentences that follow similar patterns. For instance, you don’t want everyone to be adding sentences with “My name is”: “My name is Trang”, “My name is Tom”, “My name is Mary”, “My name is Bond” (“... James-Tom-Mary Bond”).

So the chosen names were Tom and Mary, but those names were chosen arbitrarily (to my knowledge). After some time, the idea started being contested. For good reasons. Among other things, Tom and Mary were quite inconvenient names for languages where names are subject to declensions. People translating from English into such languages felt that Tom and Mary sentences were not allowing them to express all the linguistic properties of their language.

This didn’t stop Tom and Mary from proliferating. I had no particular plan to make an intervention on this topic but it became a recurring topic. Every now and then, I would see people posting about it on the Wall, but I would still choose to let the community handle it on their own.

Lately, I’ve been cleaning up issues on GitHub. My main goal is to clean up our issues as much as possible before starting a small campaign to find new developers. We now have a much easier way to set up Tatoeba locally and I would like to put it to the test.

So as I was cleaning up GitHub issues last weekend, I stumbled upon one issue that was pointing to a post in a thread that was proposing a solution to diversify names. It reminded me about another thread, that was posted more recently and that I only briefly saw, on the topic of using a small set of names. I decided to actually read it entirely.

And at that point, I just felt annoyed. I felt annoyed that after all this time, there was still no closure on this topic. I wanted to carry on cleaning up the GitHub issues, but I felt annoyed. Kind of like when you’re trying to focus on something, or worse, when you’re trying to sleep, and there’s this freaking mosquito/fly bzzz’ing around your head. That’s how annoyed I was.

This is a topic that started a couple of years ago, and I understand that some topics are not easy. But by now, it was pretty clear to me that the wildcard idea was not a good idea. People gave good arguments against it. Still, it didn’t seem to be clear to everyone.

So I asked myself, what can I do? What can I do to put a stop to this madness?

Technically, I can do a lot of things. I can change the source code to reject every sentence containing “Tom” or “Mary”. Or I can automatically convert “Tom” or “Mary” into a random name upon saving. Or I could block everyone from contributing until I receive a private message saying “I understand I don’t have to use Tom or Mary in my sentences and I will not try to influence other people to do so”. Obviously, that’s not my type of approach.

I decided I would initiate a discussion with a radical question: should we stop Tom and Mary sentences? I already had the answer to “Should we enforce Tom and Mary sentences?”, but what about the opposite? I had no clear answer to that. But I gave it a try.

I tried to gather my best arguments on why we should stop Tom and Mary’s expansion, I drafted a proposal on what to do to stop Tom and Mary sentences, and I posted it on the Wall. I was quite convinced by my arguments and I really felt it would be a good thing to do. But knowing that this whole idea was way too recent to be a final decision, I made sure to not close my mind on it: to leave a door open for opposite points of view and not be carried away by my newfound convictions.

Now comes the tough part. Because while it may look like I’m trying to convince the world that we have to stop Tom and Mary, in my own head, I’m still battling against it. If all I had to provide was just a “yes” or a “no”, I could just flip a coin; it’s much less trouble. But the decision itself doesn’t matter. The “yes” or the “no” doesn’t matter. The reasons behind it are what matter. I needed to find the strongest reasons, reasons that no one in their right mind could argue against.

Eventually, I found these reasons and they led me to decide against my own proposal. I did not find them directly from the conversation, that is, no one really said anything that was so well-argued and made so much sense that I changed my mind. But the fact that there was opposition to my proposal was a sign that I had to look further, to look deeper into the core values of Tatoeba. Even knowing that I can never make everyone happy, I still had to look.

In all of this, the real goal was not to find out what the “best decision” is. The real goal was to find common ground. And the common ground here was that diversity and redundancy are not mutually exclusive. Diversity can and will still grow in a corpus that looks very repetitive. It will take a lot of time, but patience is our most important virtue.

It all makes sense now, but it wasn’t easy, I can tell you. It took several days of intense reflection, which probably totaled 40 hours, if not more. 40 hours for thinking, for replying to people on the Wall and for formalizing the official decision. It took as much time as a full-time job and it very much felt like one.

I will not do this again anytime soon, and hopefully, you understand why.

But that conversation we had on Tom and Mary was an important one. And there are many, many other important conversations that we need to have and will need to have. Who will be leading them, if not me?

Well, anyone else.

Find a problem, make a proposal on the Wall, discuss with the community, challenge everyone’s opinions (even your own), gather enough information and points of view until you can make a decision backed up with a rationale that no one can contest, then formalize the decision in a document.

Up to this point, it could be done by any member of Tatoeba. It doesn’t have to come from me. Anyone who has access to the Wall and access to a brain can do this.

The only steps where I would need to step in are making it official and helping to enforce the decision. You send me the document, I read it, it makes sense, I publish it on the blog, it becomes official and we follow through.

This is it.

It takes time, it takes skills, it takes a lot of empathy. I won’t deny it’s difficult, my story shows it. But all of that you can learn and practice. Please, don’t be afraid to give it a try. As long as you know the process, which I just described, you too can become a decision maker.

My dream is to see, one day, proposals being formalized without my intervention at all, and all I have to do (aside from being convinced) is to make them official. In the much longer term, my dream is to be able to transfer my authority to other people and let them officialize decisions instead of me, to no longer be the one and only almighty governor of Tatoeba.

I know it will take time but with the case of Tom and Mary, I hope I showed you the way, at least one way. I hope you spread the word. I hope you spread the love too. I think I did my part. The rest is up to you.

🙂

Should we stop sentences with Tom and Mary?

Context

A discussion was initiated on the Wall regarding the overwhelming number of sentences containing “Tom” and “Mary”. The initial proposal was to ask the community to stop creating new sentences with “Tom” and “Mary”.

Wall thread: https://tatoeba.org/eng/wall/show_message/32296#message_32296

Verdict

To the question “Should we stop sentences with Tom and Mary?”, the official decision is: no, we should not. Contributors may continue creating sentences with “Tom” and “Mary”. No action will be taken against them.

As a general rule, no action will be taken against a contributor based on the sole fact that they are creating new sentences with a name that has been overused.

We will still take measures with regard to the underlying issue of diversity in our corpus:
  • We will make it clear in our documentation that people are free to use other names than Tom and Mary.
  • We will add on Tatoeba's contribution page a short text to encourage people to keep the corpus diverse.
  • We will create guidelines on how to contribute diverse sentences. These guidelines will be published in the wiki.

Rationale

We recognize that the content of our corpus has become unfulfilling for many of our users and we recognize that we need to make an effort to make it more diverse. However, after a thorough discussion with the community, I can conclude that attempting to make Tom and Mary "illegal" is not an adequate response to the problem.

In the same way that it has been said:

“Whatever issues or inconveniences arise because of people using a more diverse set of names, we will solve them, but with another solution than enforcing wildcard names.”

The same thing could be said from the other side:

“Whatever issues or inconveniences arise because of people overusing "Tom" and "Mary", we will solve them, but with another solution than restricting these names.”

Some of our contributors feel a certain attachment to Tom and Mary. It has become a comfort zone for them and if they do not wish to step out of their comfort zone, we should not force them to. Doing so would only generate a sense of loss of freedom and under these conditions, it is easy to develop uncooperative behavior or even try to cause more problems as a sign of protest.

We can obviously have the same issue on the other side: people might leave or cause problems out of disappointment that Tom and Mary sentences will continue to expand. But then we are just trading one bad situation for another, and there is no way to evaluate which one really is worse than the other.

Restrictive measures may help us achieve our diversity goal faster, but such measures would be motivated by impatience. As long as Tatoeba welcomes people from all backgrounds and gives them the chance to express themselves in their most authentic ways, we will achieve this goal. The abundance and growth of Tom and Mary sentences does not eliminate the possibility for a diverse corpus. We will get there, that is inevitable. Whether it takes five years or fifty years, there is no rush.

Additional points

Tom and Mary, a.k.a. wildcards, started out as an idea to reduce redundancy in the corpus. It has been demonstrated that this idea is ineffective. If you have been creating sentences with wildcards under the belief that it helps to prevent near-duplicate sentences, know that it can actually have the opposite effect. You may continue to create sentences with wildcards if you wish to, but you cannot claim that it is for the sake of reducing redundancy. It is misinformation at this point, and it is spreading an unnecessary fear of near-duplicates.

Near-duplicates are not a big deal. We need to make this clear. They are in fact necessary. They help to identify patterns. We encourage everyone to simply not worry about them and focus on being creative instead. Avoiding near-duplicates will come naturally: the more creative your sentence is, the less likely there will be a near-duplicate of it. This will also help with diversity.

Sunday, January 6, 2019

New Year, New Tatoeba

Happy New Year everyone :)

In a couple of weeks we will be releasing a new version of Tatoeba! The deployment is currently scheduled for January 19th. On the surface, you won’t notice any difference. Same look, same features (kind of). But there will actually be some major changes.

We’re handling a new license: Creative Commons Zero

It will be possible for Tatoeba contributors to choose between Creative Commons Attribution (CC-BY) and Creative Commons Zero (CC0) when submitting new sentences.

The difference between CC-BY and CC0:
  • With CC-BY anyone can reuse the data for any purpose, but is required to mention where they got the data from.
  • With CC0 there is no requirement at all, no need to say where the data comes from.
As a contributor, if you do not wish to use CC0, you will not have to. You can continue contributing as you used to, nothing will change for you. Your sentences will keep being released under CC-BY.

If, however, you want your contributions to be reused in other projects without any strings attached, then you’ll have the possibility to contribute new sentences under CC0, as well as switch the license of your existing sentences to CC0 under some conditions.

All of this will be detailed further once we deploy the new Tatoeba.

We’re migrating to CakePHP 3

Tatoeba is built on top of a framework called CakePHP. We’ve always been lagging behind, using much older versions than the latest available. The current website is still based on version 2.9, while version 3 was released almost four years ago. But we’ve finally been able to catch up and migrated our code to work on CakePHP 3.6.

There are still a few features to migrate, but we should be ready to deploy in two weeks!

For our non-tech-savvy users, this migration will perhaps feel like we went backwards. There will be nothing new, but there may be some features broken and there may be some features working slower than they used to. We will be fixing all of that within the following weeks, so please bear with us.

This migration was an important task for the longer term, for the same reasons as when we migrated from CakePHP 1 to CakePHP 2 a couple of years ago: there are various technical benefits and Tatoeba can now hopefully look more attractive to the developers out there who want to contribute to an open source project.

If you are one of these developers, we will be more than happy to welcome you onboard. Don’t be afraid to contact us.

We’re growing as an organisation

Looking back at when we had our “big crash” in 2017 and people were a bit worried about the state of Tatoeba, and looking at where we are now, Tatoeba has made a big step forward as an organisation.

Back then, Tatoeba was funded only with donations. These donations helped us pay for the server but we never ran big campaigns and could not do much more with our money. Hiring staff was completely out of reach.

Thanks to Mozilla Open Source Support (MOSS), this has changed. We heard of the MOSS program after Mozilla Common Voice approached us to explore ways of collaboration. We applied for it and got accepted. We were awarded $25,000 and were able to hire our first employee.

This made a huge difference for us. Not only were the integration of the CC0 license and the migration to CakePHP 3 possible thanks to this award, but we were also able to fix many bugs and implement many improvements.

We will undoubtedly apply for MOSS again, but we will also look into other ways to get funding. The next big goal would be to find a sustainable flow of income for the decades to come.

2018 was a pretty good year for us. Let's hope the trend continues in 2019 :)

Friday, June 1, 2018

Tatoeba's first employee

With the grant we are receiving from the Mozilla Open Source Support program, we are able to hire our very first employee!

If you're a veteran at Tatoeba, you surely know him: it's gillux.

He contributed a lot to Tatoeba as a developer a couple of years ago and he is now back, as official staff, starting today :)

Sunday, May 13, 2018

MOSS award for Tatoeba

I can finally share some big news with you. Tatoeba will be receiving $25,000 via the Mozilla Open Source Support (MOSS) program. This was a long process, but it's now finally official :)

A little bit of background story.

Back in October last year, folks from Mozilla got in touch with us to explore possible ways of collaboration. They're working on a project called Common Voice and with this project they basically want to collect people's voices. A lot of them.

To achieve this, they need sentences for people to read. Someone told them about Tatoeba... And that's how it started.

But it's not that simple.

One of the requirements of Common Voice is to be able to release their data under CC0 (the Creative Commons version of public domain). Tatoeba's data is CC-BY. Common Voice cannot reuse CC-BY sentences to record audio that they'll publish as CC0. They can only reuse sentences that are in the public domain or CC0.

So there's quite some work to do there, if we want to let Common Voice reuse sentences from Tatoeba. This is what the MOSS award is for. We cannot change our CC-BY license for the data we've released so far. But we can evolve Tatoeba to handle more licenses than just CC-BY.
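To make the licensing constraint concrete, here is a minimal sketch of how a CC0 project could pick out the sentences it is allowed to reuse. The record layout, field names and license labels are assumptions made up for this illustration; they are not Tatoeba's actual data format.

```python
# Hypothetical sentence records: the "license" field and its labels are
# illustrative assumptions, not Tatoeba's real export format.
REUSABLE_AS_CC0 = {"CC0", "public domain"}


def reusable_for_cc0_project(sentences):
    """Keep only sentences whose license allows republishing under CC0."""
    return [s for s in sentences if s["license"] in REUSABLE_AS_CC0]


sentences = [
    {"id": 1, "text": "Hello.", "license": "CC-BY"},
    {"id": 2, "text": "Good morning.", "license": "CC0"},
    {"id": 3, "text": "Thank you.", "license": "public domain"},
]

reusable = reusable_for_cc0_project(sentences)
print([s["id"] for s in reusable])  # the CC-BY sentence (id 1) is filtered out
```

With only CC-BY data available today, such a filter would return nothing, which is why Tatoeba needs to support more than one license.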

I'll be explaining in more detail later on what changes we plan to make exactly. But until then, I would really like to have an idea where the Tatoeba community stands on this matter.

Would you consider putting part (or all) of your sentences under CC0? Why, or why not? Let me know via this form: https://goo.gl/forms/Nd6FcAoyd1zkfB4I2

Thursday, July 27, 2017

What's up with Tatoeba now?

It's been a month and a half since our SSD incident and while we managed to bring Tatoeba back online, there are still many features not working and the website is overall very slow to use.

I know many people wished the situation could improve faster, but as it stands now, we don't really have the manpower to get things done more quickly. Fixing everything and getting back to a stable situation will take a long time. Perhaps another couple of months. Perhaps more.

To be honest, the main reason is that I (as the founder of Tatoeba) am in a phase where I wouldn't want to dedicate more than a few hours per week.
There have been times when I could spend as much time as a part-time job working on Tatoeba (maybe even as much as a full-time job?), and things were evolving at a fast pace. Then there have been times when I was completely absent and the project could not really move forward.
Right now, I'm on the low side, which is a big part of why Tatoeba is much slower to recover.

There have been a few questions asked on the Wall regarding what can be done to improve the situation, and what can be done to keep Tatoeba healthy in the future. I'd like to answer them here to give people a clearer idea of how Tatoeba functions, and I'd also like to give a few updates about what is being worked on at the moment.


1) https://tatoeba.org/eng/wall/show_message/28201#message_28201
Also, do you have any idea how much it would cost if tatoeba moved to a better web-hosting with better support? 2 weeks to restore functionality - that's a bit too much, and that's not the first time something like this happened to the site. 
I can contribute 10 euros every year, which is probably a drop in the ocean, but we could probably find 100 people like me among our active users.
It wouldn't cost a whole lot more to move to a web host with better support.

To be fair, the long time it took to restore Tatoeba was not entirely due to our host. They are not in charge of maintaining Tatoeba as a whole; they are only responsible for taking care of the machine where Tatoeba is hosted. It stopped working because the SSD died, and they were definitely quite slow to react (it took 5 days of waiting before they replaced it), but it's not their fault that it took an extra week to restore the system and the data on the new SSD.

Still, we're definitely planning to move to another host. We have ordered a new server and hopefully will manage to move Tatoeba to a new home by the end of the month.

Money is not an issue, at least not for paying for web hosting. We currently spend around 35€/month for our server, and even if we had to spend twice as much, we could still afford it without extra donations.
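For a rough sense of scale, the figures above can be put side by side. This is only a back-of-the-envelope sketch: the 35€/month is an approximate figure, and the 100 donors giving 10€ a year is the hypothetical scenario from the question, not actual donation data.

```python
# Approximate figures taken from the text above.
MONTHLY_SERVER_COST_EUR = 35   # "around 35€/month"
HYPOTHETICAL_DONORS = 100      # from the question, not real data
DONATION_PER_DONOR_EUR = 10    # euros per year, from the question

yearly_cost = MONTHLY_SERVER_COST_EUR * 12
doubled_yearly_cost = 2 * yearly_cost
pledged_per_year = HYPOTHETICAL_DONORS * DONATION_PER_DONOR_EUR

print(f"Current hosting: {yearly_cost} EUR/year")
print(f"If the price doubled: {doubled_yearly_cost} EUR/year")
print(f"100 donors at 10 EUR each: {pledged_per_year} EUR/year")
```

Even at twice the current price, hosting stays well under a thousand euros a year, which is why hosting costs are not the bottleneck.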


2) https://tatoeba.org/eng/wall/show_message/28243#message_28243
Is there any way to avoid a potential following breakdown or for that to save all the data?
Yes, there are ways.

Tatoeba is currently hosted on a dedicated server and if we move to a VPS (which is the plan), we would no longer have to worry about hardware failure.

Besides that, our data recovery could have gone a lot better if we had invested more effort in backups before the incident. We would have lost only one day of data had we checked our backups properly.

But keep in mind that securing the data is only a small part of the problems we have to solve. Tatoeba has grown into a complex system, which is becoming more and more challenging to maintain, the more features and content we put into it.

Which leads to the last question.


3) https://tatoeba.org/eng/wall/show_message/28262#message_28262
How is Tatoeba funded? It looks like the recent problems, and the current half-working status are the direct result of not having permanent staff (a.k.a money).
Tatoeba is funded with donations only. We have more than enough to pay for the server; however, we don't have nearly enough to hire permanent staff. And indeed, not having permanent staff is a handicap.

With the way we are operating, when something's not working, it takes time before it gets noticed by someone who can do something about it, because we don't monitor Tatoeba 24/7. Then it takes time to solve the issue because everyone is a volunteer, and Tatoeba is just a side project for all of us. Problems can occur while we're at work, while we're traveling, while we're sick or just too tired to work on it. Some problems are actually quite difficult to solve.

Ideally, we would need a small team of 2 to 4 people working on Tatoeba at least part-time, to ensure that Tatoeba keeps running smoothly at all times and continues to evolve in a sustainable way.

I don't think we can raise enough money for this via donations or crowdfunding. To be fair, I have never tried, so I could be wrong. But we're talking about 50k-100k euros per year, to secure a team. It's a completely different scale from what we're currently dealing with. Honestly, I don't have the "marketing" skills, nor would I have the energy, to raise this kind of money. If not me though, I'm not sure who else would do this.

But even if someone walked up to us and threw millions at us, our issues wouldn't magically disappear. It would still take time to build a team, to find the right people with the right skills to solve those issues.

While I would like Tatoeba to be as independent from money as possible, I do think that one day or another, Tatoeba will need permanent staff, which is quite difficult to achieve without money. We can keep the project alive with volunteer staff, but we cannot make it grow much bigger than it is now.

Anyway, this is more of a long term discussion.


In the shorter term, what's happening?

Currently, we're lucky to have pep (aka. Ppjet6 on Tatoeba) who stepped up to help on the whole sysadmin/devops part. He'll be the main person working on migrating Tatoeba to the new server.
If you have any knowledge in these areas and wish to get involved in one way or another, you're more than welcome to join the Tatoeba IRC channel (#tatoeba) on freenode.

In fact, you're welcome to join even if you're just gonna be lurking, or if you'd like to give some moral support. You can potentially learn a lot about how Tatoeba works by just hanging around in the IRC channel. And even if you won't get involved for the time being, who knows, maybe in 6 months, in a year or in two years, Tatoeba will need you to save it from a dire death.

Also, you should be aware that there are other places than the Wall where you can gather and interact with other Tatoeba members. When Tatoeba is down, the Wall is down too, so it's important to have other channels of communication. Those are:
Feel free to use those external communication channels to discuss Tatoeba. They are not restricted to developers only.

Last but not least, we now have a status page to communicate information about the status of Tatoeba (whether it's online, or down, or experiencing issues, or undergoing maintenance, etc): https://status.tatoeba.org
For now it's very minimal, but the idea is that if you're trying to access the Tatoeba website but can't, or if something doesn't work anymore on the website, you'll be able to find information about it on the status page.