Tatoeba Project

Sunday, July 17, 2011

Tatoeba update (Jul 20th, 2011)

This is a very small update.

What's new
  • You can filter your private messages to only display those that are unread. This one was added several weeks ago actually. And to be honest, I've added this feature mostly for myself. I have 20+ unread messages and some of them are from months ago. I usually leave them unread when it's a request I cannot take care of "right now", as a reminder that I need to take care of it someday...
  • The activity timeline page now only displays the number of sentences added for each day in the current month. You can, however, still browse the activity for other months. This was an attempt to make the page display a little faster.
  • For people who use our data, there is a new file that you can download: sentences_detailed.csv. This file contains additional information about the sentence: the contributor who "owns" the sentence at the time of the export, the date when the sentence was added and the date when it was last modified.
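For those who want to try the new file, here is a minimal sketch of how it might be read in Python. The post only says which extra fields the file carries; the tab-separated layout and the column order used below (id, language, text, owner, date added, date last modified) are assumptions made for this example, as are the names DetailedSentence, read_detailed and sentences_per_owner. Check them against the actual export before relying on this.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class DetailedSentence(NamedTuple):
    # Assumed column order; verify against the real export.
    sentence_id: str
    lang: str
    text: str
    owner: str
    date_added: str
    date_modified: str


def read_detailed(lines: Iterable[str]) -> Iterator[DetailedSentence]:
    """Yield one record per row of an (assumed) six-column, tab-separated file."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 6:
            continue  # skip rows that do not match the assumed layout
        yield DetailedSentence(*row)


def sentences_per_owner(records: Iterable[DetailedSentence]) -> Counter:
    """Count how many sentences each contributor owns."""
    return Counter(record.owner for record in records)


# Made-up sample rows, standing in for the real file.
sample = [
    "1\teng\tHello.\talice\t2011-01-02 10:00:00\t2011-01-03 09:30:00",
    "2\tfra\tBonjour.\tbob\t2011-02-05 08:15:00\t2011-02-05 08:15:00",
    "3\teng\tGood night.\talice\t2011-03-01 22:00:00\t2011-03-01 22:00:00",
]
counts = sentences_per_owner(read_detailed(sample))
```

To process the real download, pass an open file instead of the sample list, for example open("sentences_detailed.csv", encoding="utf-8", newline="").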
What next
  • We will let contributors delete their own sentences, as long as the sentence doesn't have any translations. This is mostly useful for people who add a sentence by mistake.

Sunday, May 29, 2011

Rules against bad behavior

A little explanation

These rules are not about the goal of the project (i.e. the corpus), but about something this project cannot exist without: the community.


A member can be a terrible contributor corpus-wise but a great contributor community-wise, and conversely a great contributor corpus-wise but a terrible contributor community-wise. These are two very different aspects, but we need both kinds of people.


We need people who have deep linguistic knowledge, who are able to contribute sentences and translations of good quality, and to post relevant explanations or analyses of a sentence, a phrase, a word. And we need people who have good social skills, who are able to make other members feel like this is a nice place with nice people, and that they can have a great time contributing to the project.


I had established some detailed policies about what kind of content we accept and don’t accept. But there isn’t really much about what kind of behavior we tolerate and don’t tolerate. There’s the article about being disrespectful and using private messages rather than flaming, lecturing, criticizing someone in public. There is one point in the contributor’s guide about doing what you can to make Tatoeba a more socially pleasant place. But that’s pretty much all.


So here we go, some official policies...


The point of all of this is to avoid having people leave the project because they feel disgusted by the behavior of other members.


Things we do not tolerate
  1. Insults: saying something offensive about someone or some people.

  2. Harassment: bothering someone repeatedly.

  3. Accusations: stating that someone is doing something with bad intentions.

  4. Blaming: saying that a problem is someone’s fault.

  5. Provocation: writing something that intentionally makes other people angry.

  6. Retaliation: replying to insults, harassment, accusations, blaming or provocation with something that doesn't help.

  7. Bad faith: lying, deceiving, being dishonest.

  8. Generally speaking, we do not tolerate any kind of behavior that harms the collaborative and civilized atmosphere of Tatoeba.

All of this is pretty obvious. What is less obvious is how we are going to decide that something is an insult, or harassment, or an accusation, etc. Initially, I had written here specific rules about insults and harassment (which I considered the two most important types of bad behavior), but I decided to replace them with more general ideas for the time being, because the rules have not proven to be effective yet, and they weren’t complete either.


They will still remain published, because, well, we need to start somewhere. I must insist on the fact that they are not established. I actually encourage everyone who has enough time on their hands to try to hack these rules, and to try to counter-hack the hacks, and hopefully through this process we can come up with a better set of rules.


Sanctions

People who behave against the general policies will be subject to sanctions.


Depending on how Tatoeba evolves technically and depending on the situation, here is a non-exhaustive list of the possible sanctions:

  1. You may lose your right to post comments on sentences for a certain period of time.
  2. You may lose your right to post messages on the Wall for a certain period of time.
  3. You may lose your right to add new sentences for a certain period of time.
  4. You may lose your right to tag sentences for a certain period of time.
  5. Your profile description may be hidden from everyone else for a certain period of time.

The exact sanction and its duration will be decided specifically for each case. And as I said, the list is not exhaustive. You may receive a sanction other than the ones mentioned above.



A few considerations

Tatoeba currently doesn’t provide the possibility to edit comments, which makes it difficult to take care of messages where only one sentence is offensive but the rest is fine. It’s not very practical, but if you have offended someone and only need to remove that one sentence, you will have to send a message to TatoebaPeaceKeeper and indicate what you want your edited message to be. Or, you can join the dev team and code that feature.


Tatoeba currently doesn’t provide any kind of “ignore” feature, which makes it difficult for people who cannot stand each other to simply ignore each other. Well, you will have to live without that luxury for now. Or, you can join the dev team and code that feature.


Tatoeba currently doesn’t have any internal mechanism to stop a flame war before it’s too late (i.e. by preventing everyone from replying to a provocative comment with even more provocation). So what we will do is list disrespectful comments on the TatoebaPeaceKeeper profile page, under the category “Dangerous territory”. If you’re going to reply to a comment that is under this category, make sure you are as neutral as possible. Or, you can join the dev team and code a feature for that.


Tatoeba currently doesn’t have any official “peace keeper” who monitors the activity every single second of the day and night, and takes action faster than lightning whenever a conflict is emerging. This means you cannot expect your issues with other members to be heard and taken care of within the minute (perhaps not even within the week). But you can always recommend someone to us who can take on these responsibilities, and convince them to become a peace keeper.


The rules I’ve written are not perfect. If you have better ideas, please suggest them. If not, please follow the rules.

Tuesday, May 17, 2011

New user status names

I decided to change the names of the user statuses.
  • user → contributor
  • trusted user → advanced contributor
  • moderator → corpus maintainer
The reason for this change is that, to some extent, the previous names carried too much "social weight", and I feel this is not the best way to go. I will write another post to talk in more detail about the social and collaborative aspect of Tatoeba, but here I want to list all the existing statuses in Tatoeba and clarify what they mean.


Spammer
This status is used to flag accounts that were used to send spam.


Inactive
This status is used to flag accounts whose users are no longer active. Usually, these are users who decided to delete their account.


Contributor
This is the status everyone starts with when they register. It gives access to the main contribution features of Tatoeba.


Advanced contributor
This status is given to users who have contributed sufficiently and are fairly familiar with the project. Advanced contributors currently have access to 2 extra features: they can link/unlink sentences, and they can tag sentences.
Generally speaking, if we implement a feature that is a bit tricky and experimental, we would make it available to advanced contributors first, before making it globally available.

This status is only given to users who accept it. We will not force this status upon anyone who prefers to remain a simple and modest contributor.
You don't need to wait for us to offer you the possibility to change status. Quite the contrary, we encourage you to ask for this status if you feel you can help us out on the linking and tagging front.

In a couple of weeks, I will publish more specific details about the conditions required to become an advanced contributor.


Corpus maintainer
This status is given to advanced contributors who are willing to help with maintenance tasks.

Corpus maintainers were previously called "moderators", but they were not moderators in the usual context of a community. That is to say, Tatoeba moderators did NOT have the job to track users who do not behave well, they did NOT have the job to listen to users complaining about other users, they did NOT have the job to ban users for not behaving well, they did NOT have any kind of responsibility regarding the community.

Their responsibilities were, and still are, strictly restricted to the corpus: to delete sentences that are added by mistake, to delete sentences that are added as spam, to delete sentences that are copyrighted, to edit incorrect sentences that were abandoned by their owner...
This is why this status has been renamed to "corpus maintainer".


Admin
Admins have the power to do pretty much everything, with all the responsibilities that go with it. Among other things, admins are the only ones who can change a user's status, which means that a contributor cannot become an advanced contributor or a corpus maintainer without the intervention of an admin.



Note that these statuses WILL evolve over time and may even disappear (in a distant future) to make room for another (and hopefully better) kind of system. Right now, this is the best kind of organization we can afford.

Saturday, May 7, 2011

Some tips for those who want to link sentences

In February, I added a page that makes it easier to translate the sentences of a specific user. For instance, you can easily translate my sentences by going here. Or by going to my profile, clicking on "Sentences" (in the right-side column), and clicking on "Translate these sentences" (at the bottom of the right-side column).

In March, I implemented an improvement of the "linking" feature. For those who have no idea what linking is about, please read point #2 of the contributors guide.
So now, if you try to link a sentence, you will see that it only updates the line with the translation. It does NOT redirect to a new page anymore, and I think it makes linking much more comfortable.
I also made it possible for trusted users to link ANY sentences (not just the ones that belong to them).

In April, I implemented the possibility to filter the languages of the translations. If you go to your settings and add "jpn,hun,swe" in the languages field, you will only see translations in Japanese, Hungarian and Swedish. You will still be able to view sentences in all the languages though; only the translations are filtered. And by the way, if you want to know the code for a language, the codes are listed on the sentences statistics page.
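To illustrate the idea behind this filter, here is a small sketch in Python. It is not the actual implementation (the current version of Tatoeba is in PHP); the function names and the (language code, text) tuples are invented for this example. It treats an empty languages field as "show everything", which matches the behavior you get when you erase the languages from your settings.

```python
from typing import List, Set, Tuple

Translation = Tuple[str, str]  # (language code, text); hypothetical shape


def parse_language_setting(value: str) -> Set[str]:
    """Turn a setting such as "jpn,hun,swe" into a set of language codes."""
    return {code.strip().lower() for code in value.split(",") if code.strip()}


def filter_translations(translations: List[Translation], setting: str) -> List[Translation]:
    """Keep only translations whose language is listed in the setting.

    An empty setting means no filter at all.
    """
    wanted = parse_language_setting(setting)
    if not wanted:
        return list(translations)
    return [t for t in translations if t[0] in wanted]


translations = [
    ("fra", "Bonjour."),
    ("jpn", "こんにちは。"),
    ("swe", "Hej."),
    ("deu", "Hallo."),
]
visible = filter_translations(translations, "jpn,hun,swe")
```

With the setting "jpn,hun,swe", only the Japanese and Swedish entries of this sample remain; there is simply no Hungarian translation to show.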

So with all these features, if you are a trusted user and in the mood for massive linking, what I'd advise you to do is the following.
  1. Go to your settings and enter the languages in which you are able to link. This way, you will not be annoyed by translations you don't understand in sentences that have 50+ translations.
  2. Browse your sentences in "translate" mode, and link anything you can link. Actually, you can even browse sentences of anyone you want, and link anything you can link.
  3. When you're done, you can go back to your settings and erase the languages, so that you can see again the translations in all languages.
Happy linking!

Sunday, May 1, 2011

Languages stats and leaders

It's been a long time since I last published stats, hasn't it? But thanks to sysko, who got rid of the duplicates for us yesterday, I'm feeling a bit more comfortable talking about numbers.

Language ranking

I've decided to include the "leaders" for each language; it's interesting information. It should give a good idea of who the most influential members in Tatoeba currently are for each language.

NOTE 1: Not all leaders are necessarily references in the language they lead.
NOTE 2: The stats only list the languages that have more than 1000 sentences.

Meaning of the fields:
  • # → rank of the language.
  • code → ISO 639-3 code corresponding to the language.
  • language → name of the language (in English).
  • total → total number of sentences in the language.
  • leader → username of the member who owns the most sentences in the language.
  • owns → number of sentences owned by the user in the language.
  • %owned → percentage of sentences owned by the user in the language (%owned = owns / total).

#   code  language            total   leader          owns   %owned
1   eng   English             176232  CK              54337  30,8%
2   jpn   Japanese            154779  fcbond          1483   1,0%
3   epo   Esperanto           78593   GrizaLeono      13694  17,4%
4   fra   French              68426   sacredceltic    16450  24,0%
5   deu   German              58485   MUIRIEL         12824  21,9%
6   spa   Spanish             37755   Shishir         9138   24,2%
7   pol   Polish              30856   zipangu         22699  73,6%
8   cmn   Mandarin Chinese    27869   fucongcong      9022   32,4%
9   rus   Russian             26757   Hellerick       6605   24,7%
10  ita   Italian             19329   Guybrush88      8235   42,6%
11  nld   Dutch               18746   martinod        8140   43,4%
12  ukr   Ukrainian           15826   aandrusiak      4382   27,7%
13  hun   Hungarian           12656   szaby78         3574   28,2%
14  pes   Persian             10280   pliiganto       3816   37,1%
15  heb   Hebrew              10118   Eldad           7448   73,6%
16  por   Portuguese          9628    brauliobezerra  5994   62,3%
17  ara   Arabic              7940    saeb            5886   74,1%
18  isl   Icelandic           7721    Swift           7472   96,8%
19  tur   Turkish             6434    boracasli       3821   59,4%
20  nds   Low Saxon           5753    slomox          5490   95,4%
21  dan   Danish              5032    danepo          4628   92,0%
22  bul   Bulgarian           4602    ednorog         4027   87,5%
23  uig   Uyghur              3747    FeuDRenais      3263   87,1%
24  hin   Hindi               3468    minshirui       3463   99,9%
25  wuu   Shanghainese        3257    fucongcong      1612   49,5%
26  vie   Vietnamese          2987    autuno          1876   62,8%
27  bel   Belarusian          2158    Demetrius       2089   96,8%
28  tlh   Klingon             2104    Vortarulo       2097   99,7%
29  jbo   Lojban              2017    Zifre           1179   58,5%
30  yue   Cantonese           1930    nickyeow        1767   91,6%
31  nob   Norwegian (Bokmål)  1872    contour         1090   58,2%
32  fin   Finnish             1585    Hautis          502    31,7%
33  ina   Interlingua         1537    McDutchie       1494   97,2%
34  swe   Swedish             1190    Don             719    60,4%
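
As a quick check of the %owned definition given above (%owned = owns / total), here is a short Python sketch that recomputes the value for three rows of the table. The function name percent_owned is made up for this example, and the output uses a decimal point where the table uses a decimal comma.

```python
def percent_owned(owns: int, total: int) -> float:
    """Share of a language's sentences owned by its leader, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * owns / total


# (code, total, owns) copied from the table above.
rows = [
    ("eng", 176232, 54337),
    ("pol", 30856, 22699),
    ("hin", 3468, 3463),
]

for code, total, owns in rows:
    print(f"{code}: {percent_owned(owns, total):.1f}%")
# prints:
# eng: 30.8%
# pol: 73.6%
# hin: 99.9%
```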


Progress

Let's see how the corpus has progressed since last time...
  • We've reached our 800,000+ milestone; we're now at 834,000+ sentences.
  • The top 5 is still the same.
  • Persian and Hebrew joined the 10,000+ family!
  • Low Saxon, Bulgarian, Klingon, Finnish and Interlingua joined the 1000+ family!
At this rate, we should reach our 1 million milestone some time around September :)

Saturday, April 30, 2011

Who wants to help?

On Monday this week we’ve seen a pretty strong wave of new visitors from Spain, via meneame.net. We’ve never had that many (7100+) visits in one day and it made me feel like it’s time for Tatoeba to get a better organization and more people involved. Like, really.

Allan (aka sysko) and I (aka Trang) have been saying for too long that we don’t have time. With our respective busy lives and with the growing community, we can no longer take care of many of the small or not so small requests made by users. But we’re not the only ones who can make this project more awesome, you can too!


What can you do?

I’m listing below the various “departments” of Tatoeba, with a general description. I will be posting more specific tasks whenever needed, but the general descriptions should give you an idea of all the things you can do if you enjoy this project and feel like being a bigger part of it.

Corpus
This is what everybody works on when they join Tatoeba: adding, translating and correcting sentences. Generally speaking, we’ve got the “sentences” part going on pretty well. We would just need more moderators (ideally, at least one moderator for each language).
However the corpus is not just about sentences. It is also about links, tags and audio. And for these, we don’t have a very good system yet, but to improve that, we’d need people to help in development...

Development
This is about programming. There’s a huuuuge amount of work to do here. We have plenty of ideas to code but also a lot of maintenance to do... So we’re going to need more people in the dev team.
Sysko started working on the next version of Tatoeba (in C++) and we’ll have to wait patiently until he’s ready to bring more people in. In the meantime, I am maintaining and improving whatever I can in the current version (in PHP).
If you’re interested in joining us, you’ll be mostly helping me with the current version, but I really hope you’ll stick around long enough to be part of the new version as well.

Documentation
This is about writing articles (or making videos) that explain what the project is about, how it works, what the policies are, what the procedures are if you want to do whatever you want to do, etc. Basically, the documentation is the place where people can go to search for (and hopefully find) answers to the questions they have about the project.
Documentation is extremely important. Without it, it’s difficult to get more people involved and quite unfortunately, we have way too little documentation.

Community
This is about taking care of the social aspect of Tatoeba, and maintaining a good karma around the project. It’s about making people feel welcome, helping them understand the project better, encouraging and thanking them for their good work, getting them to calm down if they get involved in conflicts, contacting those who would be good trusted user candidates, and more.

Translation
This is about translating the interface, documentation and news.
The news is currently not translated at all and only parts of the documentation have been translated into a handful of languages. We'd really like to improve on this front to make the project accessible to as many as possible.
We're doing better on the interface, which has been translated into many languages through Launchpad. Still, the translations are not all perfect and ideally we would like each language to have one person overseeing it to ensure the overall quality.

Design
I’m talking about graphical design here. This is about making Tatoeba prettier. The interface, the icons, the illustrations, the videos, the goodies... We need designers to... well, design these things. Great design is not vital, but I personally think it makes people happier. It’s much more pleasant to contribute on a platform with a nice interface, it’s much more pleasant to read or watch a tutorial if it’s illustrated with nice graphics.

Tests
This is about making sure that Tatoeba works as it should work, especially when we implement new features. Testing is not going to be super urgent until we have a bigger dev team though.

Technical support
This is about helping people who can’t get things to work properly. When users ask for help on the Wall, the whole community can help. But we also sometimes receive emails asking for technical support, and it would be nice to have a person (other than sysko or myself) who can dedicate time to answering these emails.

Communication
This is about informing the community of things they may be interested to know. For instance, writing a release note whenever we’re introducing new features, publishing statistics, announcing new policies or important decisions.

Events
This is about organizing special activities on Tatoeba. For instance we organize a Tatoeba day every month (or almost... we had none in April though because I was way too busy this month). We’ve also tried organizing a contest (for the banners), and it would be nice to organize some more.
Events can bring fun into the project, but finding ideas, planning them out and motivating people to participate is quite some work.


Interested in helping out?

If you are interested in helping us, then here’s what to do:
  1. Send us an email: team@tatoeba.org.
  2. Use the following title: I want to help ([category], [category], ...). Ex: I want to help (documentation, tests, news).
  3. Tell us a little about yourself in your email and what kind of tasks you feel ready to take on.
  4. We will get back to you, assign you more specific tasks and give you any information you may need to know.
  5. Update your profile accordingly (and regularly) to let the rest of the community know what you’re working on.

Sunday, April 17, 2011

Tatoeba banners

Until I find an appropriate space on Tatoeba's website, I'm publishing here on the blog the final banners for Tatoeba.
You are encouraged to use these images if you want to make a graphical link to Tatoeba (from your blog or website). You have the choice between a big version (392x72 pixels) and a small version (88x31 pixels) in various languages.

They were kindly made by Muiriel, who won the mini-contest we organized back in January-February. She modified her initial submission into something better. She would like to thank CK for the idea of using the ".org" in the big version.

If you would like to have a banner in a language that is not present below...
  • Send her the translations of "more than words." and of "...because a language is more than the sum of its words." into your desired language.
  • Tell her where to put the new line for the text of the big banner. For instance, in English, the line break is after "more":
...because a language is more
than the sum of its words

PS: Some languages don't have (yet?) a "big" version or a "small" version. This is normal.



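[ara] Arabic

[banner image: ara] [banner image: ara]


[bel] Belarusian

[banner image: bel] [banner image: bel]


[cmn] Mandarin Chinese

[banner image: cmn]


[deu] German

[banner image: deu] [banner image: deu]


[eng] English

[banner image: eng] [banner image: eng]


[epo] Esperanto

[banner image: epo] [banner image: epo]


[fra] French

[banner image: fra] [banner image: fra]


[ina] Interlingua

[banner image: ina] [banner image: ina]


[ita] Italian

[banner image: ita] [banner image: ita]


[jbo] Lojban

[banner image: jbo]


[nld] Dutch

[banner image: nld] [banner image: nld]


[por] Portuguese

[banner image: por] [banner image: por]


[rus] Russian

[banner image: rus] [banner image: rus]


[spa] Spanish

[banner image: spa] [banner image: spa]


[tur] Turkish

[banner image: tur] [banner image: tur]


[ukr] Ukrainian

[banner image: ukr]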