Sunday, May 29, 2011

Rules against bad behavior

A little explanation

These rules are not about the goal of the project (i.e. the corpus), but about something this project cannot exist without: the community.


A member can be a terrible contributor corpus-wise but a great contributor community-wise, and conversely a great contributor corpus-wise but a terrible contributor community-wise. These are two very different aspects, but we need both kinds of people.


We need people who have deep linguistic knowledge, who are able to contribute sentences and translations of good quality, and to post relevant explanations or analyses of a sentence, a phrase or a word. And we need people who have good social skills, who are able to make other members feel like this is a nice place with nice people, and that they can have a great time contributing to the project.


I have established some detailed policies about what kind of content we accept and don’t accept. But there isn’t really much about what kind of behavior we tolerate and don’t tolerate. There’s the article about being disrespectful and using private messages rather than flaming, lecturing or criticizing someone in public. There is one point in the contributor’s guide about doing what you can to make Tatoeba a more socially pleasant place. But that’s pretty much all.


So here we go, some official policies...


The point of all of this is to avoid people leaving the project because they feel disgusted by the behavior of other members.


Things we do not tolerate
  1. Insults: saying something offensive about someone or some people.

  2. Harassment: bothering someone repeatedly.

  3. Accusations: stating that someone is doing something with bad intentions.

  4. Blaming: saying that a problem is due to someone’s fault.

  5. Provocation: writing something that intentionally makes other people angry.

  6. Retaliation: replying to insults, harassment, accusations, blaming or provocation with something that doesn't help.

  7. Bad faith: lying, deceiving, being dishonest.

  8. Generally speaking, we do not tolerate any kind of behavior that harms the collaborative and civilized atmosphere of Tatoeba.

All of this is pretty obvious. What is less obvious is how we are going to decide that something is an insult, or harassment, or an accusation, etc. Initially, I had written specific rules here about insults and harassment (which I considered the two most important types of bad behavior), but I decided to replace them with more general ideas for the time being, because the rules have not proven to be effective yet, and they weren’t complete either.


They will still remain published, because, well, we need to start somewhere. I must insist on the fact that they are not established. I actually encourage everyone who has enough time on their hands to try to hack these rules, and to try to counter-hack the hacks, and hopefully through this process we can come up with a better set of rules.


Sanctions

People who behave against the general policies will be subject to sanctions.


Depending on how Tatoeba evolves technically and depending on the situation, here is a non-exhaustive list of the possible sanctions:

  1. You may lose your right to post comments on sentences for a certain period of time.
  2. You may lose your right to post messages on the Wall for a certain period of time.
  3. You may lose your right to add new sentences for a certain period of time.
  4. You may lose your right to tag sentences for a certain period of time.
  5. Your profile description may be hidden from everyone else for a certain period of time.

The exact sanction and its duration will be decided specifically for each case. And as I said, the list is not exhaustive. You may receive a sanction other than the ones listed above.



A few considerations

Tatoeba currently doesn’t provide the possibility to edit comments, which makes it difficult to take care of messages where only one sentence is offensive but the rest is fine. It’s not very practical, but if you have offended someone and only need to remove that one sentence, you will have to send a message to TatoebaPeaceKeeper and indicate what you want your edited message to be. Or, you can join the dev team and code that feature.


Tatoeba currently doesn’t provide any kind of “ignore” feature, which makes it difficult for people who cannot stand each other to simply ignore each other. Well, you will have to live without that luxury for now. Or, you can join the dev team and code that feature.


Tatoeba currently doesn’t have any internal mechanism to stop a flame war before it’s too late (i.e. by preventing everyone from replying to a provocative comment with even more provocation). So what we will do is list disrespectful comments on the TatoebaPeaceKeeper profile page, under the category “Dangerous territory”. If you’re going to reply to a comment that is under this category, make sure you are as neutral as possible. Or, you can join the dev team and code a feature for that.


Tatoeba currently doesn’t have any official “peace keeper” who monitors the activity every single second of the day and night, and takes action faster than lightning whenever a conflict is emerging. This means you cannot expect your issues with other members to be heard and taken care of within the minute (perhaps not even within the week). But you can always recommend someone to us who can take on these responsibilities, and convince them to become a peace keeper.


The rules I’ve written are not perfect. If you have better ideas, please suggest them. If not, please follow the rules.

Tuesday, May 17, 2011

New user status names

I decided to change the names of the user statuses.
  • user → contributor
  • trusted user → advanced contributor
  • moderator → corpus maintainer
The reason for this change is that, to some extent, the previous names carried too much "social weight", and I feel this is not the best way to go. I will write another post to talk in more detail about the social and collaborative aspect of Tatoeba, but here I want to list all the existing statuses in Tatoeba and clarify what they mean.


Spammer
This status is used to flag accounts that were used to send spam.


Inactive
This status is used to flag accounts whose users are no longer active. Usually, these are users who decided to delete their account.


Contributor
This is the status everyone starts with when they register. It gives access to the main contribution features of Tatoeba.


Advanced contributor
This status is given to users who have sufficiently contributed and are fairly familiar with the project. Advanced contributors currently have access to 2 extra features: they can link/unlink sentences, and they can tag sentences.
Generally speaking, if we implement a feature that is a bit tricky and experimental, we would make it available to advanced contributors first, before making it globally available.

This status is only given to users who accept it. We will not force this status upon anyone who prefers to remain a simple and modest contributor.
You don't need to wait for us to offer you the possibility to change status. Quite the contrary, we encourage you to ask for this status if you feel you can help us out on the linking and tagging front.


Corpus maintainer
This status is given to advanced contributors who are willing to help with maintenance tasks.

Corpus maintainers were previously called "moderators", but they were not moderators in the usual sense of the word in a community. That is to say, Tatoeba moderators did NOT have the job to track users who do not behave well, they did NOT have the job to listen to users complaining about other users, they did NOT have the job to ban users for not behaving well, they did NOT have any kind of responsibility regarding the community.

Their responsibilities were, and still are, strictly restricted to the corpus: to delete sentences that are added by mistake, to delete sentences that are added as spam, to delete sentences that are copyrighted, to edit incorrect sentences that were abandoned by their owner...
This is why this status has been renamed to "corpus maintainer".


Admin
Admins have the power to do pretty much everything, with all the responsibilities that go with it. Among other things, admins are the only ones who can change a user's status, which means that a contributor cannot become an advanced contributor or a corpus maintainer without the intervention of an admin.



Note that these statuses WILL evolve over time and may even disappear (in a distant future) to make room for another (and hopefully better) kind of system. Right now, this is the best kind of organization we can afford.

Saturday, May 7, 2011

Some tips for those who want to link sentences

In February, I added a page that makes it easier to translate the sentences of a specific user. For instance, you can easily translate my sentences by going here, or by going to my profile, clicking on "Sentences" (in the right-side column), and clicking on "Translate these sentences" (at the bottom of the right-side column).

In March, I implemented an improvement to the "linking" feature. For those who have no idea what linking is about, please read point #2 of the contributor's guide.
So now, if you try to link a sentence, you will see that it only updates the line with the translation. It does NOT redirect anymore to a new page, and I think it makes linking much more comfortable.
I also made it possible for trusted users to link ANY sentences (not just the ones that belong to them).

In April, I implemented the possibility to filter the languages of the translations. If you go to your settings and add "jpn,hun,swe" in the languages field, you will only see translations in Japanese, Hungarian and Swedish. You will still be able to view sentences in all the languages though; only the translations are filtered. And by the way, if you want to know the code of a language, the codes are listed on the sentences statistics page.
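To make the behavior of the languages field concrete, here is a minimal sketch of how such a filter could work. It is only an illustration and not Tatoeba's actual code: the function names, the `(code, text)` tuple layout and the rule that an empty field shows everything are all assumptions.

```python
def parse_language_filter(field):
    """Split a settings value such as "jpn,hun,swe" into a set of codes."""
    # Hypothetical helper: tolerate stray spaces and empty entries.
    return {code.strip().lower() for code in field.split(",") if code.strip()}


def filter_translations(translations, field):
    """Keep only the translations whose language code is in the filter.

    translations is a list of (language_code, text) pairs (assumed layout).
    An empty field is treated as "no filter", so everything is kept.
    """
    wanted = parse_language_filter(field)
    if not wanted:
        return list(translations)
    return [t for t in translations if t[0] in wanted]


translations = [
    ("jpn", "こんにちは。"),
    ("fra", "Bonjour."),
    ("swe", "Hej."),
]
visible = filter_translations(translations, "jpn,hun,swe")
print([code for code, _ in visible])
```

With the sample data, only the Japanese and Swedish translations remain visible; there is simply no Hungarian translation to show.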

So with all these features, if you are a trusted user and in the mood for massive linking, what I'd advise you to do is the following.
  1. Go to your settings and enter the languages in which you are able to link. This way, you will not be annoyed by translations you don't understand in sentences that have 50+ translations.
  2. Browse your sentences in "translate" mode, and link anything you can link. Actually, you can even browse sentences of anyone you want, and link anything you can link.
  3. When you're done, you can go back to your settings and erase the languages, so that you can see the translations in all languages again.
Happy linking!

Sunday, May 1, 2011

Languages stats and leaders

It's been a long time since I last published stats, hasn't it? But thanks to sysko, who rid us of the duplicates yesterday, I'm feeling a bit more comfortable talking about numbers.

Language ranking

I've decided to include the "leaders" for each language; it's interesting information. It should give a good idea of who the most influential members in Tatoeba currently are for each language.

NOTE 1: Not all leaders are necessarily references in the language they are leader of.
NOTE 2: The stats only list the languages that have more than 1000 sentences.

Meaning of the fields:
  • # → rank of the language.
  • code → ISO 639-3 code corresponding to the language.
  • language → name of the language (in English).
  • total → total number of sentences in the language.
  • leader → username of the member who owns the most sentences in the language.
  • owns → number of sentences owned by the user in the language.
  • %owned → percentage of sentences owned by the user in the language (%owned = owns / total).
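As an illustration of how these fields relate to each other, here is a small sketch that derives rank, total, leader, owns and %owned from a list of (language code, owner) pairs. It is not the script used to produce the stats; the function names and the input layout are assumptions, and the usernames in the sample data are made up (every sentence is assumed to have an owner).

```python
from collections import Counter


def percent_owned(owns, total):
    """Format owns / total as a percentage with one decimal and a comma."""
    return f"{100 * owns / total:.1f}%".replace(".", ",")


def language_stats(sentences):
    """Build one row per language from (language_code, owner) pairs.

    Rows are (rank, code, total, leader, owns, %owned), ranked by total.
    """
    totals = Counter()
    per_owner = {}
    for code, owner in sentences:
        totals[code] += 1
        per_owner.setdefault(code, Counter())[owner] += 1

    rows = []
    for rank, (code, total) in enumerate(totals.most_common(), start=1):
        # The leader is the member who owns the most sentences in the language.
        leader, owns = per_owner[code].most_common(1)[0]
        rows.append((rank, code, total, leader, owns, percent_owned(owns, total)))
    return rows


# Made-up sample data: 4 English sentences and 2 French ones.
sample = [("eng", "alice")] * 3 + [("eng", "bob")] + [("fra", "bob")] * 2
for row in language_stats(sample):
    print(row)
```

Applied to the English figures of this post, `percent_owned(54337, 176232)` gives `30,8%`.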

#   code  language              total  leader            owns  %owned
1   eng   English              176232  CK               54337   30,8%
2   jpn   Japanese             154779  fcbond            1483    1,0%
3   epo   Esperanto             78593  GrizaLeono       13694   17,4%
4   fra   French                68426  sacredceltic     16450   24,0%
5   deu   German                58485  MUIRIEL          12824   21,9%
6   spa   Spanish               37755  Shishir           9138   24,2%
7   pol   Polish                30856  zipangu          22699   73,6%
8   cmn   Mandarin Chinese      27869  fucongcong        9022   32,4%
9   rus   Russian               26757  Hellerick         6605   24,7%
10  ita   Italian               19329  Guybrush88        8235   42,6%
11  nld   Dutch                 18746  martinod          8140   43,4%
12  ukr   Ukrainian             15826  aandrusiak        4382   27,7%
13  hun   Hungarian             12656  szaby78           3574   28,2%
14  pes   Persian               10280  pliiganto         3816   37,1%
15  heb   Hebrew                10118  Eldad             7448   73,6%
16  por   Portuguese             9628  brauliobezerra    5994   62,3%
17  ara   Arabic                 7940  saeb              5886   74,1%
18  isl   Icelandic              7721  Swift             7472   96,8%
19  tur   Turkish                6434  boracasli         3821   59,4%
20  nds   Low Saxon              5753  slomox            5490   95,4%
21  dan   Danish                 5032  danepo            4628   92,0%
22  bul   Bulgarian              4602  ednorog           4027   87,5%
23  uig   Uyghur                 3747  FeuDRenais        3263   87,1%
24  hin   Hindi                  3468  minshirui         3463   99,9%
25  wuu   Shanghainese           3257  fucongcong        1612   49,5%
26  vie   Vietnamese             2987  autuno            1876   62,8%
27  bel   Belarusian             2158  Demetrius         2089   96,8%
28  tlh   Klingon                2104  Vortarulo         2097   99,7%
29  jbo   Lojban                 2017  Zifre             1179   58,5%
30  yue   Cantonese              1930  nickyeow          1767   91,6%
31  nob   Norwegian (Bokmål)     1872  contour           1090   58,2%
32  fin   Finnish                1585  Hautis             502   31,7%
33  ina   Interlingua            1537  McDutchie         1494   97,2%
34  swe   Swedish                1190  Don                719   60,4%


Progress

Let's see how the corpus has progressed since last time...
  • We've reached our 800,000+ milestone; we're now at 834,000+ sentences.
  • The top 5 is still the same.
  • Persian and Hebrew joined the 10,000+ family!
  • Low Saxon, Bulgarian, Klingon, Finnish and Interlingua joined the 1000+ family!
At this rate, we should reach our 1 million milestone some time around September :)