Showing posts with label documentation. Show all posts

Sunday, May 29, 2011

Rules against bad behavior

A little explanation

These rules are not about the goal of the project (i.e. the corpus), but about something this project cannot exist without: the community.


A member can be a terrible contributor corpus-wise but a great contributor community-wise, and conversely a great contributor corpus-wise but a terrible contributor community-wise. These are two very different aspects, but we need both kinds of people.


We need people who have deep linguistic knowledge, who are able to contribute sentences and translations of good quality, and to post relevant explanations or analyses of a sentence, a phrase, a word. And we need people who have good social skills, who are able to make other members feel like this is a nice place with nice people, and that they can have a great time contributing to the project.


I had established some detailed policies about what kind of content we accept and don’t accept. But there isn’t really much about what kind of behavior we tolerate and don’t tolerate. There’s the article about being disrespectful and using private messages rather than flaming, lecturing, criticizing someone in public. There is one point in the contributor’s guide about doing what you can to make Tatoeba a more socially pleasant place. But that’s pretty much all.


So here we go, some official policies...


The point of all of this is to avoid having people leave the project because they feel disgusted by the behavior of other members.


Things we do not tolerate
  1. Insults: saying something offensive about someone or some people.

  2. Harassment: bothering someone repeatedly.

  3. Accusations: stating that someone is doing something with bad intentions.

  4. Blaming: saying that a problem is due to someone’s fault.

  5. Provocation: writing something that intentionally makes other people angry.

  6. Retaliation: replying to insults, harassment, accusations, blaming or provocation with something that doesn't help.

  7. Bad faith: lying, deceiving, being dishonest.

  8. Generally speaking, we do not tolerate any kind of behavior that harms the collaborative and civilized atmosphere of Tatoeba.

All of this is pretty obvious. What is less obvious is how we are going to decide that something is an insult, or harassment, or an accusation, etc. Initially, I had written here specific rules about insults and harassment (which I considered the two most important types of bad behavior), but I decided to replace them with more general ideas for the time being, because the rules have not proven to be effective yet, and they weren’t complete either.


They will still remain published, because, well, we need to start somewhere. I must insist on the fact that they are not established. I actually encourage everyone who has enough time on their hands to try to hack these rules, and to try to counter-hack the hacks, and hopefully through this process we can come up with a better set of rules.


Sanctions

People who behave against the general policies will be subject to sanctions.


Depending on how Tatoeba evolves technically and depending on the situation, here is a non-exhaustive list of the possible sanctions:

  1. You may lose your right to post comments on sentences for a certain period of time.
  2. You may lose your right to post messages on the Wall for a certain period of time.
  3. You may lose your right to add new sentences for a certain period of time.
  4. You may lose your right to tag sentences for a certain period of time.
  5. Your profile description may be hidden from everyone else for a certain period of time.

The exact sanction and its duration will be decided specifically for each case. And as I said, the list is not exhaustive. You may receive a sanction other than the ones mentioned above.



A few considerations

Tatoeba currently doesn’t provide the possibility to edit comments, which makes it difficult to take care of messages where only one sentence is offensive but the rest is fine. It’s not very practical, but if you have offended someone and only need to remove that one sentence, you will have to send a message to TatoebaPeaceKeeper and indicate what you want your edited message to be. Or, you can join the dev team and code that feature.


Tatoeba currently doesn’t provide any kind of “ignore” feature, which makes it difficult for people who cannot stand each other to simply ignore each other. Well, you will have to live without that luxury for now. Or, you can join the dev team and code that feature.


Tatoeba currently doesn’t have any internal mechanism to stop a flame war before it’s too late (i.e. by preventing everyone from replying to a provocative comment with even more provocation). So what we will do is list disrespectful comments on the TatoebaPeaceKeeper profile page, under the category “Dangerous territory”. If you’re going to reply to a comment that is under this category, make sure you are as neutral as possible. Or, you can join the dev team and code a feature for that.


Tatoeba currently doesn’t have any official “peace keeper” who monitors the activity every single second of the day and night, and takes action faster than lightning whenever a conflict is emerging. This means you cannot expect your issues with other members to be heard and taken care of within the minute (perhaps not even within the week). But you can always recommend someone to us who can take on these responsibilities, and convince them to become a peace keeper.


The rules I’ve written are not perfect. If you have better ideas, please suggest them. If not, please follow the rules.

Tuesday, May 17, 2011

New users status names

I decided to change the names of the user statuses.
  • user → contributor
  • trusted user → advanced contributor
  • moderator → corpus maintainer
The reason for this change is that, to some extent, the previous names carried too much "social weight", and I feel this is not the best way to go. I will write another post to talk in more detail about the social and collaborative aspect of Tatoeba, but here I want to list all the existing statuses in Tatoeba and clarify what they mean.


Spammer
This status is used to flag accounts that were used to send spam.


Inactive
This status is used to flag accounts whose users are no longer active. Usually, these are users who decided to delete their account.


Contributor
This is the status everyone starts with when they register. It gives access to the main contribution features of Tatoeba.


Advanced contributor
This status is given to users who have sufficiently contributed and are fairly familiar with the project. Advanced contributors currently have access to 2 extra features: they can link/unlink sentences, and they can tag sentences.
Generally speaking, if we implement a feature that is a bit tricky and experimental, we would make it available to advanced contributors first, before making it globally available.

This status is only given to users who accept it. We will not force this status upon anyone who prefers to remain a simple and modest contributor.
You don't need to wait for us to offer you the possibility to change status. Quite the contrary, we encourage you to ask for this status if you feel you can help us out on the linking and tagging front.


Corpus maintainer
This status is given to advanced contributors who are willing to help with maintenance tasks.

Corpus maintainers were previously called "moderators", but they were not moderators in the usual sense of the word in a community. That is to say, Tatoeba moderators did NOT have the job to track users who do not behave well, they did NOT have the job to listen to users complaining about other users, they did NOT have the job to ban users for not behaving well, they did NOT have any kind of responsibility regarding the community.

Their responsibilities were, and still are, strictly restricted to the corpus: to delete sentences that are added by mistake, to delete sentences that are added as spam, to delete sentences that are copyrighted, to edit incorrect sentences that were abandoned by their owner...
This is why this status has been renamed into "corpus maintainer".


Admin
Admins have the power to do pretty much everything, with all the responsibilities that go with it. Among other things, admins are the only ones who can change a user's status, which means that a contributor cannot become an advanced contributor or a corpus maintainer without the intervention of an admin.



Note that these statuses WILL evolve over time and may even disappear (in a distant future) to make room for another (and hopefully better) kind of system. Right now, this is the best kind of organization we can afford.

Saturday, May 7, 2011

Some tips for those who want to link sentences

In February, I added a page that makes it easier to translate sentences of a specific user. For instance, you can easily translate my sentences by going here. Or by going to my profile, clicking on "Sentences" (in the right-side column), and clicking on "Translate these sentences" (at the bottom of the right-side column).

In March, I've implemented an improvement of the "linking" feature. For those who have no idea what linking is about, please read the point #2 of the contributors guide.
So now, if you try to link a sentence, you will see that it only updates the line with the translation. It does NOT redirect anymore to a new page, and I think it makes linking much more comfortable.
I also made it possible for trusted users to link ANY sentences (not just the ones that belong to them).

In April, I implemented the possibility to filter the languages of the translations. If you go to your settings, and add "jpn,hun,swe" in the languages field, you will only see translations in Japanese, Hungarian and Swedish. You will still be able to view sentences in all the languages though, only the translations are filtered. And by the way, if you want to know the code of a language, the codes are listed in the sentences statistics page.
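To make the filtering behavior concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Tatoeba actually implements it: the function names and the `(language_code, text)` tuple format are assumptions made up for this example.

```python
def parse_language_filter(setting):
    """Turn a comma-separated setting such as "jpn,hun,swe" into a set of codes."""
    return {code.strip().lower() for code in setting.split(",") if code.strip()}


def filter_translations(translations, setting):
    """Keep only the translations whose language code is in the filter.

    An empty setting is treated as "no filter": every translation is kept.
    """
    wanted = parse_language_filter(setting)
    if not wanted:
        return list(translations)
    return [t for t in translations if t[0] in wanted]


# Hypothetical data: (language code, text) pairs for one sentence.
translations = [
    ("jpn", "こんにちは。"),
    ("fra", "Bonjour."),
    ("swe", "Hej."),
]
visible = filter_translations(translations, "jpn,hun,swe")
```

With this setting the French translation is hidden, while the Japanese and Swedish ones stay visible. Erasing the setting brings back translations in all languages.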

So with all these features, if you are a trusted user and in the mood for massive linking, what I'd advise you to do is the following.
  1. Go to your settings and enter the languages in which you are able to link. This way, you will not be annoyed by translations you don't understand in sentences that have 50+ translations.
  2. Browse your sentences in "translate" mode, and link anything you can link. Actually, you can even browse sentences of anyone you want, and link anything you can link.
  3. When you're done, you can go back to your settings and erase the languages, so that you can see again the translations in all languages.
Happy linking!

Saturday, April 30, 2011

Who wants to help?

On Monday this week we’ve seen a pretty strong wave of new visitors from Spain, via meneame.net. We’ve never had that many (7100+) visits in one day and it made me feel like it’s time for Tatoeba to get a better organization and more people involved. Like, really.

Allan (aka sysko) and I (aka Trang) have been saying for too long that we don’t have time. With our respective busy lives and with the growing community, we can no longer take care of the many small or not so small requests made by users. But we’re not the only ones who can make this project more awesome, you can too!


What can you do?

I’m listing below the various “departments” of Tatoeba, with a general description. I will be posting more specific tasks whenever needed, but the general descriptions should give you an idea of all the things you can do if you enjoy this project and feel like being a bigger part of it.

Corpus
This is what everybody works on when they join Tatoeba: adding, translating and correcting sentences. Generally speaking, we’ve got the “sentences” part going on pretty well. We would just need more moderators (ideally, at least one moderator for each language).
However the corpus is not just about sentences. It is also about links, tags and audio. And for these, we don’t have a very good system yet, but to improve that, we’d need people to help in development...

Development
This is about programming. There’s a huuuuge amount of work to do here. We have plenty of ideas to code but also a lot of maintenance to do... So we’re going to need more people in the dev team.
Sysko started working on the next version of Tatoeba (in C++) and we’ll have to wait patiently until he’s ready to bring more people in. In the meantime, I am maintaining and improving whatever I can in the current version (in PHP).
If you’re interested in joining us, you’ll be mostly helping me with the current version, but I really hope you’ll stick around long enough to be part of the new version as well.

Documentation
This is about writing articles (or making videos) that explain what the project is about, how it works, what are the policies, what are the procedures if you want to do whatever you want to do, etc. Basically, the documentation is the place where people can go to search (and hopefully find) answers to questions they ask themselves about the project.
Documentation is extremely important. Without it, it’s difficult to get more people involved and quite unfortunately, we have way too little documentation.

Community
This is about taking care of the social aspect of Tatoeba, and maintaining a good karma around the project. It’s about making people feel welcome, helping them understand the project better, encouraging and thanking them for their good work, getting them to calm down if they get involved in conflicts, contacting those who would be good candidates for trusted user, and more.

Translation
This is about translating the interface, documentation and news.
The news is currently not translated at all and only parts of the documentation have been translated into a handful of languages. We'd really like to improve on this front to make the project accessible to as many as possible.
We're doing better on the interface, which has been translated into many languages through Launchpad. Still, the translations are not all perfect and ideally we would like each language to have one person overseeing it to ensure the overall quality.

Design
I’m talking about graphical design here. This is about making Tatoeba prettier. The interface, the icons, the illustrations, the videos, the goodies... We need designers to... well, design these things. Great design is not vital, but I personally think it makes people happier. It’s much more pleasant to contribute on a platform with a nice interface, it’s much more pleasant to read or watch a tutorial if it’s illustrated with nice graphics.

Tests
This is about making sure that Tatoeba works as it should work, especially when we implement new features. Testing is not going to be super urgent until we have a bigger dev team though.

Technical support
This is about helping people who can’t get things to work properly. When users ask for help on the Wall, the whole community can help. But we also sometimes receive emails asking for technical support, and it would be nice to have a person (other than sysko or myself) who can dedicate time to answering these emails.

Communication
This is about informing the community of things they may be interested to know. For instance writing a release note whenever we’re introducing new features, publishing statistics, announcing new policies or important decisions.

Events
This is about organizing special activities on Tatoeba. For instance we organize a Tatoeba day every month (or almost... we had none in April though because I was way too busy this month). We’ve also tried organizing a contest (for the banners), and it would be nice to organize some more.
Events can bring fun into the project, but finding ideas, planning them out and motivating people to participate is quite some work.


Interested in helping out?

If you are interested in helping us, then here’s what to do:
  1. Send us an email: team@tatoeba.org.
  2. Use the following title: I want to help ([category], [category], ...). Ex: I want to help (documentation, tests, news).
  3. Tell us a little about yourself in your email and what kind of tasks you feel ready to take on.
  4. We will get back to you, assign you more specific tasks and give you any information you may need to know.
  5. Update your profile accordingly (and regularly) to let the rest of the community know what you’re working on.

Tuesday, January 25, 2011

Legally valid content

This article aims to give general instructions on how to contribute legally valid content in Tatoeba, to minimize the risk of Tatoeba being shut down for having illegal content (not saying it will be happening anytime soon, but better be safe).

If there is one thing you will need to remember, it is this: do not add non CC-BY sentences in Tatoeba.


Non CC-BY sentences

Perhaps "non CC-BY sentence" is a bit cryptic for some of you so let me clarify what it means. CC-BY is a short name for the Creative Commons Attribution license. Tatoeba redistributes all its sentences under this license. A non CC-BY sentence is simply a sentence that is not compatible with the CC-BY license.
  • Anything that is under copyright is NOT compatible with CC-BY (that includes quotes from books, movies, songs...).
  • Anything that is under a license that has a "share alike" condition is NOT compatible with CC-BY. CC-BY-SA is not compatible with CC-BY. That means you can't copy text from Wikipedia into Tatoeba. But CC-BY is compatible with CC-BY-SA, so you may insert sentences from Tatoeba in Wikipedia, or Wikiquote for instance.
  • Anything that is under a license that has a "no commercial use" condition is NOT compatible with CC-BY.
  • Anything that is not under any license is NOT compatible with CC-BY. If there's no license, it means by default that the author doesn't authorize re-use.
  • Anything that basically doesn't say "You can do absolutely whatever you want with this as long as you" is NOT compatible with CC-BY. Update: this last statement was an over-simplification. This has caused confusion so I'm removing it.


CC-BY sentences

But now you may wonder, what IS compatible with the CC-BY license?
  • Anything that is under CC-BY is compatible with CC-BY. Sentences that you add in Tatoeba and that were created by yourself are under CC-BY, because you agreed with the Terms of Use.
  • Anything that is in the public domain is compatible with CC-BY. If the author of a book has been dead for 100 years or more, then you can pretty much safely consider that the book is in the public domain.
  • Anything that basically says "You can do absolutely whatever you want with this" should be compatible with CC-BY.
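The two lists above can be summarized as a small lookup, sketched here in Python. This is only an illustration of the rules as stated in this article, not legal advice and not an official Tatoeba tool. The label strings (such as "no-license" or "unrestricted") and the function name are hypothetical names invented for this example.

```python
# Hypothetical labels for the cases listed in the article; not an official list.
COMPATIBLE = {
    "cc-by",          # already under CC-BY
    "public-domain",  # public domain works
    "unrestricted",   # "You can do absolutely whatever you want with this"
}
INCOMPATIBLE = {
    "copyrighted",     # quotes from books, movies, songs...
    "cc-by-sa",        # "share alike" condition (e.g. text from Wikipedia)
    "non-commercial",  # "no commercial use" condition
    "no-license",      # no license: re-use is not authorized by default
}


def can_add_to_tatoeba(source_license):
    """Return True or False per the article's rules, or None when unsure."""
    key = source_license.strip().lower()
    if key in COMPATIBLE:
        return True
    if key in INCOMPATIBLE:
        return False
    return None  # unclear case: ask in a comment rather than guess
```

Note that the check only goes one way: CC-BY-SA text cannot be added to Tatoeba, even though Tatoeba's CC-BY sentences can be reused in a CC-BY-SA project.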


The basic rules to contribute legal content

1) If you want to be sure that your sentences are legally valid, do NOT copy-paste from anywhere (especially NOT from textbooks, electronic dictionaries, or other language learning websites), only come up with your own sentences.

2) We delete non CC-BY sentences. Depending on the situation, we may either delete the sentence right away, or give the contributor some time to defend their sentence.

3) Do NOT translate a sentence that you think is non CC-BY. Instead, post a comment to express your doubts about the legal status of the sentence. If you are a trusted user, add the tag "@possibly non CC-BY". If you see other people adding or translating non CC-BY content, tell them NOT to do that.

4) If you do copy-paste from somewhere else, indicate in the comments where you copy-paste from. Give all the information you can so that we can easily find out it is indeed CC-BY compatible.

5) We will block a user's possibility to contribute (add, translate, edit sentences) if they are not following these rules.

6) To be honest, it can happen that we delete sentences that are legally valid, because the limit between legal vs non-legal is not always clear. If you are a specialist about these legal issues, please help us define a clear method to determine whether a sentence is legally valid or not.


Related links

Here's a bunch of links related to copyright and stuff. I'm just throwing them here for those who are interested in expanding their knowledge on the matter. Wikipedia obviously has a lot of information on the subject since they have to deal with the problem certainly more often than any other collaborative project out there.

Sunday, November 7, 2010

Tags guidelines

We introduced the "tags" feature several months ago and we've let trusted users experiment with it pretty much freely. A profusion of tags has been created, but they are quite a mess and we decided to try tidying up.

From now on, if you are going to tag a sentence, please take into consideration the following things.


1. Use tags for objective and official information

We would like to keep the tags for "objective" and "official" information. If you want to categorize sentences for personal purpose, you should use lists.

For instance, you cannot tag a sentence "French exam" to mark the sentence as part of those you will use to practice before your French exam, you should create a list for that. We know lists are not as practical as tags, but we'll be improving the lists feature as soon as we have time.


2. Avoid creating new tags

Avoid creating new tags because it can make the cleaning process harder. If the tag you want to add doesn't appear in the autocompletion list, then it's a new tag, so don't add it unless you are really convinced it's a valid tag.


3. Ask before you create a new tag

We don't have clear rules yet for what is a valid tag and what is not, but one of our moderators (Swift) volunteered to take care of the tags. If you feel the need to create a new tag, it would be wise to ask Swift first. He will be officially in charge of tidying up the tags. He will be the one deciding what tag to keep or not and what tag to rename. Also, don't hesitate to contact him if you would like to help out. It's not easy to decide on these things.


4. Use English for tags, unless you really can't

We have decided to use English as the default language for tags. We will rename all non-English tags into their English equivalent, when it is possible. We can still accept non-English tags, but only if there is no English equivalent.

The point of having one common language is uniformity. It would be inefficient to have a bunch of sentences tagged "proverb" (English) and another bunch tagged "proverbe" (French). There is also no point having a sentence tagged with both "proverb" and "proverbe". They are the same notion. It can even make things confusing to have several tags designating the same notion, which is why we have decided to have one default language. We will later implement the possibility to translate the tags and to display them in languages other than English.
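As a rough illustration of what renaming tags to a single language achieves, here is a small Python sketch. The mapping and the function name are hypothetical. Only the "proverbe" to "proverb" pair comes from this article, and the actual renaming is decided case by case, not by a script like this.

```python
# Hypothetical mapping; only the "proverbe" -> "proverb" pair comes from the article.
ENGLISH_EQUIVALENTS = {"proverbe": "proverb"}


def normalize_tags(tags):
    """Rename tags to their English equivalent and drop the resulting duplicates.

    Tags without a known English equivalent are kept as they are.
    The original order of the tags is preserved.
    """
    seen = set()
    result = []
    for tag in tags:
        cleaned = tag.strip()
        english = ENGLISH_EQUIVALENTS.get(cleaned.lower(), cleaned)
        if english not in seen:
            seen.add(english)
            result.append(english)
    return result
```

A sentence tagged with both "proverb" and "proverbe" ends up with the single tag "proverb", which is the uniformity described above.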


5. How things are going to work
  • We'll try to keep the process as transparent as possible.
  • Swift will publish on the Wall the modifications that will be applied to the tags (i.e. renaming and deletions).
  • There will be a few days until these modifications are actually applied, in case people strongly disagree with a decision.
  • Swift will also add on his profile and his personal web page the links to every Wall post mentioning the modifications, for people to be able to trace back all the decisions about the tags.
  • If you need to protest against a decision, please refer to Swift.

Sunday, September 26, 2010

Warning: you are being disrespectful



I decided to write more specific guidelines about how to react to bad behavior because I'm so fricken tired of seeing people attacking each other in public.

The community is growing and becoming more diverse. Diversity means divergence of opinions, which means more intense debates. I can accept divergence of opinions, it's normal, it's even necessary. But I cannot accept people flaming each other in public. I don't expect members to act all lovey-dovey with each other, but I do expect members to make an effort to be respectful with each other, NO MATTER WHAT.


If you think a user is being disrespectful
  1. Send him a private message with the title "Warning: you are being disrespectful". I insist very much on PRIVATE MESSAGE. Everyone can send this warning, not just moderators.
  2. Add in your private message the link to the comment where the user was disrespectful. I insist again: PRIVATE MESSAGE.
  3. Quote the part of the comment that you felt was disrespectful.
  4. Try to explain why you felt it was disrespectful.
  5. Add a link to this blog post.
Just in case it was not clear, I will repeat the main idea: if you think a user is being disrespectful, send him a private message and ONLY a private message.


If you received warnings
  1. It's possible that it was a misunderstanding on the sender's part, in which case you can simply explain to him what you really meant. But if one person misunderstood, it's possible that other people will misunderstand you as well, so you should consider clarifying your comment for everyone.
  2. It's possible that you are really being disrespectful, in which case you should consider deleting your comment or apologizing for being disrespectful (or both).
NOTE: Moderators cannot delete other people's comments. Don't count on them to censor you.


What do I find disrespectful?
  • Insulting someone is disrespectful, obviously. I don't think I need to explain that one.
  • Being condescending is disrespectful. You should treat everyone's opinion equally. It shouldn't matter whether you're debating with a 6 year-old kid or a non native speaker. You are NOT entitled to trash someone's opinions just because you think you know better. If you know better, then educate people, don't trash them.
  • Lecturing someone publicly is disrespectful. You can tell someone how they should behave in PRIVATE, but not in public, never EVER. Even something small like "Dude, calm down" => PRIVATE MESSAGE.
  • Generally speaking, writing negative comments about someone is disrespectful. If you don't like something about someone, you let them know in PRIVATE and ONLY IN PRIVATE.
Just to be clear, I may myself show lack of respect in moments of weakness. Everyone may. You come back tired from a long day of work, someone offends you publicly, you can't resist the temptation to reply back publicly as well. It happens to everyone. But it is NOT acceptable, there is NO EXCUSE for that.


What happens to people who misbehave?

My thoughts here about bad behavior are still true today. People who misbehave will not be banned, suspended or anything. They will simply receive a lot of warnings and hopefully those warnings can slap some sense into them. I count on EVERYONE to send warnings to users who are crossing the line. It's not only my job, it's not only moderators' job, it's not only trusted users' job, it's EVERYONE'S JOB to make sure Tatoeba remains a place that people ENJOY going back to.

If your inbox starts being filled with warning messages, you really need to work on your behavior. I must remind you that this is a collaborative project, and collaborative means we are working WITH each other, NOT AGAINST. If you care about this project, then please, show more maturity. If you can't do that, then for Tatoeba's sake, take a break and come back when you grow up. Thank you.

Tuesday, August 3, 2010

Submission policy - What kind of content do we want?

This article explains what kind of content we accept in Tatoeba, what kind of content we delete and what kind of content we review. Note that this article is not final. You have the right to object to something or to ask for more clarifications.


What do we accept?

Tatoeba is about collecting sentences so we only want sentences. However, what exactly do we mean by "sentences"? What is a sentence and what is not? It's actually a difficult question... No one will doubt that "I am happy" is a sentence. But what about "On the left", is that a sentence? What about "Thank you", "Yes", or "Awesome"?

As far as I'm concerned, I think Tatoeba can handle a loose definition of "sentence". We don't strictly need to have an entity with at least a verb. To me, when spoken, everything is a sentence. When written, the main difference between a sentence and a non-sentence is punctuation. That's all. For the rest, as long as people can imagine a context where the "sentence" can be expressed, then it's a sentence.
So yes, I'm roughly saying that you can take all the words in the dictionary, add punctuation and perhaps a capital letter, you'd turn it into a sentence. I don't encourage it because it's not useful (dictionaries do that already), but one-word sentences are still tolerated. I'll trust people's common sense for adding only one-word sentences that are significant (for instance, "Hello" is, "House" isn't).

In case you run across sentences that are not strictly speaking sentences, then tag them as "non-sentence", so that there is a way to quickly identify them. Inform the owner about this article if he's a new member, and let him know it's better to have sentences with more context.
At any rate, don't bother starting endless discussions if the sentence has already been translated because it will be kept as is. Feel free however to add a new sentence based on the "non-sentence".

Generally speaking, Tatoeba is open to many kinds of sentences. We tolerate casual speech, slang, insults (as long as they are not targeting anyone in particular), erotic sentences, sentences that are not "true" (after all, Tatoeba is not an encyclopedia). These sentences can be tagged accordingly to inform users. But I'll ask people to focus primarily on appropriate and politically correct sentences. We don't have (yet) a good system to filter out sentences that are not very "safe", so don't flood us with those, please.


What do we delete?

What we delete for sure are:
  • Entries that people add by mistake due to our failure to provide a more efficient interface.
  • Sentences that owners themselves requested to delete (because the delete feature is still not available to everyone).
  • Entries that are copyrighted or under a license that is not compatible with CC-BY.
  • Racist comments and personal attacks, if they are really harmful and there is a general agreement that they should be removed.
  • Entries that really make no sense and whose owner won't provide any explanation.
In the perspective of providing better content, I'm also allowing the deletion of "sentences" that are "not really sentences" and came from the Tanaka Corpus, but only under these conditions:
  • The vocabulary is already illustrated in other sentences.
  • There is only the Japanese-English pair, no translation into any other language. We can make an exception for French (i.e. it's still deletable if there is a French translation).
  • All the sentences that will be deleted do NOT belong to anyone.
It may be obvious, but you should avoid translating a sentence that is likely to be deleted... Unless you want to stand against its deletion.


What do we review?

By "reviewing" I mean correcting mistakes. So we correct spelling mistakes, grammar mistakes, bad formulations, etc. We want Tatoeba's data to be used (or at least usable) for educational purposes so we want good quality sentences.

However, the limit between a "correct" and "incorrect" sentence is not always clear and some sentences can generate a lot of debate. In such cases, the final decision belongs to the owner of the sentence.

Remember that Tatoeba allows several translations in the same language, so there is no point fighting endlessly on what is correct or not. Simply add another version of the sentence if you are not happy with the existing one, we don't mind at all having near duplicate sentences (cf. this discussion on the Wall, and more precisely my thoughts on the issue here).

We also don't want any kind of annotations in the sentences. You can find more details in the contributor's guide, rule #9. If you have a good reason to keep your annotations, then please explain it in your comments. Otherwise moderators have the right to edit your sentence two weeks after you have been requested to change your sentence.


What do we link?

Tatoeba's sentences are represented as a graph. Two sentences that are linked together have the same meaning. Linking two sentences in the same language is accepted, but you shouldn't link based on meaning alone. The sentences that you link should also have an equivalent "style" and type of speech. Cf. my wall post here.

NOTE: Only trusted users can link sentences.

Monday, May 24, 2010

Moderators in Tatoeba

This is a little guide/FAQ to explain what the role of a moderator in Tatoeba is, and to make sure moderators use their powers wisely.


Why do we need moderators?

Every community needs its moderators, but in Tatoeba more specifically, the problem is that unless you are the admin, you (currently) cannot:
  • delete sentences, not even your own sentences
  • edit sentences that do not belong to you
So with the growing community, more and more sentences are getting in the "delete me" and "correct me" queues (due to members who never come back to correct their sentences).

Moderators are here to help take care of these sentences that no one else can take care of.


What can moderators do?

Moderators can currently delete, edit, and link/unlink any sentence. Yes, this is a lot of power, but since contributions are logged and can be seen by everyone, we don't need to worry too much about a moderator going nuts and ruining others' work.

Keep in mind that the moderator's rights are not "stable" yet. We will balance out the permissions over time. For now, we don't really have time, so we'll trust moderators to do the right thing.


When should moderators edit or delete?

Only use your moderator rights as the last resort.

This is especially true when dealing with others' sentences. Some people will gladly let you edit or delete their sentences without being notified about it (they may even be annoyed by notifications). But other people may feel that you are abusing your powers, not respecting their work, not acknowledging their presence in the project, or whatever.

To avoid any kind of conflict, only edit sentences where the latest correction request says "two weeks ago" (or more) and no correction has been made. Only delete a sentence after asking the owner if they're okay with their contribution being deleted.
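
As a rough illustration, the waiting rule for edits can be written down as a small check. This is only a sketch: the function name, its parameters, and the idea of having timestamps for correction requests and owner edits are assumptions made for the example, not a description of how Tatoeba works internally.

```python
from datetime import datetime, timedelta
from typing import Optional

# Waiting period taken from the guideline above.
GRACE_PERIOD = timedelta(weeks=2)


def moderator_may_edit(
    latest_request: Optional[datetime],
    last_owner_edit: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if a moderator may step in and edit the sentence.

    latest_request: when the most recent correction request was posted
        (None if nobody asked for a correction).
    last_owner_edit: when the owner last edited the sentence (None if never).
    """
    if latest_request is None:
        return False  # nobody requested a correction
    if last_owner_edit is not None and last_owner_edit >= latest_request:
        return False  # the owner already reacted to the request
    return now - latest_request >= GRACE_PERIOD
```

Deletion is deliberately left out of this check, since it requires asking the owner first rather than simply waiting.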

Basically, give people time to do their work first, and step in only if they don't do anything.


How do you become a moderator?

You can either ask Trang or wait for her to notice that you are a good candidate to be a moderator. The criterion is that you are already at least a "trusted user". The rest is subjective.

Tuesday, February 23, 2010

How to be a good contributor in Tatoeba

This article was written back in 2010 and may contain some outdated information. A more up-to-date version is maintained on Tatoeba's wiki:
https://en.wiki.tatoeba.org/articles/show/how-to-be-a-good-contributor-in-tatoeba




Introduction

This article is a must-read for anyone who is serious about contributing to Tatoeba. It is quite long, so here is a summary of how to be a good contributor:
  1. Understand the context of the project
  2. Understand how the corpus is structured
  3. Do not pay attention to the other translations
  4. Do not translate word for word
  5. Do not edit a sentence if, by itself, it is correct
  6. Do not change the language of a sentence
  7. Make sure you are adding comments to the right sentence
  8. Do not add sentences from copyrighted content
  9. Do not annotate sentences
  10. Give us feedback
  11. Do not wait for us to code it if you can code it
  12. Indicate your languages in your profile
  13. Encourage and educate new (or even not so new) contributors
  14. Spread the love


    1. Understand the context of the project

    I will (someday) write a more detailed (his)story, but here are the basic facts you should be aware of.
    • I started this project in 2006. The initiative was driven by a passion for language learning and the frustration of not finding an adequate online dictionary.
    • The project is focused on sentences and I insist on sentences. The reason is that I felt example sentences were (and still are) a very scarce resource. Please only add complete sentences if you are going to contribute.
    • I was actually "alone" on this project for some time. It was only three years later, in 2009, that other people (all computer science students) started to help me out by coding more features.
    • Tatoeba is NOT a commercial project. We're not a company, we're not paid for doing any of this. It is something that we're working on in our free time.
    • To be honest, we don't exclude the possibility of starting a company someday, but that is if and only if we have an innovative, coherent and ethical business model (yeah, good luck). Things like having ads everywhere and driving a lot of traffic, or forcing people to pay to access the data, are out of the question.


    2. Understand how the corpus is structured

    This is the tricky part, and hopefully I can explain it clearly enough for everyone.

    The corpus is not structured as a table but as a graph. What does it mean? Well, imagine you had to extract part of the corpus and write it on paper. What you would certainly do is something like this:

    English              French                 Spanish
    My name is Trang.    Je m'appelle Trang.    Me llamo Trang.
    How are you?         Comment vas-tu?        ¿Cómo estás?
    ...                  ...                    ...

    That's a table structure. There are rows and columns: each row contains sentences with the same meaning, and each column contains sentences in the same language. That's the first approach anyone would think of, but that's NOT how the corpus is structured.

    This is how the corpus is structured:



    That's a graph structure. There are nodes and edges: each node represents a sentence, and each edge represents the link between two sentences. When two sentences are linked, they have the same meaning.

    The way you contribute differs a lot from one structure to the other. One important implication is that you can add multiple translations in the same language for a specific sentence. You think there are two ways to translate a sentence and you really can't decide which would be the best? Well, just add both!
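
To make the graph idea more concrete, here is a minimal sketch of sentences as nodes and links as undirected edges. The ids, the language codes, and the helper names (`link`, `translations`) are made up for the illustration and are not Tatoeba's actual data model.

```python
from collections import defaultdict
from typing import List

# Each sentence is a node: id -> (language code, text).
sentences = {
    1: ("eng", "How are you?"),
    2: ("fra", "Comment vas-tu?"),
    3: ("spa", "¿Cómo estás?"),
    4: ("spa", "¿Cómo está usted?"),
}

# Each link is an undirected edge between two sentence ids.
links = defaultdict(set)


def link(a: int, b: int) -> None:
    """Record that sentences a and b are translations of each other."""
    links[a].add(b)
    links[b].add(a)


def translations(sentence_id: int, lang: str) -> List[str]:
    """Return the direct translations of a sentence in one language."""
    return sorted(
        sentences[other][1]
        for other in links[sentence_id]
        if sentences[other][0] == lang
    )


link(1, 2)
link(1, 3)
link(1, 4)  # a second Spanish translation of the same English sentence
```

With a table, the English sentence would have exactly one Spanish cell. Here, `translations(1, "spa")` returns both Spanish sentences, because nothing limits the number of edges a node can have.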

    Some other implications are pointed out below.


    3. Do not pay attention to the other translations

    When you translate a sentence, you are in fact adding a sentence (a node) and adding a link (an edge) between the "original" sentence and your translation. So the only thing you need to care about is that you are adding a proper translation of the "main sentence" (the one at the top, written in a bigger size).

    More concretely, if you were in this situation and wanted to add a Spanish translation to the English sentence:

    How are you?
    => Comment vas-tu?

    You could add "¿Cómo estás?" (casual) just as well as "¿Cómo está usted?" (formal). Or you could add both (because you can add multiple translations in the same language).
    If you understand French, it doesn't matter that the French sentence is the casual form; you only have to make sure that your translation is a proper translation of the English sentence. A proper translation means that if someone had to translate your contribution back to English, "How are you?" would be a possibility.


    4. Do not translate word for word

    We are not interested in having sentences that sound like they were written by a robot. We want sentences that really are what a native speaker would say. Translating is a very difficult task, we know it. But if you are translating into your native language, you should always, always re-read your translation as if it was a single sentence, and ask yourself if it is actually something people would say. You can use the comments to indicate a literal translation.

    If you are not translating into your native language (which you can do), you are forgiven for not writing native-like sentences. But in this case, please make sure you find a native speaker to check your sentences so that your possible mistakes get corrected more quickly.

    The point is to understand that Tatoeba is not only about providing translations, it's also about gathering data about a language. Tatoeba could simply be limited to adding sentences without translating them at all. If we were to extract only the sentences in Italian, we would like each of them to be representative of the Italian language.

    The sentences are the basic layer. The links between the sentences are another layer. But the corpus should make sense without those links.


    5. Do not edit a sentence if, by itself, it is correct

    As I mentioned just above, Tatoeba could simply be limited to adding sentences without translating them at all. Consequently, before you modify a sentence, look at it without paying attention to its translations, and ask yourself "Does this sentence have any spelling or grammar mistakes? Does it sound weird?". If the answer is "No", then do NOT edit it, leave it alone!

    I am explaining this because you may be tempted to edit a sentence so that its meaning matches all the other sentences.

    It could be because you want to turn a sentence into a more "literal" translation. But this is not a good idea. Obviously, if we don't want you to translate word for word (cf. rule #4), we also don't want you to change a sentence into a word for word translation.

    It could also be because the sentence doesn't match AT ALL. For instance:

    My name is Trang.
    => Je m'appelle Trang.
    => Vamos a la playa.

    You notice that the Spanish sentence (which says "Let's go to the beach") has nothing to do with the English sentence.

    Perhaps you don't speak Spanish very well so you're not confident in modifying the Spanish sentence and decide to change the English sentence. Problem: what about the French sentence? It won't fit the English sentence anymore...

    Perhaps you are a native Spanish speaker and decide to change the Spanish sentence. In this particular case, it would still be acceptable because the Spanish sentence is not linked to any other sentence. But if someone had translated that Spanish sentence into Italian, "correcting" the Spanish sentence would cause a conflict with the Italian translation.

    Then there is a problem you may not have thought of: when changing the meaning of a sentence, you are potentially erasing unique vocabulary. What if the Spanish sentence was currently the only one with "playa" in it?

    So the best way to proceed in this kind of situation is to add a new Spanish translation (Me llamo Trang) and "unlink" the current Spanish translation. NOTE: Not everyone can unlink. Only "trusted users" can. You can post a comment to request a sentence to be unlinked.
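
The "add a new translation, then unlink" procedure can be sketched on a tiny in-memory graph. Everything here (the ids, the data layout, the function names `add_translation` and `unlink`) is hypothetical and only meant to show that the mismatched sentence itself is never modified.

```python
# Hypothetical in-memory corpus: id -> (language code, text).
sentences = {
    1: ("eng", "My name is Trang."),
    2: ("fra", "Je m'appelle Trang."),
    3: ("spa", "Vamos a la playa."),
}
# Undirected links stored as frozensets of two sentence ids.
links = {frozenset({1, 2}), frozenset({1, 3})}


def add_translation(source_id: int, lang: str, text: str) -> int:
    """Add a new sentence and link it to the source sentence."""
    new_id = max(sentences) + 1
    sentences[new_id] = (lang, text)
    links.add(frozenset({source_id, new_id}))
    return new_id


def unlink(a: int, b: int) -> None:
    """Remove the link between two sentences; both sentences remain."""
    links.discard(frozenset({a, b}))


new_id = add_translation(1, "spa", "Me llamo Trang.")
unlink(1, 3)
```

After these two steps, "Vamos a la playa." is still in the corpus (so the word "playa" is not lost), it is simply no longer attached to the English sentence, and any translations of it in other languages would keep their own links.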


    6. Do not change the language of a sentence

    If the language flag of a sentence is wrong (for instance it was flagged as Chinese when it is in fact Japanese), then of course, you can change it. That's not what I mean by "Do not change the language".
    What I mean is that you shouldn't replace a Japanese sentence with a Chinese sentence with the same meaning (and that applies to any language, of course). It shouldn't happen often, but if you're in a situation where you want to do that, then don't.

    The problem is that a sentence can be associated with data that is dependent on its language. For instance, comments. People can post comments on sentences, and the comments may be valid only because the sentence was in a certain language.

    At the moment it is more of an issue for Japanese sentences, which are associated with some sort of annotations. These annotations are not displayed because they are not useful for normal users. If you change a Japanese sentence into an English sentence, then the annotations that were associated with it won't make sense anymore.


    7. Make sure you are adding comments to the right sentence

    When you post a comment, the comment is only associated with the main sentence, so make sure that your comment is related to that particular sentence. Typically, if you want to point out a spelling mistake, like here:

    My name is Trang.
    => Je m'appel Trang.
    => Me llamo Trang.

    You can see that the French sentence is wrong. It should be "appelle" and not "appel". If you post your comment here, it would be associated with the English sentence (because it's at the top, so it's the main sentence). This is not what you want. The right thing to do is to click on the French sentence first. It will change the configuration into:

    Je m'appel Trang.
    => My name is Trang.
    => Me llamo Trang.

    And then you can post your comment.

    Now there is the case where you want to point out that a translation is wrong. Your comment will be related to two sentences, so where should you post it? Well, ideally, for this type of situation, it should be possible to comment on a link between two sentences. But we don't have that; we can only comment on a sentence. So you are free to decide where you want to post your comment. Just remember that it's good as long as your comment is related to the main sentence.


    8. Do not add sentences from copyrighted content

    We are distributing the corpus under the Creative Commons Attribution (or CC-BY) license. It makes it possible for anyone to re-use this data in any way they want as long as they mention Tatoeba in their work.

    As a contributor, you have agreed with the terms of use (which of course you haven't read), and therefore you are providing your contributions under the CC-BY license as well. Which means we can re-use your data in any way we want as long as we mention you. So we are re-using your work in Tatoeba, and we mention you through the logs and the stats.

    But providing your work under CC-BY means you also have some responsibilities regarding what you provide. And you have to know that you cannot legally redistribute data if it was copied from a source that doesn't clearly state that you can do it. Typically, you cannot (legally) copy all the sentences from a textbook and add them to Tatoeba.

    Don't worry, you (and we) won't go to jail and be in debt for life if you've added a couple of sentences from a textbook (hopefully...). But the law forbids us to take the work of someone and re-use it without their consent. Producing sentences and translations is work, so be careful where you get the sentences from. Preferably, come up with your own sentences or take them from books that are in the public domain.

    If you have added or have seen sentences that were copied from copyrighted material, change a few words so that it won't be exactly the same sentence. Or, go negotiate with the authors and convince them to release their work under the CC-BY license so we can re-use it.

    I'm not going to argue on whether all of this makes sense or not (obviously I don't believe it does), but it will help us a lot if everyone does what is necessary so we don't get sued.


    9. Do not annotate sentences

    We want sentences to remain as "raw" as possible so do not add annotations. For example we do NOT want sentences like this:
    1. I (female) am happy.
    2. It's raining cats and dogs. (idiom)
    3. I like her/him.
    Regarding sentences 1 and 2, if you need to indicate that a sentence is a proverb or female speech or whatsoever, then post a comment about it (or tag it, if you are a trusted user), but please do NOT add this information directly in the sentence.

    Regarding sentence 3, instead of having only one sentence, split it into two sentences. Remember, you have the right to add multiple translations in the same language. So it's okay to have this:
    Je l'aime bien.
    => I like her.
    => I like him.

    There are various reasons why we don't want annotations.
    1. They can be a problem for people who are using our data in order to improve a natural language processing system, for instance.
    2. Your translation can be retranslated into another language, and it's less easy for people to translate sentences that contain alternatives (like "him/her").
    3. If we want to record audio for the sentence, we will need to choose what exactly to record, and annotations don't help.
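
For people who process the data, the kinds of annotations shown in the three example sentences can be spotted with a rough heuristic. The two patterns below are assumptions chosen for this example only; they are not an official Tatoeba check.

```python
import re

# Heuristic patterns, chosen for this example only.
PARENTHESIZED_NOTE = re.compile(r"\([^)]*\)")  # e.g. "(female)", "(idiom)"
SLASH_ALTERNATIVE = re.compile(r"\w+/\w+")     # e.g. "her/him"


def looks_annotated(sentence: str) -> bool:
    """Return True if the sentence seems to contain an annotation."""
    return bool(
        PARENTHESIZED_NOTE.search(sentence)
        or SLASH_ALTERNATIVE.search(sentence)
    )
```

This will also flag legitimate sentences that happen to contain parentheses or a slash (for instance a unit like "km/h"), so a match is only a hint that a human should take a look, not a reason to edit anything automatically.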


    10. Give us feedback

    We know that Tatoeba is not perfect, so don't hesitate to tell us what you think is missing (just make sure no one has talked about it on the Wall already). Also tell us if you see any spelling mistakes, feel that some explanations are not clear, or encounter bugs.

    We also know that Tatoeba is a cool project so feel free to tell us you like it too :P


    11. Do not wait for us to code it if you can code it

    As much as we welcome feedback, we welcome even more INITIATIVE. There are just sooo many things we could do. We can't take care of everything.

    For instance, we are distributing the entire corpus, but many people probably don't need all the sentences in all the languages. You may just want the English-Spanish sentences. Well, instead of asking and waiting for us to provide a file with only English-Spanish sentences, you can code a tool (and please, tell us if you do) that will extract only what you want from our files.

    That's just one example but if you are a programmer, there could be many things you could do yourself instead of waiting for us to do it. But of course, tell us so we don't start working on something you plan to work on.
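
As an idea of what such an extraction tool could look like, here is a minimal sketch. It assumes the corpus comes as two tab-separated files, one with `id, language code, text` per line and one with `sentence id, translation id` per line. That layout, the language codes, and the function names are assumptions for the example, so check them against the files you actually download.

```python
import csv
from typing import Dict, Iterable, Iterator, Tuple


def load_sentences(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map sentence id -> (language code, text) from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: (row[1], row[2]) for row in reader if len(row) >= 3}


def extract_pairs(
    sentences: Dict[str, Tuple[str, str]],
    link_lines: Iterable[str],
    source_lang: str,
    target_lang: str,
) -> Iterator[Tuple[str, str]]:
    """Yield (source text, target text) for each link between the two languages."""
    reader = csv.reader(link_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip malformed lines
        source = sentences.get(row[0])
        target = sentences.get(row[1])
        if source is None or target is None:
            continue  # link points at a sentence we did not load
        if source[0] == source_lang and target[0] == target_lang:
            yield source[1], target[1]


# Tiny made-up sample standing in for the real files.
sentence_lines = [
    "1\teng\tHow are you?",
    "2\tfra\tComment vas-tu?",
    "3\tspa\t¿Cómo estás?",
]
link_lines = ["1\t2", "2\t1", "1\t3", "3\t1"]

pairs = list(
    extract_pairs(load_sentences(sentence_lines), link_lines, "eng", "spa")
)
```

With real files, the two lists would be replaced by file objects opened with `open(path, encoding="utf-8", newline="")`. The sketch loads all sentences into memory, which is the simplest choice but may need rethinking for a very large corpus.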

    You also have to know that we are actually open source (under the AGPL license) but we are not really "promoting" this aspect because:
    1. The code hasn't met my standards of elegance yet... Still too many parts that make me cringe when I look at them.
    2. We still don't have a sound methodology and organization in our way of working and I really don't have time to manage more people.
    However if you love the project and are really motivated to join the development team, then feel free to contact us =)


    12. Indicate your languages in your profile

    For people who didn't know, you can edit your profile by clicking on your username (at the top, in the menu bar).

    Since Tatoeba involves languages, it can be very useful for other users to know which languages you can speak and how well you can speak them. We don't have a specific "languages" field so you will have to write about it in your profile description (in the section "Something about you").

    And tell other users to indicate their languages as well (if they haven't already), especially if they have contributed.


    13. Encourage and educate new (or even not so new) contributors

    The community is very important in a project like Tatoeba; we just can't achieve our ambition without a strong community. But how do you build a strong community? Well, one thing is NOT to make new users feel lost and isolated.

    Part of this depends on the system. It has to be designed in a way that not only enables but also encourages users to interact with each other. Tatoeba is not great at that, but you have the minimum (private messages, wall, comments).

    And the other part depends of course on the community itself. There must be an effort from the community to build a stronger community. So if someone is asking a question that you can answer, don't hesitate to help out. If you notice someone is doing something wrong, don't hesitate to tell them the right way to do it. If you notice someone or some people have been contributing significantly, don't hesitate to drop a line (in a private message or on the Wall) to say "congratulations" or "thank you" for their work.

    More generally speaking, if you have any idea on how to make Tatoeba a more socially pleasant place to be, then go ahead!


    14. Spread the love

    Last but not least: you love the project, we love the project, we all want this project to become the greatest language tool of all time, so bring more people into this adventure!

    In the end, anyone who knows how to read and how to write can participate. There's no need to be a polyglot. If you can "just" hunt for mistakes and correct them or point them out, it will already be extremely helpful. The more people, the more mistakes we can take down, the more data we can produce that people can rely on. And everyone can live happily ever after.