This article explains what kind of content we accept in Tatoeba, what kind of content we delete and what kind of content we review. Note that this article is not final. You have the right to object to something or to ask for more clarifications.
What do we accept?
Tatoeba is about collecting sentences so we only want sentences. However, what exactly do we mean by "sentences"? What is a sentence and what is not? It's actually a difficult question... No one will doubt that "I am happy" is a sentence. But what about "On the left", is that a sentence? What about "Thank you", "Yes", or "Awesome"?
As far as I'm concerned, I think Tatoeba can handle a loose definition of "sentence". We don't strictly need to have an entity with at least a verb. To me, when spoken, everything is a sentence. When written, the main difference between a sentence and a non-sentence is punctuation. That's all. For the rest, as long as people can imagine context where the "sentence" can be expressed, then it's a sentence.
So yes, I'm roughly saying that you can take all the words in the dictionary, add punctuation and perhaps a capital letter, you'd turn it into a sentence. I don't encourage it because it's not useful (dictionaries do that already), but one-word sentences are still tolerated. I'll trust people's common sense for adding only one-word sentences that are significant (for instance, "Hello" is, "House" isn't).
In case you run across sentences that are not strictly speaking sentences, then tag them as "non-sentence", so that there is a way to quickly identify them. Inform the owner about this article if he's a new member, and let him know it's better to to have sentences with more context.
At any rate, don't bother starting endless discussions if the sentence has already been translated because it will be kept as is. Feel free however to add a new sentence based on the "non-sentence".
Generally speaking, Tatoeba is open to many kinds of sentences. We tolerate casual speech, slang, insults (as long as they are not targeting anyone in particular), erotic sentences, sentences that are not "true" (after all, Tatoeba is not an encyclopedia). These sentences can be tagged accordingly to inform users. But I'll ask people to focus primarily on appropriate and politically correct sentences. We don't have (yet) a good system to filter out sentences that are not very "safe", so don't flood us with those, please.
What do we delete?
What we delete for sure are:
- Entries that people add by mistake due to our failure to provide a more efficient interface.
- Sentences that owners themselves requested to delete (because the delete feature is still not available to everyone).
- Entries that are copyrighted or under a license that is not compatible with CC-BY.
- Racist comments and personal attacks, if they are really harmful and there is a general agreement that it should be removed.
- Entries that really make no sense and whose owner won't provide any explanation.
In the perspective of providing better content, I'm also allowing the deletion of "sentences" that are "not really sentences" and came from the Tanaka Corpus, but only under these conditions:
- The vocabulary is already illustrated in other sentences.
- There is only the Japanese-English pair, no translation into any other language. We can make an exception for French (i.e. it's still deletable if there is a French translation).
- All the sentences that will be deleted do NOT belong to anyone.
It may be obvious, but you should avoid translating a sentence that is likely to be deleted... Unless you want to stand against its deletion.
What do we review?
By "reviewing" I mean correcting mistakes. So we correct spelling mistakes, grammar mistakes, bad formulations, etc. We want Tatoeba's data to be used (or at least usable) for educational purpose so we want good quality sentences.
However, the limit between a "correct" and "incorrect" sentence is not always clear and some sentences can generate a lot of debate. In such cases, the final decision belongs to the owner of the sentence.
Remember that Tatoeba allows several translations in a same language, so there is no point fighting endlessly on what is correct or not. Simply add another version of the sentence if you are not happy with the existing one, we don't mind at all having near duplicate sentences (cf.
this discussion on the Wall, and more precisely my thoughts on the issue
here).
We also don't want any kind of annotations in the sentences. You can find more details in the contributor's guide,
rule #9. If you have a good reason to keep your annotations, then please explain it in your comments. Otherwise moderators
have the right to edit your sentence two weeks after you have been requested to change your sentence.
What do we link?
Tatoeba's sentences are represented as a graph. Two sentences that are linked together have the
same meaning. Linking two sentences in the same language is accepted, but you shouldn't link only based on meaning. The sentences that you link should also have an equivalent "style" and type of speech. Cf. my wall post
here.