Tuesday, January 25, 2011

Legally valid content

This article gives general instructions on how to contribute legally valid content to Tatoeba, to minimize the risk of Tatoeba being shut down for hosting illegal content (not that this is likely to happen anytime soon, but better safe than sorry).

If there is one thing you need to remember, it is this: do not add non CC-BY sentences to Tatoeba.

Non CC-BY sentences

Perhaps "non CC-BY sentence" is a bit cryptic for some of you, so let me clarify what it means. CC-BY is the short name for the Creative Commons Attribution license. Tatoeba redistributes all its sentences under this license. A non CC-BY sentence is simply a sentence that is not compatible with the CC-BY license.
  • Anything that is under copyright is NOT compatible with CC-BY (that includes quotes from books, movies, songs...).
  • Anything that is under a license that has a "share alike" condition is NOT compatible with CC-BY. A share-alike license requires copies to stay under that same license, which Tatoeba cannot guarantee when it redistributes under CC-BY. So CC-BY-SA is not compatible with CC-BY, and that means you can't copy text from Wikipedia into Tatoeba. But CC-BY is compatible with CC-BY-SA, so you may insert sentences from Tatoeba into Wikipedia or Wikiquote, for instance.
  • Anything that is under a license that has a "no commercial use" condition is NOT compatible with CC-BY.
  • Anything that is not under any license is NOT compatible with CC-BY. If there's no license, it means by default that the author doesn't authorize re-use.
  • Anything that basically doesn't say "You can do absolutely whatever you want with this" is NOT compatible with CC-BY.

CC-BY sentences

But now you may wonder, what IS compatible with the CC-BY license?
  • Anything that is under CC-BY is compatible with CC-BY. Sentences that you add to Tatoeba and that you created yourself are under CC-BY, because you agreed to the Terms of Use.
  • Anything that is in the public domain is compatible with CC-BY. If the author of a book died more than 100 years ago, you can pretty safely consider the book to be in the public domain.
  • Anything that basically says "You can do absolutely whatever you want with this" should be compatible with CC-BY.
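
For those who think better in code, the two lists above can be summed up in a rough sketch like the one below. This is only an illustration, not something Tatoeba actually runs: the function name and the license labels ("CC-BY", "PUBLIC-DOMAIN", and so on) are made up for the example, and real cases often need human judgment.

```python
from typing import Optional

# Hypothetical labels, invented for this sketch only.
# Only sources the article lists as compatible appear here.
COMPATIBLE_LABELS = {"CC-BY", "PUBLIC-DOMAIN"}


def is_cc_by_compatible(license_label: Optional[str]) -> bool:
    """Return True only if the rules above say the source may be added."""
    if license_label is None:
        # No license at all: by default the author has not authorized re-use.
        return False
    # Everything not explicitly listed (copyrighted works, share-alike,
    # non-commercial, unknown licenses) is treated as NOT compatible.
    return license_label.strip().upper() in COMPATIBLE_LABELS
```

With these assumed labels, is_cc_by_compatible("CC-BY") is True, while is_cc_by_compatible("CC-BY-SA") and is_cc_by_compatible(None) are both False. Defaulting to "not compatible" for anything unlisted mirrors the better-safe-than-sorry spirit of this article.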

The basic rules to contribute legal content

1) If you want to be sure that your sentences are legally valid, do NOT copy-paste from anywhere (especially NOT from textbooks, electronic dictionaries, or other language learning websites). Only come up with your own sentences.

2) We delete non CC-BY sentences. Depending on the situation, we may either delete the sentence right away, or give the contributor some time to defend their sentence.

3) Do NOT translate a sentence that you think is non CC-BY. Instead, post a comment to express your doubts about the legal status of the sentence. If you are a trusted user, add the tag "@possibly non CC-BY". If you see other people adding or translating non CC-BY content, tell them NOT to do that.

4) If you do copy-paste from somewhere else, indicate in the comments where you copied the sentence from. Give all the information you can so that we can easily verify that it is indeed CC-BY compatible.

5) We will block a user's ability to contribute (add, translate, edit sentences) if they do not follow these rules.

6) To be honest, it can happen that we delete sentences that are legally valid, because the line between legal and illegal content is not always clear. If you are a specialist in these legal issues, please help us define a clear method to determine whether a sentence is legally valid or not.

Related links

Here's a bunch of links related to copyright and stuff. I'm just throwing them here for those who are interested in expanding their knowledge on the matter. Wikipedia obviously has a lot of information on the subject, since they certainly have to deal with the problem more often than any other collaborative project out there.


  1. Is it Ok to add quotes of well-known people?

  2. I think short quotes do not violate any copyright law if there is proper attribution. Why don't you allow adding author's name to a sentence? That would be also helpful for works in Public Domain.

    1. I completely agree. A single sentence surely counts as fair use.

  3. In all honesty, I don't think using one sentence from a copyrighted work would be considered infringing, especially if used in accordance with the Fair Use Doctrine. A paragraph, too, is possible so long as proper attribution and source linking (where applicable) are provided. It appears to me that this site's project in general would be considered fair use because it's being used for educational purposes. Two paragraphs or a page from a source text is stretching things a bit, maybe, but a sentence or two should not be considered a big issue.


