Chinese Annotation Tool

My annotator can no longer run on the server that was hosting it, so unfortunately it is currently unavailable. If someone has extra server space and power (the annotator can be computationally intensive) to set up a mirror for the annotator, please let me know. You can download the necessary files to run your own mirror site. Read the included README.txt file. You can also use a similar annotator for Big5-encoded text or Waiyu.org's GB and Big5 annotation server. Also see Rikai and Adso. To those people who have found the annotator useful, I am sorry that it is temporarily unusable.

Mirrors


This tool makes learning to read Chinese easier by automatically marking up the words in a simplified Chinese text with their pronunciations and dictionary definitions. You can type or paste in GB-encoded text or the address of a Chinese web page. You have several choices of how the text will be annotated:
  1. Segment Only: In this option, the program will add spaces between the words in the text. No other information is added.
  2. Add Dictionary Entries at status line: After segmenting the text, the program adds two ways of looking up each word. First, when the user holds the mouse over an underlined word, its pronunciation and definition appear in the status line at the bottom of the browser. Second, clicking the underlined word jumps to its pronunciation and English definition in a footnote further down the page.
  3. Add Dictionary Entries as footnotes: This option is for users with older browsers that do not support JavaScript. Holding the mouse over a word does nothing, but clicking on it still takes the user to the definition in the footnotes. The file size is also smaller than with option 2.
  4. Convert to Pinyin: The program segments the text and uses that information to convert the Chinese characters to pinyin. Only the pinyin is shown in the results.
  5. Add pinyin next to characters: The pronunciation of each character is indicated by adding pinyin by its side. No other information is added.
  6. Add Pinyin above Characters: Adds the pinyin above each character, in an annotation style called ruby. This currently only works in Internet Explorer 5 (in other browsers the pinyin is placed next to the character).
  7. Add Pinyin/Defs in Margins: Type a list of words into the "Words to Annotate" box, one word per line with no spaces. The annotator adds definitions of these words to the right of the paragraph in which they occur. If a word is not in the dictionary, you can also include its definition beside the word, using the format described below.
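
The "Segment Only" and "Convert to Pinyin" modes can be illustrated with a small sketch. This is not the annotator's actual algorithm, which is not described on this page; it is a plain greedy longest-match segmenter over a toy dictionary. The names TOY_DICT, segment and to_pinyin, and the dictionary entries themselves, are made up for illustration.

```python
# Toy dictionary mapping words to pinyin. Entries are illustrative only,
# not taken from the annotator's real dictionary.
TOY_DICT = {
    "中国": "zhong1 guo2",
    "中": "zhong1",
    "国": "guo2",
    "人": "ren2",
    "中国人": "zhong1 guo2 ren2",
    "学习": "xue2 xi2",
    "中文": "zhong1 wen2",
}


def segment(text, dictionary, max_len=4):
    """Greedy longest-match segmentation, scanning left to right."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_pinyin(words, dictionary):
    """Replace each known word with its pinyin; leave unknown words as-is."""
    return " ".join(dictionary.get(w, w) for w in words)


words = segment("中国人学习中文", TOY_DICT)
print(" ".join(words))              # segment-only style output
print(to_pinyin(words, TOY_DICT))   # pinyin-only style output
```

Greedy matching is simple but can pick the wrong split when words overlap, which is one reason segmentation is harder than it first looks.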

Users can add or override definitions in the "Words to Annotate" section by using this format:

Chinese [pinyin] /English definition/

That is, the Chinese (no internal spaces), followed by one space (not a wide Chinese space), followed by the pinyin surrounded by square brackets (with a space between each pinyin syllable), followed by another space, followed by the English definition/explanation surrounded by slashes (this is the CEDICT format). One word or definition per line.
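
As a rough illustration of the format, the following sketch parses one such line with a regular expression. The function name parse_entry and the sample entry are hypothetical, and splitting the definition on further slashes assumes that a line may carry several definitions, as CEDICT entries commonly do.

```python
import re

# Chinese (no spaces), one ASCII space, [pinyin], one ASCII space, /definition/
ENTRY_RE = re.compile(r"^(\S+) \[([^\]]+)\] /(.+)/$")


def parse_entry(line):
    """Return (chinese, pinyin_syllables, definitions), or None if malformed."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    chinese, pinyin, defs = match.groups()
    # Syllables are separated by single spaces; definitions by slashes.
    return chinese, pinyin.split(" "), defs.split("/")


print(parse_entry("中文 [zhong1 wen2] /Chinese language/"))
```

A line that uses a wide Chinese space (U+3000) after the word does not match the pattern, so parse_entry returns None for it.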

Users can also use this format to override the CEDICT definitions in the other modes if the definitions or romanizations contain mistakes.

When using "Add to Margins", the first occurrence of each word appears in bold, and its definition appears roughly to its right. The tool currently tries to match words to paragraphs, so be sure to leave at least one blank line between paragraphs.
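
The paragraph matching can be pictured with the sketch below. It is only a guess at the behaviour described above, not the annotator's code: it splits the text on blank lines, bolds the first occurrence of each listed word with an HTML b tag, and attaches the definition to that paragraph only. The function annotate_margins and the sample glossary are hypothetical.

```python
import re


def annotate_margins(text, glossary):
    """Pair each paragraph with the glossary words that first occur in it."""
    # Paragraphs are separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p]
    seen = set()
    result = []
    for paragraph in paragraphs:
        notes = []
        for word, definition in glossary.items():
            if word not in seen and word in paragraph:
                seen.add(word)
                # Bold only the first occurrence of the word.
                paragraph = paragraph.replace(word, "<b>" + word + "</b>", 1)
                notes.append(word + ": " + definition)
        result.append((paragraph, notes))
    return result


sample = "我学习中文。\n\n中文很难。"
glossary = {"中文": "Chinese language", "难": "difficult"}
for paragraph, notes in annotate_margins(sample, glossary):
    print(paragraph, "|", "; ".join(notes))
```

Without the blank line, the whole sample would count as a single paragraph and both definitions would be attached to it.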

Click here for an already annotated sample text.

The tool currently handles only GB-encoded text. Dictionary definitions are drawn from Paul Denisowski's CEDICT Chinese-English dictionary. If you find a word that does not have a definition, consider contributing it to the CEDICT project. The segmentation algorithm is still under development, and just what constitutes a "proper" Chinese word is itself a good research topic. I will include my own guidelines at a later date. You can download the segmenter code (in Perl) and run it yourself. You can also look up entries in CEDICT from my Chinese-English dictionary page.

Please visit my guestbook to tell me your comments and suggestions for this tool. If you came to this page directly, you might also enjoy visiting some of my other On-line Chinese Tools.