HG8011: Projects 1 and 2
- If you have not yet done so, read the whole story:
The Final Problem
(FINA)
- Tag all words that need to be tagged in the sentences assigned to you.
You can find your name in the list below.
- For each word, you should choose one of the senses in WordNet
(Fellbaum 1998), or indicate why this is not possible.
- If you think a word is marked for sentiment, you should note its
polarity and strength.
- There is a quick start guide.
- More detailed instructions are given in
the Detailed Tagging Guidelines
- The tool works best under Chrome or Firefox.
If you log in but it does not remember you, please use
incognito mode! There is an issue with cookies that we
have not been able to fix.
- You will be assessed on
- Completion: is everything tagged?
- Accuracy: are the tags reasonable (do they agree with an expert
annotation more than a threshold)?
- Your assigned sentences are here
- If you have tagged everything reasonably accurately then you will pass
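The accuracy criterion above can be sketched roughly as follows. Note that the tag strings and the pass threshold here are invented for illustration; the actual threshold and scoring script used in the course are not specified.

```python
# Rough sketch of checking tag agreement against an expert annotation.
# Tag values and PASS_THRESHOLD are hypothetical, for illustration only.

def agreement(your_tags, expert_tags):
    """Fraction of positions where the two annotations use the same tag."""
    assert len(your_tags) == len(expert_tags)
    matches = sum(1 for a, b in zip(your_tags, expert_tags) if a == b)
    return matches / len(your_tags)

yours  = ["02345-n", "x", "00123-v", "e", "01999-a"]
expert = ["02345-n", "x", "00777-v", "e", "01999-a"]

PASS_THRESHOLD = 0.7  # hypothetical value, not the course's actual threshold
score = agreement(yours, expert)
print(f"agreement = {score:.0%}, pass = {score > PASS_THRESHOLD}")
```

Since only exact matches count in this sketch, commenting on uncertain words (as suggested below) is a good way to make near-miss tags explainable when the annotations are compared.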
- Estimated time: 3–5 hours (though it may take longer). Don't overthink things.
- You should have annotated every word by the due date. It is
important you make this deadline so that we can merge the data
for part 2.
- Add a comment to any word where you are not sure of the tag
- The due date is on the main page
- After the task is completed, you will be given a comparison of
your tags with those of the other annotators, and a merged corpus
tagged with the majority tags.
- You should discuss any differences as a group
(see your group).
- You should re-tag any words on which all three of you
disagreed, or on which you changed your mind.
- You should add comments for
- Anything marked 'e', with suggestions as to how to fix the corpus
- Anything marked 'w', with suggestions as to how to fix
wordnet
- If the same word appears many times with the same issue,
you only need to comment on the first occurrence
- Note that the final annotator is a naive computer: it just tags
the most frequent sense (MFS)
- The MFS is calculated from the SemCor corpus and three Sherlock
Holmes stories (DANC, SPEC, and REDH), weighted 1:3 to normalize frequencies
- Unseen proper nouns are tagged as per
- Unseen closed class words are tagged as x
- Unseen monosemous words are tagged with their single sense
- Other unseen words are tagged None
- If there are two or more equally frequent senses for a lemma
then it is tagged None
- So feel free to override them!
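The baseline's decision rules can be sketched as follows. This is only an illustrative reconstruction of the bullet points above: the corpus counts, sense names, and closed-class list are invented, and only the 1:3 weighting and the tie-breaking behaviour follow the description.

```python
from collections import Counter

# Illustrative sketch of the naive MFS baseline described above.
# All data (counts, sense names, closed-class list) is invented;
# only the decision rules follow the course description.

CLOSED_CLASS = {"the", "of", "and"}  # assumed closed-class list


def mfs_tag(lemma, semcor_counts, holmes_counts, senses):
    """Tag a lemma following the baseline's rules."""
    if lemma in CLOSED_CLASS:
        return "x"                       # closed-class words get 'x'
    # Weight SemCor 1 : Holmes 3 to normalize for corpus size.
    combined = Counter()
    for sense, n in semcor_counts.get(lemma, {}).items():
        combined[sense] += 1 * n
    for sense, n in holmes_counts.get(lemma, {}).items():
        combined[sense] += 3 * n
    if not combined:                     # unseen lemma
        if len(senses.get(lemma, [])) == 1:
            return senses[lemma][0]      # monosemous: its single sense
        return None                      # otherwise left untagged
    ranked = combined.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None                      # tie between equally frequent senses
    return ranked[0][0]


# Example: 'bank' sense 2 is rarer in SemCor but dominant in the
# Holmes stories, so the 1:3 weighting makes it the MFS.
semcor = {"bank": {"bank%1": 3, "bank%2": 2}}
holmes = {"bank": {"bank%2": 1}}
senses = {"bank": ["bank%1", "bank%2"], "ozone": ["ozone%1"]}
print(mfs_tag("bank", semcor, holmes, senses))
```

The tie rule is why the baseline leaves some frequent lemmas as None, and the genre weighting is why its choices can still be wrong for this particular story, so overriding it is expected.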
- You will be assessed (as a group) on
- Completion: is everything tagged?
- Accuracy: are the tags reasonable (do they agree with an expert
annotation more than a threshold)?
- How insightful your comments are
- How reasonable the sentiment marking is
- The due date is on the main page
References:
Canonical Citation:
Liling Tan and Francis Bond. 2012. Building and annotating the linguistically diverse NTU-MC (NTU-multilingual corpus). International Journal of Asian Language Processing 22(4), pp. 161–174.
Other References:
Christiane Fellbaum (editor). 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
Francis Bond, Shan Wang, Eshley Huini Gao, Hazel Shuwen Mok, and Jeanette Yiwen Tan. 2013. Developing parallel sense-tagged corpora with wordnets. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (LAW 2013). Sofia. pp 149–158.
Yu Jie Seah and Francis Bond. 2014. Annotation of Pronouns in a Multilingual Corpus of Mandarin Chinese, English and Japanese. In 10th Joint ACL-ISO Workshop on Interoperable Semantic Annotation, Reykjavik.
Slav Petrov, Dipanjan Das, and Ryan McDonald. 2011. A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.
Shan Wang and Francis Bond. 2014. Building The Sense-Tagged Multilingual Parallel Corpus. In 9th Edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik.
Contributors: Francis Bond, Luís Morgado da Costa, Tuan Anh Le.
Francis Bond
<bond@ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303