HG2002: Semantics and Pragmatics
Assignment 1: Lexical Semantic Analysis with a Semantic Network
This is a group assignment for HG2002
consisting of three parts:
- Annotate (on your own)
all open class words in a short section of text (by 23:59, Oct 15th).
In this phase, please do not discuss your annotations with each other.
- Compare your annotations with one other annotator and a machine's results
and make any changes you consider necessary (you should discuss
with your partner). Partners are randomly assigned.
- Write a group paper describing your results
(The deadline is on the main page, the length is given below)
It is worth 30% of your total mark. You will be marked on the
accuracy of your annotation and the quality of your write-up.
Semantic Analysis Phase 1: Annotation
- If you have not done so already, read the story (at least up to
the part you are assigned, preferably the whole story).
This year we will look at
The Adventure of the Naval Treaty
- Using the on-line tool provided, annotate each open class word in
a short (roughly 30-sentence) text.
- For each word you should choose one of the senses
in WordNet
(Fellbaum 1998) or indicate why this is not possible.
- If you think a word is marked for sentiment, you should note its
polarity and strength.
- There is a quick start guide.
- More detailed instructions are
given on-line;
read them.
- The tool works best under Chrome or Firefox. If one does not
work, please try the other.
- ⚠ We are having a problem with cookies; if you cannot log
in, try logging in in private/incognito mode (2021)
- ⚠ NTU has not updated the digital certificate, so please
click through (it is safe, I promise) (2021)
- ⚠ NTU is not doing the DNS lookup, so I have replaced it with
the IP address (2021)
- Your assigned sentences are here
- Estimated time 4-6 hours.
- You should have annotated every word by the deadline. It
is important you make this deadline so that we can merge the
data for you.
Semantic Analysis Phase 2: Comparison
- You will be given a comparison of your tags with those of the other
annotators and a merged corpus tagged with the majority tags.
- You should re-tag any words on which all three of you
disagreed, or on which you changed your mind.
- Note that the final annotator is a naive computer
— it just tags the most frequent sense (mfs)
- mfs is calculated from the SemCor corpus and three Sherlock
Holmes stories (DANC, SPEC and REDH), weighted 1:3 to normalize frequencies
- Unseen proper nouns are tagged as per
- Unseen closed class words are tagged as x
- Unseen monosemous words are tagged with their single sense
- Other unseen words are tagged None
- If there are two or more equally frequent senses for a lemma
then it is tagged None
- So feel free to override them!
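The backoff rules above can be sketched in Python. This is a hypothetical reconstruction for illustration only, not the actual tagger; the function name, the sense inventory and the shape of the weighted counts are all assumptions:

```python
from collections import Counter

def mfs_tag(lemma, senses, counts, closed_class=False):
    """Hypothetical sketch of the backoff rules above.

    `senses` is the list of candidate sense keys for the lemma;
    `counts` maps (lemma, sense) to its weighted frequency in the
    training data (SemCor plus the three Holmes stories, 1:3).
    """
    if closed_class:
        return "x"                       # unseen closed class words
    if len(senses) == 1:
        return senses[0]                 # monosemous: the single sense
    freqs = Counter({s: counts.get((lemma, s), 0) for s in senses})
    if sum(freqs.values()) == 0:
        return None                      # other unseen words
    (best, top), *rest = freqs.most_common()
    if rest and rest[0][1] == top:
        return None                      # two or more equally frequent senses
    return best                          # the most frequent sense
```

For example, a lemma seen three times in one sense and once in another gets the first sense, while an even split falls back to None — which is exactly why you are encouraged to override these tags.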
Phase 3: Write up
- In the write-up you should describe the strengths and weaknesses
of using a lexical resource such as WordNet to define word meaning
- Are the senses in WordNet too coarse, too fine or just right?
Justify your position.
- You should give concrete examples from the text you analyzed.
Some things you could discuss include:
- Were some words easier or harder to annotate than others?
- e.g. verbs, multiword expressions, concrete nouns, …
- In cases where you disagreed with other annotators, on
reflection, do you think: you were right; they were right; the
definition is bad; or is there some other reason?
- For words with senses missing in WordNet, you should write a
comment with enough information to create a new entry for them
consisting of, at minimum, a definition, a relational link to an
existing synset and an example. E.g.
- Lemma: arrow
- Def: To assign a task to someone. Generally used only if the task is unpleasant or boring.
- Ex: They come and arrow me type their document
- Hyponym of: delegate (02391803-v)
Don't actually create a new synset (with the edit or add
a new synset button), just write in the comments.
- You don't need an extensive literature review, but you should
read and cite the references below and if you
consult other lexicons (which you are encouraged to do) then you
should cite them. You should also cite the WordNet you used and
the story you tagged.
- You should also discuss how long it took you to do the
annotation, and if you think there would be ways to make the task
quicker or easier.
- Formatted according to
the LMS guidelines for submitting written work for the Division of LMS
(but see below).
- You do not have to follow the suggested structure of "Introduction, Literature Review, Methodology, Results, Discussion, Conclusion, References." A short introduction describing the task followed by Results, Discussion, Conclusion, References is enough.
- You should mention which corpus and which section you were annotating (e.g.
eng: sentences XXX to YYY)
- You should use single spacing, not double spacing.
- If you want to make it even more beautiful, as I am sure you do,
take a look at my (Computational)
Linguistic Style Guidelines: a guide for the flummoxed.
- Submit
softcopy (via NTULearn). Only one person from each group
needs to submit.
- The paper should be six to eight pages, excluding references.
You should not include any appendices: everything should fit within
the paper.
- The deadline is on the main page.
Rubric
You will be marked 50% on the annotation: (i) completeness (did you annotate every word), (ii) accuracy (did you select an appropriate meaning), (iii) recall (did you only annotate words for sentiment that are not neutral) and (iv) quality of explanation (did you provide an informative explanation for every word you tagged as ‘e’ or ‘w’). The remaining 50% will be on the write-up. Criteria used to grade the write-up include 1) language and style of the paper, 2) quality of selection of examples, 3) quality of the analysis of the data, 4) quality of discussion and conclusions, 5) overall organization and unity.
References
- Francis Bond, Andrew Devadason, Melissa Rui Lin Teo and Luís
Morgado da Costa (2021)
Teaching Through
Tagging — Interactive Lexical Semantics In Proceedings of the 11th
Global Wordnet Conference (GWC 2021)
- Francis Bond, Luís Morgado da Costa, and Tuấn Anh Lê (2015)
IMI — A Multilingual Semantic Annotation Environment.
In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing. pp 7–12
- Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.
- Shari Landes, Claudia Leacock, and Christiane
Fellbaum. 1998. Building semantic
concordances. In Fellbaum (1998), chapter 8, pages 199–216.
- H. Langone, B. R. Haskell, and G. A. Miller
(2004) Annotating
WordNet. In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004.
- Shan Wang and Francis Bond (2014)
Building The Sense-Tagged Multilingual Parallel Corpus In 9th Edition of the Language
Resources and Evaluation Conference (LREC 2014),
Reykjavik.
- Liling Tan and Francis Bond (2011)
Building
and annotating the linguistically diverse NTU-MC
(NTU-multilingual corpus)
In Proceedings of the 25th Pacific Asia Conference
on Language, Information and Computation (PACLIC 25)
pp 367–376. Singapore
Francis Bond
<bond@ieee.org>
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303