HG2002: Semantics and Pragmatics
Assignment 1: Lexical Semantic Analysis with a Semantic Network
This is a group assignment for HG2002
consisting of three parts:
- Annotate (on your own)
all open class words in a short section of text (by 23:59, Oct 15th).
In this phase, please do not discuss your annotations with each other.
- Compare your annotations with one other annotator and a machine's results
and make any changes you consider necessary (you should discuss
with your partner). Partners are randomly assigned.
- Write a group paper describing your results
(The deadline is on the main page, the length is given below)
It is worth 30% of your total mark. You will be marked on the
accuracy of your annotation and the quality of your write-up.
Semantic Analysis Phase 1: Annotation
- If you have not done so already, read the story (at least up to
the part you are assigned, preferably the whole story).
This year we will look at
The Adventure of the Naval Treaty
- Using the on-line tool provided, annotate each open class word in
a short (roughly 30-sentence) text.
- For each word you should choose one of the senses
in WordNet
(Fellbaum 1998) or indicate why this is not possible.
- If you think a word is marked for sentiment, you should note its
polarity and strength.
- There is a quick start guide.
- More detailed instructions are
given on-line;
read them.
- The tool works best under Chrome or Firefox. If one does not
work, please try the other.
- ⚠ We are having a problem with cookies; if you cannot log
in, try logging in in private/incognito mode (2021)
- ⚠ NTU has not updated the digital certificate, so please
click through (it is safe, I promise) (2021)
- ⚠ NTU is not doing the DNS lookup, so I have replaced it with
the IP address (2021)
- Your assigned sentences are here
- Estimated time 4-6 hours.
- You should have annotated every word by the deadline. It
is important you make this deadline so that we can merge the
data for you.
Semantic Analysis Phase 2: Comparison
- You will be given a comparison of your tags with those of the other
annotators and a merged corpus tagged with the majority tags.
- You should re-tag any words on which all three of you
disagreed, or on which you changed your mind.
- Note that the final annotator is a naive computer
— it just tags the most frequent sense (mfs)
- mfs is calculated from the SemCor corpus and three Sherlock
Holmes stories (DANC, SPEC and REDH), weighted 1:3 to normalize frequencies
- Unseen proper nouns are tagged as per
- Unseen closed class words are tagged as x
- Unseen monosemous words are tagged with their single sense
- Other unseen words are tagged None
- If there are two or more equally frequent senses for a lemma
then it is tagged None
- So feel free to override them!
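The backoff rules above can be sketched in Python. This is a hypothetical reconstruction for illustration only, not the actual tagger; the function name, the sense inventory and the shape of the weighted counts are all assumptions:

```python
from collections import Counter

def mfs_tag(lemma, senses, counts, closed_class=False):
    """Hypothetical sketch of the backoff rules above.

    `senses` is the list of candidate sense keys for the lemma;
    `counts` maps (lemma, sense) to its weighted frequency in the
    training data (SemCor plus the three Holmes stories, 1:3).
    """
    if closed_class:
        return "x"                       # unseen closed class words
    if len(senses) == 1:
        return senses[0]                 # monosemous: the single sense
    freqs = Counter({s: counts.get((lemma, s), 0) for s in senses})
    if sum(freqs.values()) == 0:
        return None                      # other unseen words
    (best, top), *rest = freqs.most_common()
    if rest and rest[0][1] == top:
        return None                      # two or more equally frequent senses
    return best                          # the most frequent sense
```

For example, a lemma seen three times in one sense and once in another gets the first sense, while an even split falls back to None — which is exactly why you are encouraged to override these tags.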
Phase 3: Write up
- In the write-up you should describe the strengths and weaknesses
of using a lexical resource such as WordNet to define word meaning
- Are the senses in WordNet too coarse, too fine or just right?
Justify your position.
- You should give concrete examples from the text you analyzed.
Some things you could discuss include:
- Were some words easier or harder to annotate than others?
- e.g. verbs, multiword expressions, concrete nouns, …
- In cases where you disagreed with other annotators, on
reflection, do you think: you were right; they were right; the
definition is bad; or is there some other reason?
- For words with senses missing in WordNet, you should write a
comment with enough information to create a new entry for them
consisting of, at minimum, a definition, a relational link to an
existing synset and an example. E.g.
- Lemma: arrow
- Def: To assign a task to someone. Generally used only if the task is unpleasant or boring.
- Ex: They come and arrow me type their document
- Hyponym of: delegate (02391803-v)
Don't actually create a new synset (with the edit or add
a new synset button), just write in the comments.
- You don't need an extensive literature review, but you should
read and cite the references below and if you
consult other lexicons (which you are encouraged to do) then you
should cite them. You should also cite the WordNet you used and
the story you tagged.
- You should also discuss how long it took you to do the
annotation, and if you think there would be ways to make the task
quicker or easier.
- Formatted according to
the LMS guidelines for submitting written work for the Division of LMS
(but see below).
- You do not have to follow the suggested structure of "Introduction, Literature Review, Methodology, Results, Discussion, Conclusion, References." A short introduction describing the task followed by Results, Discussion, Conclusion, References is enough.
- You should mention which corpus and which section you were annotating (e.g.
eng: sentences XXX to YYY)
- You should use single spacing, not double spacing.
- If you want to make it even more beautiful, as I am sure you do,
take a look at my (Computational)
Linguistic Style Guidelines: a guide for the flummoxed.
- Submit
softcopy (via NTULearn). Only one person from each group
needs to submit.
- The paper should be six to eight pages, excluding references.
You should not include any appendices: everything should fit within
the paper.
- The deadline is on the main page.
Rubric
You will be marked 50% on the annotation: (i) completeness (did you annotate every word), (ii) accuracy (did you select an appropriate meaning), (iii) recall (did you only annotate words for sentiment that are not neutral) and (iv) quality of explanation (did you provide an informative explanation for every word you tagged as ‘e’ or ‘w’). The remaining 50% will be on the write-up. Criteria used to grade the write-up include 1) language and style of the paper, 2) quality of selection of examples, 3) quality of the analysis of the data, 4) quality of discussion and conclusions, 5) overall organization and unity.
References
- Francis Bond, Andrew Devadason, Melissa Rui Lin Teo and Luís
Morgado da Costa (2021)
Teaching Through
Tagging — Interactive Lexical Semantics In Proceedings of the 11th
Global Wordnet Conference (GWC 2021)
- Francis Bond, Luís Morgado da Costa, and Tuấn Anh Lê (2015)
IMI — A Multilingual Semantic Annotation Environment.
In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing. pp 7–12
- Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.
- Shari Landes, Claudia Leacock, and Christiane
Fellbaum. 1998. Building semantic
concordances. In Fellbaum (1998), chapter 8, pages 199–216.
- H. Langone, B. R. Haskell, and G. A. Miller
(2004) Annotating
WordNet. In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004.
- Shan Wang and Francis Bond (2014)
Building The Sense-Tagged Multilingual Parallel Corpus In 9th Edition of the Language
Resources and Evaluation Conference (LREC 2014),
Reykjavik.
- Liling Tan and Francis Bond (2011)
Building
and annotating the linguistically diverse NTU-MC
(NTU-multilingual corpus)
In Proceedings of the 25th Pacific Asia Conference
on Language, Information and Computation (PACLIC 25)
pp 367–376. Singapore
Francis Bond
<bond@ieee.org>
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303