10. Lexical Resources and WordNet.
Lecture notes
Further reading
- A Pronouncing Dictionary
For most words in the dictionary, the first phonetic code is the same as the first letter:
E.g.: in ('fir', ['F', 'ER1']), 'f' is the same as 'F'
Sometimes it is not:
E.g.: in ('yves', ['IY1', 'V']), 'y' is not the same as 'IY1'
What proportion of the words start with the same code?
What are some common mismatches?
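One way to approach both questions is sketched below. It assumes the dictionary is available as (word, phone list) pairs like the two examples above, and it treats "same code" as strict equality between the upper-cased first letter and the first phone, which is only one possible reading. The function name first_code_stats and the two extra sample entries are illustrative, not part of the exercise.

```python
from collections import Counter


def first_code_stats(entries):
    """Return (proportion of matching words, Counter of mismatched pairs).

    entries is an iterable of (word, phones) pairs, e.g. ('fir', ['F', 'ER1']).
    A word "matches" when its upper-cased first letter equals its first phone.
    """
    total = 0
    matches = 0
    mismatches = Counter()
    for word, phones in entries:
        if not word or not phones:
            continue  # skip empty entries rather than guess
        total += 1
        letter = word[0].upper()
        first_phone = phones[0]
        if letter == first_phone:
            matches += 1
        else:
            mismatches[(letter, first_phone)] += 1
    proportion = matches / total if total else 0.0
    return proportion, mismatches


# Tiny hand-made sample: 'fir' and 'yves' come from the notes above,
# the other two entries are illustrative assumptions.
sample = [
    ("fir", ["F", "ER1"]),
    ("yves", ["IY1", "V"]),
    ("knight", ["N", "AY1", "T"]),
    ("bat", ["B", "AE1", "T"]),
]

proportion, mismatches = first_code_stats(sample)
print(f"proportion matching: {proportion:.2f}")
for (letter, phone), count in mismatches.most_common(5):
    print(letter, phone, count)

# With NLTK and its cmudict corpus installed (an assumption about your setup),
# the same function could be applied to nltk.corpus.cmudict.entries().
```

Note that under strict equality every vowel-initial word counts as a mismatch, because vowel codes such as 'IY1' are never a single letter; you may prefer a looser comparison.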
- Start a WordNet browser by doing any one of the following:
- Use UPOL's online Open Multilingual Wordnet v1.3
- Use UPOL's online Open Multilingual Wordnet v2.0
- Use Princeton's online WordNet 3.1 Search
- Use the Open English Wordnet
Whatever you use to access WordNet, try the following:
- Nouns:
Find hyponyms of "student". What kind of student are you?
Compare hypernyms of "student" and "professor".
What's the difference?
Compare hyponyms of "professor" and "lecturer".
Is WordNet US English?
- Adjectives:
Compare "big", "large", "great". What are their antonyms?
- Multiword Expressions (Collocations):
An MWE like "big sister" has its own WordNet entry.
Which combinations of "big/large/great sister/uncle/toe"
are listed in WordNet?
Practical work -- in class (code, output)
- Load WordNet inside Python.
- Look at the different synsets for bird.
How many are there?
What are their definitions?
How deep in the hierarchy are they?
- For the first synset (omw-en-01503061-n), print the lemmas in
languages other than English
- For each synset, print out each sense and its frequency
(hint: the frequency of a lemma is given by sense.counts)
- Give the total frequency for each synset
- Find all hyponyms (including hyponyms of hyponyms, ...) for the colour red!
- Use closure (shown in the How To)
- † Tabulate the average polysemy per word length for all words
in one language in WordNet, and then separately for each part of speech.
- polysemy is the number of synsets per word
- you can get all words with
[w for w in wn.words()]
- for just nouns you can do:
[w for w in wn.words(pos='n')]
- for just English you can do:
[w for w in wn.words(lang='en')]
- if you query wn directly, with no language restriction,
then you get all words for all languages; this can be a lot!
† These problems are only for overachievers :-)
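The polysemy tabulation above can be sketched independently of any wordnet library. The version below assumes you have already built a mapping from each word to its number of synsets (for instance by counting the synsets of each word you get from the wordnet); the function name average_polysemy_by_length and the toy counts are hypothetical and are not real WordNet figures.

```python
from collections import defaultdict


def average_polysemy_by_length(synset_counts):
    """Map word length -> mean number of synsets per word of that length.

    synset_counts maps each word to its number of synsets.
    """
    totals = defaultdict(lambda: [0, 0])  # length -> [synset total, word count]
    for word, n_synsets in synset_counts.items():
        bucket = totals[len(word)]
        bucket[0] += n_synsets
        bucket[1] += 1
    return {length: s / n for length, (s, n) in sorted(totals.items())}


# Hypothetical toy counts, only to show the shape of the table.
toy_counts = {"cat": 2, "dog": 4, "red": 3, "bird": 5}

table = average_polysemy_by_length(toy_counts)
print("length  avg polysemy")
for length, average in table.items():
    print(f"{length:>6}  {average:.2f}")
```

For the per-part-of-speech tables, the same function can be called once per part of speech, each time with counts built only from words of that part of speech.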
Further work -- at home (code, output)
- Find all nouns in English that are both animals and food
- Use the supersense (aka lexfile, topic)
e.g. duck, salmon
- Try in another language
- Not all wordnets have supersenses, so you may have to
translate the synset to English, and look at the English
supersense
e.g. ιΆ, ε
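The core of the animals-and-food exercise is a set intersection over supersenses, which can be sketched as follows. It assumes you have collected, for each noun lemma, the set of supersense (lexfile) names of its synsets; noun.animal and noun.food are the Princeton WordNet lexfile names, while the function name lemmas_in_both and the toy data are hand-written for illustration and are not read from a wordnet.

```python
def lemmas_in_both(lemma_supersenses, first="noun.animal", second="noun.food"):
    """Return the sorted lemmas that have at least one sense in each supersense.

    lemma_supersenses maps a lemma to the set of supersense names
    found across all of its synsets.
    """
    return sorted(
        lemma
        for lemma, supersenses in lemma_supersenses.items()
        if first in supersenses and second in supersenses
    )


# Hand-written toy data (an assumption, not extracted from WordNet).
toy_supersenses = {
    "duck": {"noun.animal", "noun.food", "noun.artifact"},
    "salmon": {"noun.animal", "noun.food", "noun.attribute"},
    "apple": {"noun.food", "noun.plant"},
    "professor": {"noun.person"},
}

both = lemmas_in_both(toy_supersenses)
print(both)
```

For a wordnet without supersenses, the same function applies once each synset has been mapped to its English counterpart and the English supersense is used to fill the sets.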
LAC: Language and the Computer Francis Bond.