10. Lexical Resources and WordNet.

NLTK Chapter 2: Accessing Text Corpora and Lexical Resources
- 2.4: Lexical Resources
- We will use the newer wordnet module wn (Documentation; How To). Please use this rather than NLTK's WordNet interface.
- 2.5: WordNet (How To)

Load the Open Multilingual Wordnet inside Python. This will be slow the first time!
- Look at the different synsets for bird.
  How many are there?
  What are their definitions?
  How deep in the hierarchy are they?
- For the first synset (omw-en-01503061-n), print the lemmas in languages other than English
- For each synset, print out each sense and its frequency (hint: the frequency of a lemma is given by sense.counts())
- Give the total frequency for each synset
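
One possible way to approach the bird questions is sketched below. It assumes the `wn` package is installed and that the relevant wordnets have already been added with `wn.download` (for example the project `omw:1.4`, which is an assumption about what you have installed). The helper names `sense_frequency`, `synset_frequency`, `describe` and `print_translations` are illustrative and not part of `wn`. If a wordnet records no counts for a sense, its frequency comes out as 0.

```python
def sense_frequency(sense):
    """Sum every corpus count recorded for one sense (0 if there are none)."""
    return sum(sense.counts())


def synset_frequency(synset):
    """Total frequency of a synset: the sum over all of its senses."""
    return sum(sense_frequency(sense) for sense in synset.senses())


def describe(form="bird", lang="en"):
    """Print id, definition, depth and sense frequencies for each synset of `form`."""
    import wn  # imported here so the helpers above work without wn installed

    synsets = wn.synsets(form, lang=lang)
    print(f"{form}: {len(synsets)} synsets")
    for synset in synsets:
        # max_depth() is the longest hypernym path to a root; min_depth() also exists
        print(synset.id, synset.definition(), "depth:", synset.max_depth())
        for sense in synset.senses():
            print("   ", sense.word().lemma(), sense_frequency(sense))
        print("    total:", synset_frequency(synset))


def print_translations(synset_id="omw-en-01503061-n", skip="en"):
    """Print the lemmas of one synset in every other installed language."""
    import wn

    synset = wn.synset(synset_id)
    languages = sorted({lex.language for lex in wn.lexicons()} - {skip})
    for lang in languages:
        for translated in synset.translate(lang=lang):
            print(lang, translated.lemmas())
```

Calling `describe()` and then `print_translations()` from an interactive session covers the five bullet points above; only languages whose wordnets are installed will appear in the translation output.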
Load the TUFS Basic Vocabulary for Japanese (ja): https://bond-lab.github.io/Language-and-the-Computer/code/P3/tufs-ja.xml
- Print out all the adverbs (synset id and lemmas)
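
A minimal sketch of the adverb listing follows, assuming `wn` is installed. The lexicon id of the TUFS file is not given here, so `lexicon_spec` is a placeholder you need to fill in after loading (check the ids reported by `wn.lexicons()`). The function names are illustrative.

```python
TUFS_URL = "https://bond-lab.github.io/Language-and-the-Computer/code/P3/tufs-ja.xml"


def adverb_rows(wordnet):
    """Return (synset id, lemmas) for every adverb synset.

    'r' is the part-of-speech code for adverbs in WN-LMF wordnets.
    """
    return [(synset.id, synset.lemmas()) for synset in wordnet.synsets(pos="r")]


def load_tufs():
    """Download the TUFS file and add it to the local wn database (run once)."""
    import wn  # imported here so adverb_rows works without wn installed

    wn.download(TUFS_URL)
    # Show what is installed so the right lexicon id can be picked.
    for lexicon in wn.lexicons():
        print(lexicon.id, lexicon.language, lexicon.label)


def print_tufs_adverbs(lexicon_spec):
    """Print the adverbs of one lexicon; lexicon_spec is the id found above."""
    import wn

    for synset_id, lemmas in adverb_rows(wn.Wordnet(lexicon=lexicon_spec)):
        print(synset_id, ", ".join(lemmas))
```

Restricting the query to a single `wn.Wordnet` object keeps adverbs from other installed Japanese wordnets out of the result.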
★ Tabulate the average polysemy per word length for all words in one language in wordnet, and then separately for each part of speech.
- polysemy is the number of synsets per word
- you can get all words with [w for w in wn.words()]
- for just nouns you can do: [w for w in wn.words(pos='n')]
- for just English you can do: [w for w in wn.words(lang='en')]
- if you call wn.words() with no restrictions, you get all words for all installed languages; this can be a lot!
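
The tabulation can be sketched as below, under a few assumptions: `wn` is installed, only one wordnet for the chosen language is loaded, and word length is measured in characters of the lemma. In `wn` a word is an entry with a fixed part of speech, so this sketch merges entries that share a lemma before counting synsets. The function names are illustrative.

```python
from collections import defaultdict


def polysemy_by_length(entries):
    """Average number of synsets per lemma, grouped by lemma length.

    `entries` is an iterable of (lemma, synset_ids) pairs; pairs with the
    same lemma (e.g. a noun and a verb entry) are merged first.
    """
    synsets_of = defaultdict(set)
    for lemma, synset_ids in entries:
        synsets_of[lemma].update(synset_ids)

    counts_by_length = defaultdict(list)
    for lemma, ids in synsets_of.items():
        counts_by_length[len(lemma)].append(len(ids))

    return {length: sum(counts) / len(counts)
            for length, counts in sorted(counts_by_length.items())}


def wordnet_entries(lang="en", pos=None):
    """Yield (lemma, synset ids) for every word in one language."""
    import wn  # imported here so polysemy_by_length works without wn installed

    for word in wn.words(pos=pos, lang=lang):
        yield str(word.lemma()), {synset.id for synset in word.synsets()}


def tabulate(lang="en"):
    """Print the table for all words, then for nouns, verbs, adjectives, adverbs."""
    for pos in (None, "n", "v", "a", "r"):
        print("pos:", pos or "all")
        for length, average in polysemy_by_length(wordnet_entries(lang, pos)).items():
            print(f"  {length:3d}  {average:.2f}")
```

Keeping the averaging separate from the `wn` queries means the same function can be reused with any source of (lemma, synsets) pairs.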

★ These problems are only for overachievers :-)

Find all nouns in English that are both animals and food
- Use the supersense (aka lexfile, topic)
e.g. duck, salmon
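
A rough sketch of the supersense approach, assuming `wn` is installed, an English wordnet that records lexfiles is loaded, and the supersense names are `noun.animal` and `noun.food` (the Princeton WordNet lexicographer file names). The function names are illustrative.

```python
def animal_and_food(words, wanted=frozenset({"noun.animal", "noun.food"})):
    """Return the lemmas whose synsets cover every supersense in `wanted`.

    `words` is an iterable of (lemma, supersenses) pairs, where supersenses
    is the set of lexfile names found across the lemma's synsets.
    """
    return sorted(lemma for lemma, supersenses in words
                  if wanted <= set(supersenses))


def english_noun_supersenses():
    """Yield (lemma, set of lexfile names) for every English noun."""
    import wn  # imported here so animal_and_food works without wn installed

    for word in wn.words(pos="n", lang="en"):
        # lexfile() may return None when a synset has no supersense recorded
        yield str(word.lemma()), {synset.lexfile() for synset in word.synsets()}
```

With the data loaded, `animal_and_food(english_noun_supersenses())` gives the candidate list. For a wordnet without supersenses, the same filter can be fed with the lexfiles of the English synsets returned by `synset.translate(lang='en')`.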
★ Try in another language
- Not all wordnets have supersenses, so you may have to translate the synset to English, and look at the English supersense
e.g. 鶏, 兎