10. Lexical Resources and WordNet.

Lecture notes

Further reading

Before class (code, output)

  1. A Pronouncing Dictionary
    For most words in the dictionary, the first phonetic code is the same as the first letter:
    E.g.: in ('fir', ['F', 'ER1']), 'f' is the same as 'F'.
    Sometimes it is not:
    E.g.: in ('yves', ['IY1', 'V']), 'y' is not the same as 'IY1'.
    What proportion of the words start with the same code?
    What are some common mismatches?
  2. Start a WordNet browser by doing any one of the following:
    1. Use UPOL's online Open Multilingual Wordnet v1.3
    2. Use UPOL's online Open Multilingual Wordnet v2.0
    3. Use Princeton's online WordNet 3.1 Search
    4. Use the Open English Wordnet
    Whatever you use to access WordNet, try the following:
    • Nouns: Find hyponyms of "student". What kind of student are you?
      Compare hypernyms of "student" and "professor". What's the difference?
      Compare hyponyms of "professor" and "lecturer". Is WordNet US English?
    • Adjectives: Compare "big", "large", "great". What are their antonyms?
    • Multiword Expressions (Collocations): An MWE like "big sister" has its own WordNet entry.
      Which combinations of "big/large/great sister/uncle/toe" are listed in WordNet?
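
For the pronouncing dictionary questions in item 1, the comparison can be sketched on a handful of entries. This is a minimal sketch, not a model answer: only 'fir' and 'yves' come from the exercise, the other entries are illustrative, and the helper names (first_code, compare_first_codes) are hypothetical. It assumes that "the same" means the first code, with its stress digit removed, is identical to the uppercased first letter.

```python
from collections import Counter

# Sample in the (word, pronunciation) format shown in the exercise.
# Only 'fir' and 'yves' come from the exercise; the rest are illustrative.
SAMPLE_ENTRIES = [
    ("fir", ["F", "ER1"]),
    ("yves", ["IY1", "V"]),
    ("cat", ["K", "AE1", "T"]),
    ("knight", ["N", "AY1", "T"]),
    ("dog", ["D", "AO1", "G"]),
]


def first_code(pron):
    """Return the first phonetic code with any stress digit removed."""
    return pron[0].rstrip("012")


def compare_first_codes(entries):
    """Return (proportion of matches, Counter of (letter, code) mismatches)."""
    total = 0
    matches = 0
    mismatches = Counter()
    for word, pron in entries:
        if not word or not pron:
            continue  # skip empty entries rather than fail on them
        total += 1
        letter = word[0].upper()
        code = first_code(pron)
        if letter == code:
            matches += 1
        else:
            mismatches[(letter, code)] += 1
    proportion = matches / total if total else 0.0
    return proportion, mismatches


proportion, mismatches = compare_first_codes(SAMPLE_ENTRIES)
print(f"{proportion:.0%} of the sample entries start with a matching code")
for (letter, code), count in mismatches.most_common(3):
    print(letter, code, count)
```

To run this on the full dictionary, the same function can be given nltk.corpus.cmudict.entries(), assuming NLTK and its cmudict data are installed.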

Practical work -- in class (code, output)

  1. Load WordNet in Python.
    • Look at the different synsets for bird.
      How many are there?
      What are their definitions?
      How deep in the hierarchy are they?
    • For the first synset (omw-en-01503061-n), print the lemmas in languages other than English
    • For each synset, print out each sense and its frequency (hint: the frequency of a lemma is given by sense.counts)
    • Give the total frequency for each synset
  2. Find all hyponyms (including hyponyms of hyponyms, ...) for the colour red!
    • Use closure (shown in the How To)
  3. ★ Tabulate the average polysemy per word length for all words in one language in wordnet, and then separately for each part of speech.
    • polysemy is number of synsets/word
    • you can get all words by [w for w in wn.words()]
    • for just nouns you can do: [w for w in wn.words(pos='n')]
    • for just English you can do: [w for w in wn.words(lang='en')]
    • if you use just wn, you get all words for all languages, which can be a lot!
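
The idea behind closure in item 2 is to follow a relation repeatedly until nothing new turns up. The sketch below shows that idea on a toy hyponym table, without loading WordNet. The table is invented for illustration and is not taken from WordNet, and the closure function here is a hypothetical stand-in for the one shown in the How To, not the library's implementation.

```python
# Toy hyponym table, invented for illustration (not taken from WordNet):
# each key maps to its direct hyponyms.
TOY_HYPONYMS = {
    "red": ["crimson", "scarlet", "dark red"],
    "dark red": ["burgundy", "maroon"],
    "crimson": [],
    "scarlet": [],
    "burgundy": [],
    "maroon": [],
}


def closure(start, relation):
    """Yield every node reachable from start by following relation repeatedly."""
    seen = {start}  # guards against cycles leading back to the start
    stack = list(reversed(relation.get(start, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(relation.get(node, [])))


all_red = list(closure("red", TOY_HYPONYMS))
print(all_red)
```

The result contains direct hyponyms as well as hyponyms of hyponyms, but not the starting node itself.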

★ these problems are only for overachievers :-)
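
For the starred problem, the tabulation step can be sketched separately from the WordNet lookup. This is a minimal sketch on made-up numbers: the word-to-synset counts below are hypothetical, and in practice they would come from counting the synsets of each word in the chosen language. The function name is also hypothetical.

```python
from collections import defaultdict

# Hypothetical counts: word -> number of synsets the word appears in.
TOY_SYNSET_COUNTS = {"cat": 2, "dog": 4, "bird": 5, "fish": 3, "horse": 6}


def average_polysemy_by_length(synset_counts):
    """Map each word length to the mean number of synsets per word."""
    totals = defaultdict(lambda: [0, 0])  # length -> [synset total, word total]
    for word, n_synsets in synset_counts.items():
        entry = totals[len(word)]
        entry[0] += n_synsets
        entry[1] += 1
    return {
        length: synsets / words
        for length, (synsets, words) in sorted(totals.items())
    }


table = average_polysemy_by_length(TOY_SYNSET_COUNTS)
print("length  polysemy")
for length, average in table.items():
    print(f"{length:>6}  {average:.2f}")
```

Running the same function once per part of speech, on counts restricted to that part of speech, gives the separate tables.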

Further work -- at home (code, output)

  1. Find all nouns in English that are both animals and food
    • Use the supersense (aka lexfile, topic)
    e.g. duck, salmon
  2. Try in another language
    • Not all wordnets have supersenses, so you may have to translate the synset to English, and look at the English supersense
    e.g. 颏, 兎
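
The animal-and-food search comes down to a set test: keep a word if its senses cover both supersenses. The sketch below shows only that test, on a hand-written table. The table is illustrative and not extracted from WordNet, the function name is hypothetical, and it assumes supersenses are written as lexfile names such as noun.animal and noun.food.

```python
# Illustrative table (not extracted from WordNet):
# word -> supersenses of its noun senses.
TOY_SUPERSENSES = {
    "duck": {"noun.animal", "noun.food", "noun.artifact"},
    "salmon": {"noun.animal", "noun.food", "noun.attribute"},
    "cat": {"noun.animal", "noun.person"},
    "bread": {"noun.food", "noun.possession"},
}

WANTED = {"noun.animal", "noun.food"}


def animals_and_food(word_supersenses, wanted=WANTED):
    """Return the words whose supersenses include every wanted supersense."""
    return sorted(
        word
        for word, supersenses in word_supersenses.items()
        if wanted <= supersenses  # subset test: all wanted supersenses present
    )


both = animals_and_food(TOY_SUPERSENSES)
print(both)
```

For a wordnet without supersenses, the same test would apply after mapping each synset to its English counterpart and reading the supersense there.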

LAC: Language and the Computer Francis Bond.