LAC: Language and the Computer: Project 2

This assignment constitutes 30% of your final grade for LAC. Please work on this individually or in groups of two or three.

Pick one of the following:

A: Check a basic list

  1. Go through the Tokyo University of Foreign Studies (TUFS) Basic Vocabulary for Indonesian and see how much of it is covered by the data you extracted in Project 1 for Indonesian.
    • Show what percentage of the basic concepts are in the data
    • Show what percentage have the same translation in the wordnet
      You can get the translation from another wordnet, linked by the inter-lingual index (ili).

    E.g., kopi is a concept in the Indonesian basic vocabulary (tufs-id-07929519-n) and is linked by ili="i31263" to coffee (tufs-en-07929519-n) in the English wordnet.
    The Indonesian data you processed had:
    	kopi = coffee
    so the concept (i) is in the data and (ii) has the same translation.
  2. Give the results by part of speech
  3. Show the results graphically (you choose what kind of graph)
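
The counting in steps 1 and 2 can be sketched as follows. This is only a minimal illustration under stated assumptions, not a required design: the toy data, the placeholder synset ids, and the names coverage_by_pos and pct are all hypothetical. Only the kopi entry comes from the example above. A real solution would read the basic concepts from the wordnets and the word pairs from your Project 1 output.

```python
from collections import defaultdict

# Hypothetical toy data. Each basic concept is
# (synset id, part of speech, Indonesian lemmas, English lemmas reached via the ILI).
# Only the first id comes from the assignment text; the others are placeholders.
basic_concepts = [
    ("tufs-id-07929519-n", "n", {"kopi"}, {"coffee"}),
    ("placeholder-0001-n", "n", {"air"}, {"water"}),
    ("placeholder-0002-v", "v", {"makan"}, {"eat"}),
    ("placeholder-0003-a", "a", {"besar"}, {"big", "large"}),
]

# Hypothetical Project 1 output: Indonesian word -> set of English translations.
extracted = {
    "kopi": {"coffee"},
    "makan": {"eat"},
    "besar": {"huge"},
}


def coverage_by_pos(concepts, data):
    """Count, per part of speech, concepts found in the data and with a matching translation."""
    stats = defaultdict(lambda: {"total": 0, "in_data": 0, "same_translation": 0})
    for _synset_id, pos, lemmas, translations in concepts:
        row = stats[pos]
        row["total"] += 1
        # A concept counts as "in the data" if any of its lemmas was extracted.
        found = [lemma for lemma in lemmas if lemma in data]
        if found:
            row["in_data"] += 1
            # It has the "same translation" if any extracted translation
            # is also a lemma of the linked English synset.
            if any(data[lemma] & translations for lemma in found):
                row["same_translation"] += 1
    return dict(stats)


def pct(part, whole):
    """Percentage rounded to one decimal place; 0.0 when there is nothing to count."""
    return round(100.0 * part / whole, 1) if whole else 0.0


stats = coverage_by_pos(basic_concepts, extracted)
for pos, row in sorted(stats.items()):
    print(pos, row["total"],
          pct(row["in_data"], row["total"]),
          pct(row["same_translation"], row["total"]))
```

Here both percentages are taken over all basic concepts for that part of speech. You could instead report the same-translation rate over only the concepts found in the data; say which one you chose. The per-part-of-speech counts are also what you would pass to a plotting library for step 3.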

Stretch Goals

You should do at least one of these.

  1. Do some error analysis: for concepts that did not match, explain why
  2. Show for all the languages (preferably using a loop)
  3. List what basic concepts were not in the duolingo wiki data, and discuss why
  4. List what words were in the duolingo wiki data but not in the basic concepts, and discuss why
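
Stretch goals 2 to 4 amount to set differences computed inside a loop over languages. The sketch below is one possible shape for that, assuming you have already reduced each source to a set of words per language. The variable names, the function compare, and the toy entries are hypothetical, and "xx" is a placeholder language code.

```python
# Hypothetical toy data keyed by language code; "xx" is a placeholder language.
basic_vocab = {
    "id": {"kopi", "air", "makan"},
    "xx": {"alpha", "beta"},
}
wiki_vocab = {
    "id": {"kopi", "makan", "halo"},
    "xx": {"beta", "gamma"},
}


def compare(basic, extracted):
    """Return the words unique to each side, sorted for stable output."""
    return {
        "missing_from_data": sorted(basic - extracted),  # stretch goal 3
        "not_in_basic": sorted(extracted - basic),       # stretch goal 4
    }


# Stretch goal 2: repeat the comparison for every language in a loop.
report = {
    lang: compare(basic_vocab[lang], wiki_vocab.get(lang, set()))
    for lang in sorted(basic_vocab)
}

for lang, diff in report.items():
    print(lang, "missing from data:", diff["missing_from_data"])
    print(lang, "not in basic list:", diff["not_in_basic"])
```

The two lists are the raw material for the discussion: the code only tells you which words differ, not why.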

You should use these wordnets:

If you had trouble extracting data for P1, come and see me.

B: Own Task

You can suggest your own task and, if I say OK, do it instead. This can be used to fit in with other research you are doing, but it should not duplicate work done for other assessment: you must do something new for this class.

Deliverable

You should deliver a paper that contains a link to the program and, if necessary, the output files.

Project Two for LAC: Language and the Computer Francis Bond.