LAC: Language and the Computer: Project 2
This assignment constitutes 30% of your final grade
for LAC. Please work on this individually or
in groups of two or three.
Pick one of the following:
A: Check a basic list
- Go through the Tokyo University of Foreign Studies Basic
Vocabulary for Indonesian and see how much is covered by the data
you extracted in Project 1 for Indonesian
- Show what % are in the data
- Show what % have the same translation in wordnet
You can get the translation from another wordnet, linked
by the inter-lingual index (ili)
e.g., kopi is a concept in
the Indonesian basic vocabulary (tufs-id-07929519-n) and is linked by
ili="i31263" to coffee (tufs-en-07929519-n) in the English wordnet
The Indonesian data you processed had:
kopi = coffee
so the concept is: (i) in the data and (ii) has the same translation
- Give the results by part of speech
- Show the results graphically (you choose what kind of graph)
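The counting steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the required solution: the names `BASIC`, `P1`, `pos_of`, `coverage_by_pos` and `percent` are hypothetical, the part of speech is assumed to be the last hyphen-separated field of the concept id, and every row except the kopi example is an invented placeholder. In your own program the English lemmas would come from the wordnet via the ili, and the translations from your Project 1 output.

```python
from collections import defaultdict

# Basic-vocabulary concepts: (concept id, Indonesian lemma, English lemmas linked via the ILI).
# Only the kopi row comes from the assignment text; the other ids and rows are placeholders.
BASIC = [
    ("tufs-id-07929519-n", "kopi", {"coffee"}),
    ("placeholder-0001-n", "air", {"water"}),
    ("placeholder-0002-v", "makan", {"eat"}),
    ("placeholder-0003-v", "minum", {"drink"}),
]

# Project 1 data: Indonesian word -> English translation.
# "makan" is deliberately given a mismatching translation to show a non-match.
P1 = {"kopi": "coffee", "air": "water", "makan": "food"}


def pos_of(concept_id):
    """Return the part of speech, assumed to be the last field of the id."""
    return concept_id.rsplit("-", 1)[-1]


def coverage_by_pos(basic, p1):
    """Count, per part of speech, concepts in the data and with the same translation."""
    counts = defaultdict(lambda: {"total": 0, "in_data": 0, "same": 0})
    for concept_id, lemma, english in basic:
        row = counts[pos_of(concept_id)]
        row["total"] += 1
        if lemma in p1:
            row["in_data"] += 1
            if p1[lemma].lower() in english:
                row["same"] += 1
    return dict(counts)


def percent(part, whole):
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


results = coverage_by_pos(BASIC, P1)
for pos, row in sorted(results.items()):
    in_data = percent(row["in_data"], row["total"])
    same = percent(row["same"], row["total"])
    print(f"{pos}: {row['total']} concepts, {in_data:.1f}% in data, {same:.1f}% same translation")
```

The per-part-of-speech dictionary in `results` can then be handed to whatever plotting library you prefer for the graph.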
Stretch Goals
You should do at least one of these.
- Do some error analysis: for concepts that did not match, explain why
- Show for all the languages (preferably using a loop)
- List what basic concepts were not in the Duolingo wiki data, and discuss why
- List what words were in the Duolingo wiki data but not in
the basic concepts, and discuss why
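The two listing goals above amount to set differences, and looping over languages is one way to cover all of them. A minimal sketch, assuming you have already collected the lemmas per language into sets; the names `basic_by_lang`, `data_by_lang` and `compare` are hypothetical, and the word lists are toy placeholders rather than real results.

```python
# Toy placeholder data: language code -> set of lemmas.
basic_by_lang = {
    "id": {"kopi", "air", "minum"},
}
data_by_lang = {
    "id": {"kopi", "air", "apel"},
}


def compare(basic, data):
    """Return (basic lemmas missing from the data, data words not in the basic list)."""
    missing = sorted(basic - data)
    extra = sorted(data - basic)
    return missing, extra


# Loop over every language that has a basic vocabulary list.
report = {}
for lang, basic in basic_by_lang.items():
    report[lang] = compare(basic, data_by_lang.get(lang, set()))

for lang, (missing, extra) in report.items():
    print(lang, "not in data:", missing)
    print(lang, "not in basic vocabulary:", extra)
```

The lists themselves are only the starting point; the discussion of why each word is missing or extra still has to be done by hand.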
You should use these wordnets:
If you had trouble extracting data for P1, come and see me.
B: Own Task
You can suggest your own task, and if I say ok, do it instead.
This can be used to fit in with other research you are doing, but
should not duplicate work done for other assessments; you must do
something new for this class.
Deliverable
You should deliver a paper that contains a link to the program
and, if necessary, the output files.
- The paper should be no more than ten pages
including diagrams, with up to two additional pages of references.
- You should give concrete examples from the resource(s) you
analyzed.
- Include quantitative results.
- Include representative examples.
- You don't need an extensive literature review, but if you
consult other lexicons (which you are encouraged to do) then you
should read the papers that describe them carefully, and cite them.
- Include a link to your entire program on Google Colab; I should be able to run the entire page.
In the main paper, describe generally what the program does (no
need to go into extreme detail).
- If you want to make it even more beautiful, as I am sure you do,
take a look at my (Computational)
Linguistic Style Guidelines: a guide for the flummoxed.
- Submit a soft copy (via Moodle).
- The deadline is on the main page.
Project Two for LAC: Language and the Computer Francis Bond.