LAC: Language and the Computer: Project 2
This assignment constitutes 30% of your final grade
for LAC. Please work on this individually or
in groups of two or three.
Pick one of the following:
A: Illustrate a basic list
- Go through the basic list, and show a picture for each word
- Easiest to do by concepts
- Also show synonyms and examples, if any
- You can get a definition from another wordnet (like omw-en:1.4)
- Try to break things into smaller groups (part of speech, semantic field, first letter)
- Note how many words are missing pictures, and list them
- If the word has no picture, experiment with a hypernym or hyponym
- Take a sample and evaluate how appropriate each picture is
- You can use omw-en:1.4 to find semantic relations
- You can show images in Google Colab like this:
    from IPython.display import display, Markdown

    ara_id = 2910  # ARASAAC pictogram id
    url = f"https://static.arasaac.org/pictograms/{ara_id}/{ara_id}_300.png"
    # Show the word in bold, then its pictogram and a credit line
    display(Markdown(f"**Computer**\n<img width='48' src='{url}'> *from Arasaac*"))
- Convert the illustrated vocabulary to a PDF booklet (using, for example, https://pypi.org/project/markdown-pdf/)
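The picture lookup with a hypernym fallback, together with the list of words that still have no picture, could be sketched as follows. This is a minimal sketch in plain Python, not a definitive implementation: the `PICTURES` and `HYPERNYMS` dictionaries, the concept names and the helper functions are hypothetical stand-ins for data you would read from the wordnets, and apart from 2910 (the id used in the Colab example above) the pictogram ids are made-up placeholders.

```python
from collections import deque

# Hypothetical toy data: concept -> ARASAAC pictogram id.
# 2910 is the id from the Colab example; 1111 is a made-up placeholder.
PICTURES = {"computer": 2910, "animal": 1111}

# Hypothetical toy data: concept -> its hypernyms (more general concepts).
HYPERNYMS = {
    "laptop": ["computer"],
    "computer": ["machine"],
    "dog": ["canine"],
    "canine": ["animal"],
    "idea": [],
}


def find_picture(concept, max_depth=3):
    """Return (pictogram_id, source_concept, distance), or None if nothing is found.

    Searches breadth-first up the hypernym links, so the closest
    illustrated ancestor is returned first.
    """
    queue = deque([(concept, 0)])
    seen = {concept}
    while queue:
        current, depth = queue.popleft()
        if current in PICTURES:
            return PICTURES[current], current, depth
        if depth < max_depth:
            for parent in HYPERNYMS.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return None


def coverage(concepts):
    """Split concepts into direct hits, hypernym fallbacks and missing ones."""
    report = {"direct": [], "fallback": [], "missing": []}
    for concept in concepts:
        hit = find_picture(concept)
        if hit is None:
            report["missing"].append(concept)
        elif hit[2] == 0:
            report["direct"].append(concept)
        else:
            report["fallback"].append(concept)
    return report


report = coverage(["computer", "laptop", "dog", "idea"])
print(report)
```

The same search would work for hyponyms by swapping in a dictionary of more specific concepts. Keeping the distance in the result makes it possible to check later whether pictures borrowed from distant hypernyms are judged less appropriate.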
You can use these wordnets:
- Tokyo University of Foreign Studies Basic Vocabulary for English
- Arasaac for English
- Then test with similar wordnets; you can choose the languages
- TUFS: ['as', 'de', 'en', 'es', 'fr', 'id', 'ja', 'km', 'ko', 'lo', 'mn', 'my', 'pt', 'ru', 'th', 'tl', 'tr', 'ur', 'vi', 'zsm']
- ARASAAC: ['an', 'ar', 'bg', 'ca', 'cs', 'de', 'en', 'es', 'et', 'eu', 'fa', 'fr', 'gl', 'he', 'hr', 'hu', 'it', 'ko', 'lt', 'mk', 'nb', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk', 'sr', 'zh']
- Core synsets are a semi-automatically compiled list of 5000
"core" word senses in Princeton WordNet (approximately the 5000 most
frequently used word senses). We have mapped them to ILIs.
- Basic synsets are a compiled list of 578 "basic" synsets from
the Tokyo University of Foreign Studies Basic Vocabulary for Japanese.
We have mapped them to ILIs.
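Because these lists are mapped to ILIs, entries from different wordnets can be lined up by their shared ILI. The sketch below shows the idea in plain Python. The ILI identifiers, the lemmas and both dictionaries are invented for illustration (only pictogram id 2910 comes from the Colab example above); they are not real entries from TUFS or ARASAAC.

```python
# Hypothetical data: ILI -> lemmas from a basic vocabulary list.
basic_vocab = {
    "i0001": ["computer"],
    "i0002": ["water"],
    "i0003": ["justice"],
}

# Hypothetical data: ILI -> pictogram id from a picture wordnet.
# 2222 is a made-up placeholder id.
pictograms = {"i0001": 2910, "i0002": 2222}


def join_on_ili(vocab, pictos):
    """Pair each vocabulary entry with its pictogram via the shared ILI.

    Returns (illustrated, missing): a dict ILI -> (lemmas, pictogram_id)
    and a sorted list of ILIs that have no pictogram.
    """
    illustrated = {}
    missing = []
    for ili in sorted(vocab):
        if ili in pictos:
            illustrated[ili] = (vocab[ili], pictos[ili])
        else:
            missing.append(ili)
    return illustrated, missing


illustrated, missing = join_on_ili(basic_vocab, pictograms)
print(len(illustrated), "illustrated;", len(missing), "missing:", missing)
```

Joining on the ILI rather than on the word form means the same code could be reused for any language pair, since the lemmas are only carried along for display.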
B: Own Task
You can suggest your own task, and if I say ok, do it instead.
This can be used to fit in with other research you are doing, but
should not duplicate work done for other assessment; you must do
something new for this class.
Deliverable
You should deliver a paper, which contains a link to the program
and, if necessary, the output files. For task A (Illustrate a basic list), attach the illustrated list as well, either as a separate file or as an appendix.
- The paper should be no more than ten pages
including diagrams, with up to two additional pages of references.
- You should give concrete examples from the resource(s) you
analyzed.
- Include quantitative results.
- Include representative examples.
- You don't need an extensive literature review, but you should
read and cite the references below. If you
consult other lexicons (which you are encouraged to do), then you
should read the papers that describe them carefully and cite them.
- Include a link to your entire program on Google Colab; I should be able to run the entire page.
In the main paper, describe generally what the program does (no
need to go into extreme detail).
- If you want to make it even more beautiful, as I am sure you do,
take a look at my (Computational)
Linguistic Style Guidelines: a guide for the flummoxed.
- Submit a soft copy (via Moodle).
- The deadline is on the main page.
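For the quantitative results asked for above, one simple option is to rate a random sample of pictures by hand and report the share of each rating. The following is a rough sketch under stated assumptions: the rating labels ("good", "ok", "bad"), the example ratings and the helper names `draw_sample` and `summarize` are all hypothetical.

```python
import random
from collections import Counter


def draw_sample(items, k, seed=0):
    """Draw up to k items without replacement; a fixed seed keeps it repeatable."""
    rng = random.Random(seed)
    items = list(items)
    return rng.sample(items, min(k, len(items)))


def summarize(ratings):
    """Map each rating label to (count, percentage of all rated items)."""
    total = len(ratings)
    counts = Counter(ratings.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Hypothetical hand-assigned ratings for four illustrated words.
ratings = {"computer": "good", "dog": "good", "run": "ok", "idea": "bad"}

sample = draw_sample(sorted(ratings), k=2, seed=42)
summary = summarize(ratings)
print(sample)
print(summary)
```

Reporting the seed and the sample size alongside the percentages lets a reader reproduce the sample that was evaluated.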
Project Two for LAC: Language and the Computer Francis Bond.