LAC: Language and the Computer: Project 3
In-Class On-Line Open-Book Programming Challenge
This assignment constitutes 30% of your final grade for LAC.
Comparing and describing wordnets
Given a pair of wordnets, compare them. Given a single wordnet, describe or illustrate it.
Please start with these wordnets:
- Tokyo University of Foreign Studies Basic Vocabulary for English
- Arasaac for English
- Then test with similar wordnets; you can choose the languages:
- TUFS: ['as', 'de', 'en', 'es', 'fr', 'id', 'ja', 'km', 'ko', 'lo', 'mn', 'my', 'pt', 'ru', 'th', 'tl', 'tr', 'ur', 'vi', 'zsm']
- ARASAAC: ['an', 'ar', 'bg', 'ca', 'cs', 'de', 'en', 'es', 'et', 'eu', 'fa', 'fr', 'gl', 'he', 'hr', 'hu', 'it', 'ko', 'lt', 'mk', 'nb', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk', 'sr', 'zh']
- Core synsets are a semi-automatically compiled list of 5000
"core" word senses in Princeton WordNet (approximately the 5000 most
frequently used word senses). We have mapped them to ILIs (Interlingual Index identifiers).
- Basic synsets are a compiled list of 578 "basic" synsets from
the Tokyo University of Foreign Studies Basic Vocabulary for Japanese.
We have mapped them to ILIs.
Single wordnet description
Write a function (or functions) that takes a wordnet lexicon id as input, assuming the wordnet has already been loaded.
- How many synsets, senses and words are there?
- How many synsets, senses and words are there for each pos (part of speech)?
- Show these as e.g. pie charts
Make the size of each pie chart proportional to the actual size of the wordnet
E.g. if one wordnet has 100,000 synsets and another has 5,000, the first pie chart should be 20 times bigger
- Show three examples of each pos
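A minimal sketch of the counting step, in plain Python so it runs without downloading any data. It assumes each synset has already been turned into a hypothetical (synset_id, pos, lemmas) tuple; with the wn package these could be built from wn.Wordnet(lexicon_id).synsets(), using each synset's id, pos and lemmas(). Here a sense is counted as one (lemma, synset) pair and a word as a distinct lemma within one part of speech. For the chart sizes it assumes "size" means area, so the radius grows with the square root of the count: 100,000 / 5,000 = 20 times the area is about 4.47 times the radius.

```python
import math
from collections import Counter, defaultdict


def describe(synsets, n_examples=3):
    """Count synsets, senses and words in total and per part of speech.

    Assumed (hypothetical) input: an iterable of (synset_id, pos, lemmas) tuples.
    A sense is one (lemma, synset) pair; a word is a distinct lemma within a pos.
    """
    by_pos = defaultdict(Counter)   # pos -> Counter of synsets/senses/words
    words = defaultdict(set)        # pos -> distinct lemmas
    examples = defaultdict(list)    # pos -> first few distinct lemmas seen
    for _synset_id, pos, lemmas in synsets:
        by_pos[pos]["synsets"] += 1
        by_pos[pos]["senses"] += len(lemmas)
        words[pos].update(lemmas)
        for lemma in lemmas:
            if len(examples[pos]) < n_examples and lemma not in examples[pos]:
                examples[pos].append(lemma)
    for pos, lemmas in words.items():
        by_pos[pos]["words"] = len(lemmas)
    total = Counter()
    for counts in by_pos.values():
        total.update(counts)        # Counter.update adds counts together
    return total, dict(by_pos), dict(examples)


def pie_radius(size, largest, max_radius=1.0):
    """Radius that makes a pie's area (not its radius) proportional to size."""
    return max_radius * math.sqrt(size / largest)


# Toy data, not real wordnet content
sample = [("s1", "n", ["dog", "domestic dog"]), ("s2", "n", ["cat"]),
          ("s3", "v", ["run", "go"])]
total, by_pos, examples = describe(sample)
print(total["synsets"], total["senses"], total["words"])  # 3 5 5
print(round(pie_radius(5_000, 100_000), 3))               # 0.224
```

The radius could then be passed to matplotlib's pie(..., radius=r), drawing each wordnet in its own subplot and keeping the largest at radius 1 with equal axis limits so the sizes stay comparable.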
Compare wordnets
- How many synsets, senses and words are there only in one, in
both or only in the other?
- Show these as e.g. bar charts: left for one wordnet, right for the other
- Show three examples of words in one, both or other for each pos
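One possible sketch of the comparison, again on plain Python sets and with hypothetical (ili, pos, lemmas) records. It assumes synsets are matched by their ILI (which works across languages), senses by (lemma, ili) and words by (lemma, pos); the last two only make sense when both wordnets are in the same language, e.g. TUFS English against Arasaac English. Synsets without an ILI are skipped when matching synsets and senses. These keys are a design choice, not a prescribed method.

```python
def split(a, b):
    """Partition two collections into (only in a, in both, only in b)."""
    a, b = set(a), set(b)
    return a - b, a & b, b - a


def keys(records):
    """Comparable keys from hypothetical (ili, pos, lemmas) records."""
    return {
        "synsets": {ili for ili, _, _ in records if ili},
        "senses": {(lemma, ili) for ili, _, lemmas in records if ili
                   for lemma in lemmas},
        "words": {(lemma, pos) for _, pos, lemmas in records
                  for lemma in lemmas},
    }


def compare_wordnets(records_a, records_b, n_examples=3):
    """Counts of only-a / both / only-b per kind, plus example words per pos."""
    ka, kb = keys(records_a), keys(records_b)
    counts = {}
    for kind in ("synsets", "senses", "words"):
        only_a, both, only_b = split(ka[kind], kb[kind])
        counts[kind] = (len(only_a), len(both), len(only_b))
    examples = {}  # (partition, pos) -> up to n_examples lemmas
    parts = zip(("only_a", "both", "only_b"), split(ka["words"], kb["words"]))
    for label, part in parts:
        for lemma, pos in sorted(part):
            chosen = examples.setdefault((label, pos), [])
            if len(chosen) < n_examples:
                chosen.append(lemma)
    return counts, examples


# Toy data, not real wordnet content
tufs = [("i1", "n", ["dog"]), ("i2", "n", ["cat"]), ("i3", "v", ["run"])]
arasaac = [("i1", "n", ["dog", "hound"]), ("i4", "v", ["jump"])]
counts, examples = compare_wordnets(tufs, arasaac)
print(counts["synsets"])  # (2, 1, 1)
```

Each (only-a, both, only-b) triple corresponds to one group of bars in the chart.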
Compare wordnet to curated list
- Try this with the core and basic lists
- How many synsets from the curated concept list are in the wordnet?
- Show the coverage over the semantic fields (lexicographer files)
- Graph this and give examples
- Compare the two lists
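A sketch of the coverage check, assuming the curated list has been read into a list of ILI strings, the wordnet's synsets into a set of ILIs, and that a hypothetical lexfile_of dictionary maps each ILI to a lexicographer file name (taken from a wordnet that records them, such as the English one). Concepts with no known lexicographer file are grouped under "unknown". compare_lists applies the same set arithmetic to the core and basic lists.

```python
from collections import Counter


def coverage(concept_list, wordnet_ilis, lexfile_of):
    """How much of a curated ILI list a wordnet covers, overall and per lexfile.

    Returns (found, total, per_field, missing), where per_field maps a
    lexicographer file to (covered, listed, ratio).
    """
    concepts = set(concept_list)
    found = concepts & set(wordnet_ilis)
    listed = Counter(lexfile_of.get(ili, "unknown") for ili in concepts)
    covered = Counter(lexfile_of.get(ili, "unknown") for ili in found)
    per_field = {field: (covered[field], n, covered[field] / n)
                 for field, n in listed.items()}
    return len(found), len(concepts), per_field, sorted(concepts - found)


def compare_lists(list_a, list_b):
    """Sizes of (only in a, in both, only in b) for two concept lists."""
    a, b = set(list_a), set(list_b)
    return len(a - b), len(a & b), len(b - a)


# Toy ILIs and lexicographer files, not real data
core = ["i1", "i2", "i3", "i4"]
lexfile_of = {"i1": "noun.animal", "i2": "noun.animal", "i3": "verb.motion"}
found, total, per_field, missing = coverage(core, {"i1", "i2", "i9"}, lexfile_of)
print(found, total, missing)  # 2 4 ['i3', 'i4']
```

The per_field ratios can be plotted as one bar per lexicographer file, and missing supplies examples of uncovered concepts.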
Illustrate a basic list
- Go through the basic list and show a picture for each word
- Easiest to do by concepts
- Also show synonyms and examples, if any
- You can get a definition from another wordnet (like en-omw:1.4)
- Try to break things into smaller groups (pos, semantic field)
- If the word has no picture, experiment with a hypernym or hyponym
- Note how many words are missing pictures, and list them
- You can show images in Google Colab like this:
from IPython.display import display, Markdown
ara_id = 2910  # Arasaac pictogram id for "computer"
display(Markdown(f"""**Computer**
<img width='48' src='https://static.arasaac.org/pictograms/{ara_id}/{ara_id}_300.png'> *from Arasaac*"""))
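A sketch of the illustration loop with the hypernym/hyponym fallback. It assumes hypothetical inputs prepared beforehand: concepts as (ili, lemmas, definition, examples) tuples (the definition possibly taken from another wordnet), pictos mapping an ILI to an Arasaac pictogram id, and hypernyms/hyponyms mapping an ILI to lists of related ILIs. The image URL follows the Arasaac pattern in the snippet above; everything else here is illustrative.

```python
# URL pattern from the Arasaac example; {id} is the pictogram id
ARASAAC_URL = "https://static.arasaac.org/pictograms/{id}/{id}_300.png"


def find_picture(ili, pictos, hypernyms, hyponyms):
    """Return (pictogram_id, source): the concept itself, then hypernyms, then hyponyms."""
    if ili in pictos:
        return pictos[ili], "self"
    for source, related in (("hypernym", hypernyms), ("hyponym", hyponyms)):
        for other in related.get(ili, []):
            if other in pictos:
                return pictos[other], source
    return None, None


def illustrate(concepts, pictos, hypernyms, hyponyms, width=48):
    """Build Markdown for hypothetical (ili, lemmas, definition, examples) tuples.

    Returns the Markdown text and the first lemma of each concept left
    without any picture.
    """
    entries, missing = [], []
    for ili, lemmas, definition, examples in concepts:
        text = f"**{', '.join(lemmas)}**: {definition}"
        if examples:
            text += " *" + "; ".join(examples) + "*"
        picto, source = find_picture(ili, pictos, hypernyms, hyponyms)
        if picto is None:
            missing.append(lemmas[0])
        else:
            note = "" if source == "self" else f" (via {source})"
            url = ARASAAC_URL.format(id=picto)
            text += f"\n<img width='{width}' src='{url}'> *from Arasaac*{note}"
        entries.append(text)
    return "\n\n".join(entries), missing
```

In Colab the resulting text can be shown with display(Markdown(md)), and len(missing) gives the number of words still lacking a picture. Calling illustrate once per group (e.g. per pos or lexicographer file) keeps the output in smaller sections.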
Deliverables
- Per person:
- 1-2 page paper in any format, submitted as a PDF,
describing what you have done, and
what you think could be done given more time.
Include title, author, date
e.g.
LAC Project 3: Comparing wordnets
Francis Bond
bond@ieee.org
2024-12-18
The file should be named name_surname_matric.pdf
e.g. francis_bond_007.pdf
You do not need full citations; this is just a short summary.
Don't include the program as an appendix (just share a link).
- Program
The program should be in a Google Colab notebook
Include the URL in the paper
- Upload both to Moodle
If you don't want me to show off your
code/output/paper, please let me know within a month
Pedagogical Goals
- Consolidate skills in a novel task under realistic conditions
You can do interesting tasks!
- Demonstrate an ability to break a task into small chunks
Should be able to consider and evaluate multiple approaches
- Demonstrate an ability to use python to solve tasks
choosing the right approach, writing and debugging
- Demonstrate an ability to cooperate to solve a problem
good communication, delivering (partial) results in time
Project Three for LAC: Language and the Computer, Francis Bond.