COR: Project 1

Use a corpus to show the differences (and similarities) between two related words.

Pick two related "words" (t1 and t2) either:

One of the terms (t1) should be a moderately common word: choose one ranked somewhere between about #1000 and #3000 in the Word frequency data.
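A frequency ranking can also be approximated from a corpus you already have. The sketch below is a minimal illustration, not a required method. It assumes the corpus is already tokenized into a list of strings, and the function names rank_words and rank_band are hypothetical. Ranks taken from raw token counts will not necessarily match a published list, which may count lemmas rather than word forms.

```python
from collections import Counter


def rank_words(tokens):
    """Return the distinct (lower-cased) words, most frequent first (rank 1)."""
    counts = Counter(t.lower() for t in tokens)
    # Descending count; ties broken alphabetically so ranks are reproducible.
    return sorted(counts, key=lambda w: (-counts[w], w))


def rank_band(ranked, lo=1000, hi=3000):
    """Return the words whose 1-based rank lies between lo and hi inclusive."""
    return ranked[lo - 1:hi]


# Toy usage: with a real corpus, keep the default band of ranks 1000 to 3000.
ranked = rank_words("a a a b b c".split())
print(rank_band(ranked, lo=2, hi=3))  # ['b', 'c']
```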

For each of these two words, provide a corpus based description (as described below) and then discuss the differences between them.
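One simple way to organise the comparison is to split the two words' collocate lists into shared collocates (similarities) and collocates found with only one word (differences). The sketch below assumes the two lists have already been extracted from the corpus. The function name compare_collocates and the example lists are invented for illustration and are not corpus results.

```python
def compare_collocates(colls_t1, colls_t2):
    """Split two collocate lists into shared and word-specific collocates."""
    s1, s2 = set(colls_t1), set(colls_t2)
    return {
        "shared": sorted(s1 & s2),   # similarities between t1 and t2
        "only_t1": sorted(s1 - s2),  # collocates distinctive of t1
        "only_t2": sorted(s2 - s1),  # collocates distinctive of t2
    }


# Invented toy lists, e.g. for t1 = "strong" and t2 = "powerful".
result = compare_collocates(["tea", "wind", "support"],
                            ["engine", "support", "computer"])
print(result["shared"])   # ['support']
print(result["only_t1"])  # ['tea', 'wind']
print(result["only_t2"])  # ['computer', 'engine']
```

A set comparison ignores how strongly each word attracts a collocate. Each shared collocate's scores for t1 and t2 are therefore worth reporting side by side in a table.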

FIRST do the following:

Before doing anything else, ask three different people (who aren't taking this class) what they think the top ten collocates of the word would be. They won't know what "collocates" means, so tell them that these are words that "hang out" a lot with the word in question. You might give the example of beach = sand, waves, sun, surf, etc.

Make sure you record basic information about these people.
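Once the guesses are recorded, they can later be checked against collocates extracted from the corpus. The sketch below counts words within a window of the node word and ranks them by a simple, window-unadjusted form of pointwise mutual information (PMI), log2(f(node, w) * N / (f(node) * f(w))). This is one common association measure among several, and corpus tools may use others. The function names, the window of 4 tokens and the minimum count of 2 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def collocates(tokens, node, window=4, min_count=2, top_n=10):
    """Rank words co-occurring with `node` within +/- `window` tokens by PMI."""
    tokens = [t.lower() for t in tokens]
    node = node.lower()
    freq = Counter(tokens)  # corpus frequency of every word
    n = len(tokens)
    co = Counter()  # co-occurrence counts with the node word
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                co[tokens[j]] += 1
    scores = {}
    for w, c in co.items():
        # Skip rare pairs (PMI over-rewards them) and the node word itself.
        if c < min_count or w == node:
            continue
        scores[w] = math.log2(c * n / (freq[node] * freq[w]))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def overlap(guesses, corpus_collocates):
    """Return the informants' guesses that also appear among the corpus collocates."""
    return sorted({g.lower() for g in guesses} & set(corpus_collocates))


tokens = "beach sand x beach sand y beach waves z".split()
found = collocates(tokens, "beach", window=1, min_count=2)
print(found)                            # ['sand']
print(overlap(["Sand", "sun"], found))  # ['sand']
```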

Then

If you chose a non-English word, don't forget to gloss all foreign words. You might want to look it up in a multilingual corpus such as the NTU-MC to see what its translation data is like.

Your sketch should ideally say something about each word's syntax, its denotation (what it means), and its connotation (what its usage implies).
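One part of a word's syntax is its part-of-speech distribution. The sketch below assumes the corpus is already POS-tagged and available as (word, tag) pairs. The tags shown are Penn Treebank-style examples, and the function name pos_profile is hypothetical. Denotation and connotation still need qualitative reading of concordance lines.

```python
from collections import Counter


def pos_profile(tagged, word):
    """Return the proportion of each part-of-speech tag assigned to `word`."""
    word = word.lower()
    tags = Counter(tag for w, tag in tagged if w.lower() == word)
    total = sum(tags.values())
    # most_common() orders tags from most to least frequent; an unseen word
    # yields an empty dict, so there is no division by zero.
    return {tag: count / total for tag, count in tags.most_common()}


tagged = [("run", "VB"), ("run", "NN"), ("Run", "VB"), ("fast", "RB")]
print(pos_profile(tagged, "run"))  # VB about 0.67, NN about 0.33
```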

Write the results up as a paper of up to 6 pages (references don't count: you can have up to two extra pages of references), probably with many tables. Use the ACL 2015 format, but don't make your papers anonymous. Read and follow the stylistic advice in the (Computational) Linguistic Style Guidelines: a guide for the bewildered. Note especially how to format tables.

Upload the final paper online.


COR (Corpus Linguistics) main page.

Francis Bond <bond@ieee.org> <francis.bond@upol.cz>