General Tagging Guidelines

Tagging Instructions


How the data is prepared


Word Senses and Tags

Holmesian Annotation

To comment on a word or sentence in the style of Holmesian Annotation, start the comment with 'ANOT: '. E.g.

sid 48800 "Sufficient for to-morrow is the evil thereof, but I hope before the day is past to have the upper hand at last.'"
Sentence comment = "ANOT: A reference to Matthew 6:34 "Take therefore no thought for the morrow: for the morrow shall take thought for the things of itself. Sufficient unto the day is the evil thereof.""
Note: students cannot write sentence comments at present, but can mention it in their reports or add it to a word.
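
As a rough illustration of the convention above, a comment can be marked and recognized by its prefix. The function names below are hypothetical and are not part of the tagging tool.

```python
ANOT_PREFIX = "ANOT: "  # marker described in the guideline above


def make_annotation(text: str) -> str:
    """Return a comment string marked as a Holmesian annotation."""
    return ANOT_PREFIX + text.strip()


def is_annotation(comment: str) -> bool:
    """True if the comment starts with the 'ANOT: ' marker."""
    return comment.startswith(ANOT_PREFIX)
```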

Named Entities

  • If it is a name (or part of a name), choose an appropriate name tag:

    We sometimes create named entities in order to match across languages, although we do not expect to keep them in the wordnet.

    These are tagged as notag in SemCor.


    Suggesting changes to wordnet (tag with w)

    Tagging Metaphorical Uses

    If you think that a word is being used in a metaphorical sense, but it is not established enough to be in wordnet, then tag with the closest literal sense, and add "METAPHOR" in the comments. If there is a good synset for the metaphorical use, then also note that "METAPHOR: 12345678-x".
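
    The two comment forms above could be produced as in the following sketch. The helper name is hypothetical, and the synset ID shape (eight digits, a hyphen, one lowercase part-of-speech letter) is an assumption based on the IDs shown in these guidelines.

```python
import re
from typing import Optional

# Assumed ID shape: eight digits, a hyphen, one lowercase POS letter.
SYNSET_ID = re.compile(r"\d{8}-[a-z]")


def metaphor_comment(synset_id: Optional[str] = None) -> str:
    """Build the comment for a metaphorical use.

    With no synset, return the bare marker; with a synset for the
    metaphorical sense, append it after 'METAPHOR: '.
    """
    if synset_id is None:
        return "METAPHOR"
    if not SYNSET_ID.fullmatch(synset_id):
        raise ValueError(f"unexpected synset id: {synset_id!r}")
    return f"METAPHOR: {synset_id}"
```

    The word itself is still tagged with the closest literal sense; the comment only records the metaphorical reading.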

    Tagging Foreign Words

    Example: At the end of the day, Muslims break their fast (buka puasa) with a communal meal at home or at the mosque.

    Regardless of the tag they may currently have (i.e. likely 'e' or 'w'), foreign words (like 'buka puasa', above) should be tagged with the synset that corresponds to that concept in the corresponding language.

    In this case, we don't want to add 'buka puasa' to the English wordnet. Buka puasa is an Indonesian word, and it should be present in, or added to, the Indonesian wordnet. If you are ever asked to tag a word like this there are two things you should do:

    Extending wordnet

    Experienced annotators can have access to the wordnet editing interface (this is not the case for students annotating as part of a class project).

    Suggested changes to wordnet can be added to wordnet.

    Don't forget to check for orthographic variants of existing synsets first. E.g. if stir-fry is in wordnet as stir fry, then stir-fry should just be added to that synset (=00326459).
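
    One way to look for such variants is to compare lemmas after removing hyphens, spaces, underscores and case differences. This is a minimal sketch, not the editing interface's own check; the function names and the example lemma list are made up for illustration.

```python
import re
from typing import Iterable, List


def variant_key(lemma: str) -> str:
    """Collapse case and hyphen/space/underscore differences."""
    return re.sub(r"[-_\s]+", "", lemma.lower())


def find_variants(candidate: str, existing_lemmas: Iterable[str]) -> List[str]:
    """Return existing lemmas that differ from the candidate only orthographically."""
    key = variant_key(candidate)
    return [
        lemma
        for lemma in existing_lemmas
        if lemma != candidate and variant_key(lemma) == key
    ]


# Illustrative lemma list, not taken from any wordnet.
print(find_variants("stir-fry", ["stir fry", "fry", "stirrup"]))
```

    If the list is non-empty, the new spelling probably belongs in the synset of the existing lemma rather than in a new one.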


    Tagging Issues


    Detailed Guidelines for English

    These are based on the Guidelines for the Gloss Annotation Project and the Cheatsheet for the Gloss Annotation Project; thanks to Christiane Fellbaum for sharing them.

    How to determine part of speech for the word/collocation. This is not always as obvious as it seems! There are four particularly tricky cases. These are all tricky because the part of speech of a word is not always the same as the grammatical function that the word is performing in the sentence or phrase. For instance, nouns can function similarly to what are traditionally called adjectives, and verbs can take on the roles of nouns or adjectives.

    Adjective vs. noun modifying a noun

    And sometimes after the noun: Nouns can also serve as modifiers, similar to adjectives:

    The general rule of thumb for deciding whether it is a noun or adjective is to check the sense list first for whether there is an adjective sense corresponding to the word, and, if not, then whether there's a corresponding noun sense. So, damp in damp weather is an adjective, even though a noun sense exists. And cotton in cotton shirts is a noun, which is modifying another noun.

    If there is no adjective sense in WordNet, then you should make sure that it is not truly an adjective that is missing from WordNet. A good clue that you have an adjective is if you try to modify it with very or rather and it sounds ok: very/rather favorable conditions (ok) vs. very/rather cotton shirts (not ok). Another good clue is if you can make a comparative or superlative form out of it (damper/dampest/more favorable/most favorable conditions are all adjectives, but cottoner/cottonest/more cotton/most cotton shirts are not). If either of these tests comes up ok (that is, very/rather x sounds good or either x-er/x-est or more/most x sounds good), and there is no matching adjective sense, then you need to add a new sense to wordnet.

    Note that these tests are only conclusive if they come up ok: then you know you have an adjective for sure. If the tests are not ok, then it may still be an adjective, because the tests only work for certain kinds of adjectives, but not all.

    If the tests are not ok (that is, none of very/rather x and x-er/x-est/more x/most x sound good), then check for a matching noun sense. If there is no matching noun sense, then do not assign any sense. (But see below first regarding present and past participles, since it might be a verb!) If a noun sense does exist, then the word can be considered a noun, and be tagged to the noun sense.

    The noun-sense rule applies only when the word is modifying a noun. If the word is being used predicatively (that is, after some form of the verb be, or where the verb could be replaced by a verb such as seem, look, appear, etc.), there may be some confusion as to whether what follows the verb is an adjective or a noun. So, in

    damp is an adjective here. Notice that you can replace was with seemed/looked/appeared and still get a grammatical sentence: the weather seemed/looked/appeared damp. But note the difference between the pairs:

    and

    In the second pair, drunk is clearly a noun, not an adjective. It is the complement of the verb be here. Two reliable ways to recognize a noun are if it is (or can be) preceded by a determiner (such as a or the) or by an adjective (he was a silly drunk).

    In summary: when you have the situation of modifier noun, the modifier will be an adjective when there is a corresponding adjective sense in WordNet for the meaning it is being used with, OR there is no corresponding adjective sense but any of the tests come up ok (very/rather x, or x-er/x-est/more x/most x, sounds good), in which case Sense not in WordNet should be assigned.

    The modifier will be a noun when:

    If it is neither an adjective nor a noun, it might be a verb (see below)
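
    The decision order for a word modifying a noun can be summarized in a small sketch. The function and its boolean inputs are hypothetical: the annotator still has to judge whether a sense matches and whether the very/rather or comparative tests sound ok.

```python
from typing import Tuple


def classify_modifier(
    has_adj_sense: bool,
    passes_adj_test: bool,
    has_noun_sense: bool,
) -> Tuple[str, str]:
    """Return (part of speech, action) for a word that modifies a noun.

    has_adj_sense:   a matching adjective sense exists in WordNet
    passes_adj_test: very/rather x or x-er/x-est/more x/most x sounds ok
    has_noun_sense:  a matching noun sense exists in WordNet
    """
    if has_adj_sense:
        return ("adjective", "tag with the adjective sense")
    if passes_adj_test:
        return ("adjective", "sense not in WordNet")
    if has_noun_sense:
        return ("noun", "tag with the noun sense")
    return ("unknown", "assign no sense; check whether it is a participle")


# damp weather: an adjective sense exists, so the noun sense is never consulted.
print(classify_modifier(True, True, True))
# cotton shirts: no adjective sense, tests fail, noun sense exists.
print(classify_modifier(False, False, True))
```

    The order of the checks matters: the adjective sense takes priority even when a noun sense also exists.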

    Adjective vs. present participle (-ing form) of verb

    The -ing form of verbs can function as adjectives. For instance,

    How to tell? The easy case is when the word is modifying a noun. In general, these are adjectives if there is a corresponding adjective sense in WordNet. Such adjective senses exist for frightening and working . However, this is not the case for clicking and playing, so that in the following sentences,

    the appropriate verb senses of click and play would be selected instead. (This is because these are verbs playing the part of adjectives, but are not adjectives in themselves.) When the word appears predicatively (after some form of the verb be), the rule can't always be applied since it might be impossible to tell whether it is being used as a verb or an adjective.

    Without more information, you cannot know whether the third sentence means that the women are picketing, or whether they are beautiful. For ambiguous cases like this, if the context does not make it clear, choose whichever you think is most appropriate and add a comment saying that it is hard to tell.

    Adjective vs. past participle (usually -en form) of verb

    Past participles can also function as adjectives. The past participle is the form of the verb that appears with the auxiliary have. It usually, though not always, ends in -en or -ed: written, destroyed, and spun are the past participles of write, destroy and spin, respectively. The rule of thumb is similar to the present participle cases. Where the word modifies a noun, check first for a corresponding adjective sense. If no adjective sense exists, then assign the verb sense (if there is one that matches the meaning as used in the sentence).
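
    For participles (present or past) that modify a noun, the rule of thumb reduces to the following sketch. The function is hypothetical, and the example set only records what these guidelines say about frightening and working having adjective senses.

```python
from typing import Optional

# Forms stated above to have adjective senses; clicking and playing do not.
HAS_ADJ_SENSE = {"frightening", "working"}


def participle_modifier_pos(has_adj_sense: bool, has_verb_sense: bool) -> Optional[str]:
    """Pick the part of speech to tag for a participle modifying a noun."""
    if has_adj_sense:
        return "adjective"
    if has_verb_sense:
        return "verb"
    return None  # no matching sense: leave untagged


for form in ("frightening", "clicking"):
    # Assume, for the demo, that a matching verb sense exists for both forms.
    print(form, participle_modifier_pos(form in HAS_ADJ_SENSE, True))
```

    This covers only the attributive case; predicative uses need the judgment calls described in the text.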

    Again, the hard cases occur when the word appears predicatively (i.e., after some form of the verb "to be", or where the preceding verb can be replaced by a verb such as seem/look/appear, etc.).

    In the first sentence, written is a verb. A good test of this is to put the auxiliary verb in the progressive: The sentence WAS BEING written down for clarity. That makes it clear it is an act or action that occurred. The second sentence cannot be phrased that way and still have the same sense: The sentence was being written as opposed to spoken. In the third sentence, it is not clear whether "written" refers to an act of writing, or the attribute or quality of being written. For ambiguous cases like this, do not assign a sense, and the lexicographers will make the determination.

    Noun vs. present participle (-ing form) of verb

    To complicate things further, the present participle of verbs can function as a noun. Often, the distinction is easy to make, if it appears where a noun is called for grammatically, and there is a corresponding noun sense in WordNet.

    If no noun sense exists, then assign the verb sense, if one exists, as for

    However, if the word is being used as a verb, then a noun sense should never be assigned! This is easy if there is no noun sense, as for frolicking

    or when it is obviously depicting an ongoing action

    You can test this out, too. A verb can never be modified by a or the or a possessive pronoun such as my/your/our, etc. Try it with the two sentences above: it hurts! But, again, there will be cases where this determination will be impossible to make.

    It is not clear whether writing in the third sentence refers to the act of writing something (e.g., a letter), or whether writing is the object itself (i.e., her writing, or an author's writing, marks on a piece of paper, etc.). For ambiguous cases like this, assign a sense and comment on the difficulty.
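
    The determiner test can be approximated mechanically, as in this crude sketch. The function name, the word lists and the example sentences are illustrative assumptions. It only looks at the immediately preceding token, so it misses cases with an intervening adjective, and her is ambiguous between possessive and object pronoun.

```python
from typing import Sequence

DETERMINERS = {"a", "an", "the"}
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def preceded_by_determiner(tokens: Sequence[str], index: int) -> bool:
    """True if the token right before tokens[index] is a determiner or possessive.

    A positive result suggests a noun reading of the -ing form;
    a negative result is not evidence either way.
    """
    if index <= 0:
        return False
    return tokens[index - 1].lower() in DETERMINERS | POSSESSIVES


print(preceded_by_determiner("She admired the writing".split(), 3))
print(preceded_by_determiner("They were frolicking".split(), 2))
```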

    Adjective+hyphen+inflected noun long-legged, brown-eyed

    Tag the first word as JJ, surface form includes the hyphen, lemma does not: long-, JJ long; brown-, JJ brown.

    Tag the second word as NN, surface form includes the ed, lemma does not: legged, NN leg; eyed, NN eye. If the compound is in wordnet, make a multi-word compound and use it (e.g. long-legged is 02385851-a).

    In any case, make two concepts for the original words: tag them as 'x' if the multi-word compound is used, and tag each word with its own sense otherwise.

    Similarly for well-written (Adv, verb) and odd-sounding: split, lemma does not include hyphen.
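
    The split described above might look like the following sketch. The function and record layout are hypothetical; the lemma of the second part is passed in by the caller because stripping -ed reliably (legged to leg, eyed to eye) needs a real lemmatizer.

```python
from typing import Dict, List


def split_hyphenated(
    token: str,
    second_lemma: str,
    first_pos: str = "JJ",
    second_pos: str = "NN",
) -> List[Dict[str, str]]:
    """Split 'long-legged' into two word records.

    The first surface form keeps the hyphen but its lemma does not;
    the second surface form keeps its inflection, its lemma is supplied.
    """
    first, hyphen, second = token.partition("-")
    if not (first and hyphen and second):
        raise ValueError(f"not a hyphenated compound: {token!r}")
    return [
        {"surface": first + "-", "pos": first_pos, "lemma": first.lower()},
        {"surface": second, "pos": second_pos, "lemma": second_lemma},
    ]


for record in split_hyphenated("long-legged", "leg"):
    print(record)
```

    For cases like well-written, the same split applies with the adverb and verb tags passed as first_pos and second_pos.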


    Irregular Comparatives/Superlatives

    Please tag comparatives and superlatives with their base form. So better is good, least is less, etc.

    The fact that e.g. wiser means "more wise" is deducible from the POS tag (JJR: comparative), so just choose the most appropriate sense of wise.
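
    A small lookup table is one way to apply this for irregular forms. The table below is a partial, illustrative sketch: the function name is hypothetical, the least entry follows the guideline above, and forms such as better can also belong to well depending on context.

```python
# Partial table of irregular comparatives/superlatives and their base forms.
IRREGULAR_BASE = {
    "better": "good",
    "best": "good",
    "worse": "bad",
    "worst": "bad",
    "least": "less",  # as stated in the guideline above
}


def base_form(word: str) -> str:
    """Return the base form for a known irregular form, else the word itself.

    Regular forms such as 'wiser' are not handled here; they need a
    lemmatizer or the JJR/JJS tag plus suffix stripping.
    """
    lowered = word.lower()
    return IRREGULAR_BASE.get(lowered, lowered)


print(base_form("better"))
print(base_form("wise"))
```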

    Using Wordnet Relations to determine sense (or senses)

    In WordNet, senses are in part defined by their relations to other senses. For this reason, the WordNet relations can be very useful in narrowing down which of the senses applies to a particular occurrence of the form. The relations for any word or collocation can be viewed through the WordNet browser. From the WordNet entry of the word you are tagging, clicking on one of the sense buttons will display the full entry: you may want to middle click to open in a new tab.

    The main relations that are of help are Hypernyms, Derivationally related forms, and Domain. Not all relations will appear for all words and all parts of speech for a word.

    Hypernym (ISA relation)

    The immediate hypernym is the most relevant one here. It is the first indented relation just below the definition (preceded by an arrow =>). The hypernym relation will tell you what kind of thing (object or action) the word refers to. The higher up you go in the hypernym relations, the more general the senses get (and so often less informative). There is a new indentation for each level up you go. For instance, two senses of the noun center that are rather close are

    If you look at the hypernyms for the noun senses of center, you can see that Sense A is an area while Sense B is a point; what they have in common is a notion of centrality. Both are at some level locations, and eventually all nouns are entities (so that knowing that something is a kind of entity is not of much help at all!).
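
    To make the idea of walking up the hypernym levels concrete, here is a toy sketch. The hierarchy is deliberately simplified and hand-written for illustration (real WordNet chains have more intermediate levels), and the sense labels and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Simplified, hand-written hypernym links; not the real WordNet hierarchy.
HYPERNYM: Dict[str, str] = {
    "center (sense A)": "area",
    "center (sense B)": "point",
    "area": "location",
    "point": "location",
    "location": "entity",
}


def hypernym_chain(sense: str) -> List[str]:
    """List hypernyms from the immediate one up to the root."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain


def first_shared_hypernym(a: str, b: str) -> Optional[str]:
    """Return the most specific hypernym the two senses have in common."""
    seen = set(hypernym_chain(a))
    for hypernym in hypernym_chain(b):
        if hypernym in seen:
            return hypernym
    return None


print(hypernym_chain("center (sense A)"))
print(first_shared_hypernym("center (sense A)", "center (sense B)"))
```

    The immediate hypernyms (area vs. point) are what separate the two senses; the shared levels further up say little.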

    Domain

    Is this term restricted to one topic or area or field or context?

    Where they exist, the domain relations can be quite helpful in narrowing senses down. A word’s domain will tell you whether it is restricted to some field or area such as Law or Art. Take the noun work. It has 7 senses, and if you look at its domains, you can see that one of its senses is restricted to the domain of physics, having to do with the transfer of energy.
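
    As a toy illustration of using domains to narrow senses, the sketch below orders candidate senses so that those restricted to the domain of the text come first. The sense records are placeholders, not real WordNet entries, and the function name is hypothetical.

```python
from typing import Dict, List, Optional

# Placeholder sense records; only the physics restriction reflects the text above.
WORK_SENSES: List[Dict[str, Optional[str]]] = [
    {"sense": "work (sense 1)", "domain": None},
    {"sense": "work (sense 2)", "domain": "physics"},
    {"sense": "work (sense 3)", "domain": None},
]


def order_by_domain(
    senses: List[Dict[str, Optional[str]]],
    context_domain: Optional[str],
) -> List[Dict[str, Optional[str]]]:
    """Sort senses: matching domain first, unrestricted next, other domains last."""

    def rank(sense: Dict[str, Optional[str]]) -> int:
        if sense["domain"] == context_domain:
            return 0
        if sense["domain"] is None:
            return 1
        return 2

    return sorted(senses, key=rank)  # sorted() is stable, so ties keep their order


for sense in order_by_domain(WORK_SENSES, "physics"):
    print(sense["sense"], sense["domain"])
```

    The ordering is only a hint for the annotator: a domain-restricted sense can still be wrong for a given sentence.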


    References


    Thanks to Christiane Fellbaum for sharing some documentation from the wordnet gloss tagging project. Source for the tagger is available here.