Lab 1
NB: The lab assignments will typically include write-up
instructions at the end. Before you start, read the whole assignment
once, including the write-up instructions, so you know what to keep
track of along the way.
Choose your language and stake your claim
- Choose the language you would like to work on this
quarter.
- Find out what reference materials are available for the language.
Grammar Customization: Get a small grammar for English
- Download our choices file
- Visit the LinGO Grammar Matrix customization page.
- Click on "Browse" next to "Upload choices file..." and upload the choices file you downloaded.
- Click on all of the subpages (starting with "General
Information" and "Word order") to see which options have been
selected in this choices file, and what other options are
available.
- Click on "create grammar" and download the .tgz or .zip file.
- There is also a local copy here: eng.tgz
LKB: Getting started
- Install Linux (preferably Ubuntu LTS) on your laptop (either as a
virtual machine, dual boot, or sole OS). Then install LOGON.
- Run emacs
- Type M-x lkb (M-x logon on older systems) to run the LKB
- Unpack the grammar you downloaded (e.g., tar xzf 567_english.tar.gz, substituting the name of the file you actually downloaded)
- Load the starter grammar in the LKB:
- Try parsing:
- Examine the file lexicon.tdl in the starter grammar, and try making up sentences to parse based on the vocabulary there.
- Find four different sentences that do parse. Record these in your write-up.
- Find two strings, using the vocabulary in lexicon.tdl, that don't parse. Record these in your write-up.
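To get an overview of the vocabulary before making up sentences, one option is to pull the orthographic forms out of lexicon.tdl from the shell. The sketch below assumes that entries give their forms as STEM < "word" >, which is the usual shape in Matrix-generated lexicons but worth checking against your own file. The function name list_stems is a hypothetical helper, not part of the LKB.

```shell
# List the orthographic forms in a TDL lexicon file, one per line.
# Assumption: entries look like   STEM < "cat" >   (check your lexicon.tdl).
# list_stems is a made-up helper name, not an LKB command.
list_stems() {
    # $1 = path to the lexicon file
    grep -o 'STEM < "[^>]*>' "$1" |   # keep only the STEM lists
        grep -o '"[^"]*"' |           # keep only the quoted strings
        tr -d '"' |                   # drop the quotes
        sort -u                       # sort and remove duplicates
}
```

Run it inside the grammar directory as: list_stems lexicon.tdl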
Try interactive unification
These instructions assume you are using the LUI interface, which
I believe is on by default. If they don't make sense, try invoking
(lui-initialize) at the LKB prompt.
- Ask the LKB to parse each of the strings you found that it doesn't parse.
- In the LKB Top menu, choose "Parse | Show parse chart".
- Examine the parse chart to find the first point of failure in parsing. Which constituents should combine, if only some constraint weren't blocking them?
- In the LKB Top menu, choose "View | Grammar Rule" and select the rule that you think should (modulo that constraint) combine the constituents.
- Click on "phr-synsem" (value of SYNSEM) to collapse that sub-structure.
- Choose the constituent from the parse chart that you believe should be the left-hand daughter and drag it onto the first element of the ARGS list in the rule. You should get a new window, labeled "unification result".
- Shrink "phr-synsem" in the "unification result" window, and then choose the constituent you believe should be the right-hand daughter and drag it onto the second element of the ARGS list in the rule.
- You should get a new window labeled "unification failure", with the point of failure highlighted in red.
- Look in the grammar files to see where the constraints that led to that unification failure are encoded, and record this information in your write-up.
- Do the same for the other non-parsing string you found.
Chain of identities
In the MRS assigned by this grammar to A cat chased me, the ARG0 value of the _cat_n_rel is identified with the ARG1 value of the _chase_v_rel (that is, the cat is doing the chasing). In this part of the assignment, you will trace the chain of identities that connects these two.
- Parse the sentence, and click on the small tree to get the larger tree.
- Click on the N node above cat to get the feature structure associated with that node.
- Explore the feature structure, to locate the feature INDEX and see what it is identified with. (You may find it useful to shrink down certain substructures, and to use the pop-up menus on the identity tags.)
- Do the same with the second N node above cat (representing the singular noun lexical rule), the NP node, the S node, the VP node, and the two V nodes.
- Now look through the .tdl files to find the types which encode the constraints responsible for the chain of identities. You'll want to start with the leaf types, but you'll need to look through supertypes, too. This can be done by using grep or the search functionality in emacs (C-s). The supertypes in a type definition are after the ":=". To find where a type is defined, search for the type name followed by ":=".
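The search for a type name followed by ":=" can be sketched as a small shell function. This is a minimal sketch; the name find_type_def is hypothetical, and it assumes that definitions start at the beginning of a line, as they usually do in Matrix-derived .tdl files.

```shell
# Print file:line:text for the definition of a type in a grammar directory.
# find_type_def is a made-up helper name, not an LKB command.
# Assumption: the type name starts in column 1 and is followed by ":=".
# Note: the name is used as a regular expression, so a type name that
# contains "*" (e.g. *top*) would need escaping.
find_type_def() {
    # $1 = type name, $2 = grammar directory (defaults to the current one)
    grep -n -H -- "^$1 *:=" "${2:-.}"/*.tdl
}
```

For example, find_type_def basic-head-spec-phrase run in the grammar directory shows which file and line to open. The text after ":=" on that line names the supertypes to search for next.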
Note that in addition to exploring the supertypes by searching through the .tdl files, you can also look at them through the LKB. For example, think of the constraint that you expect the lexical entry for "cat" to be contributing. Then:
- From the LKB Top menu, choose "View | Lex entry"
- Enter "cat" (the identifier for that lexical entry)
- Right click on the type at the top left of the tfs that pops up (common-noun-lex). (These are LUI directions; in the non-LUI GUI it's a left click.)
- Choose "view type definition". This should give a non-LUI window, showing the type definition, without inherited constraints.
- If you don't see the constraint you're looking for, explore the parent type(s) in the same fashion.
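If you prefer to follow parent types from the command line rather than through the LKB, the sketch below lists the supertypes named in a type definition. It only reads the line containing ":=", so supertypes that continue on later lines are missed. The name supertypes_of is hypothetical.

```shell
# List the supertypes named on the first line of a type definition.
# supertypes_of is a made-up helper name, not an LKB command.
# Limitation (assumption): only the line containing ":=" is inspected.
supertypes_of() {
    # $1 = type name, $2 = grammar directory (defaults to the current one)
    grep -h -- "^$1 *:=" "${2:-.}"/*.tdl |
        sed -e 's/^[^=]*= *//' -e 's/\[.*//' -e 's/[.]* *$//' |  # keep the text after ":="
        tr '&' '\n' |                                            # one supertype per line
        sed -e 's/^ *//' -e 's/ *$//' |                          # trim spaces
        grep -v '^$'                                             # drop empty lines
}
```

Calling supertypes_of on each name it prints, in turn, walks up the hierarchy one level at a time.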
Write up
Please submit write-ups as plain text files. (In future labs,
that will help me run example sentences through your grammar. It also
helps me reply to questions in your write up, by copying the questions
into my grading rubric.)
Your write up should include:
- The four sentences you found that parse.
- The two (or more) strings you found that didn't parse.
- The names of the rules you used in interactive unification to see why they didn't parse.
- The tdl snippets that lead to the conflicting constraints for each non-parsing string, along with a prose description of what they do.
- A description of the chain of identities linking the ARG0 of _cat_n_rel to the ARG1 of _chase_v_rel in A cat chased me. Each link in the chain should say which instance is involved (e.g., lexical entry for cat), which supertype it inherits the constraint from, and show the tdl for the constraint. In addition, you should indicate which identity tag is enforcing the constraint. Your description should take the form of a numbered list.
I find 13 links in this chain, counting the two constraints given
in the example below as just one, since they come from the same type.
To help you out, and to give you a sense of the format I'm expecting,
here's one of them. (I picked this one because it is possibly the most obscure.)
5. The head-spec phrase structure rule inherits the following
constraints from basic-head-spec-phrase:
  [ HEAD-DTR [ SYNSEM [ LOCAL [ CONT.HOOK #hdhook ] ] ],
    NON-HEAD-DTR.SYNSEM
      [ LOCAL [ CAT [ VAL [ SPEC < [ LOCAL [ CONT.HOOK #hdhook ] ] > ] ],
                CONT.HOOK #hook ] ],
    C-CONT [ HOOK #hook ] ].
identifying the C-CONT.HOOK of the rule with the HOOK of the non-head
daughter via #hook, and identifying the CONT.HOOK of the head daughter
with the CONT.HOOK value inside the non-head daughter's SPEC via #hdhook.
- At least three questions that this lab caused you to wonder about.
(Please indicate if you've figured out the answers, or if you would still like to see them addressed.)
- If you were unable to complete any part of the assignment, a
description of the problems you encountered and what you think might
be going on. (You can earn partial credit for any part of the
assignment you couldn't get working by describing it in this section.)
Submit your assignment
- For this assignment, you only need to submit your write up.
- Please email it to bond@ieee.org
Course materials borrow heavily from Linguistics 567: Knowledge
Engineering for NLP at the University of Washington. Thanks to Emily
Bender for letting us use them.