Lab 2

Overview

The goal of this lab is twofold: to make a start on the test suite that will be your development target, and to customize a first version of your grammar (the starter grammar). I've selected the phenomena for this lab with an eye to starting with those that are essential to creating a working grammar. You'll probably want to work on the two subtasks in parallel, though they are described separately in the instructions below.

Test Suite

The first task is to create positive and negative example sentences illustrating the following phenomena, to the extent that they are relevant for your language:

word order
pronouns (includes person/number/gender)
case
the rest of the NP

Before you start, read the general instructions for testsuites and the formatting instructions.

Starter grammar

The second task is to create a starter grammar by filling out the required sections of the Grammar Matrix customization questionnaire. The goal here is to get as much coverage as you can over your test suite using only the customization system (no hand-editing of tdl files yet). In particular, you'll need to address these sections:

General information
Word order
Number
Person
Gender (if applicable)
Case
Direct-inverse (if appropriate)
Lexicon

In the word order section, you can skip the auxiliaries by saying "no" on that question for now. When we get to auxiliaries, you may of course revise this answer.

In the lexicon section, you should define lexical types for transitive and intransitive verbs and nouns. If appropriate, you should define determiners and case-marking adpositions.

If you have case and/or agreement, you'll need to define morpheme slots and morphemes for verbs and nouns as appropriate. In many languages, the agreement morphemes on verbs also mark, say, tense. We'll ignore this for now, but return to it soon. If you want to define other affixes without giving them morphosyntactic content, you can.

Make sure you can parse individual sentences

Once you have created your starter grammar (or each time you create one, as you should iterate through grammar creation and testing a few times as you refine your choices), try it out on a couple of sentences interactively to see if it works:

Load the grammar into the LKB.
Using the parse dialog box (or 'C-c p' in emacs to get the parse command inserted at your prompt), enter a sentence to parse.
Examine the results. If it does parse, check out the semantics (pop-up menu on the little trees). If it doesn't, look at the parse chart to see why not.
Problems with lexical rules and lexical entries often become apparent here, too: If the LKB can't find an analysis for one of your words, it will say so, and (obviously) fail to parse the sentence.

Note that the questionnaire has a section for test sentences. If you use this, then the parse dialog will be pre-filled with your test sentences.

[incr tsdb()] profile

The final step for this lab is to use the [incr tsdb()] grammar profiling system to test the performance of your starter grammar over your test suite, and then examine the results. (You may find in doing so that you want to refine certain aspects of your starter grammar. You can do this by uploading the file "choices" which comes with your grammar into the customization system and then tweaking from there.)

Create a test suite skeleton

Create a directory called tsdb inside your grammar directory.
Inside tsdb, create two subdirectories: home (for test suite instances) and skeletons (for skeletons).
Save a copy of Index.lisp in tsdb/skeletons.
Save a copy of Relations in tsdb/skeletons.
Make a subdirectory called lab2 inside tsdb/skeletons for your test suite. (If you choose a different name for this subdirectory, you must edit Index.lisp accordingly.)
Download the python script make_item and run it on your test suite (you may have to make it executable):
```
./make_item testsuite.txt
```
Notes on make_item:
- This script is going to be pretty picky about the format of your test suite. If you have questions, please post to Canvas (10 minute rule!).
- It requires python3.
- If the above command is successful, testsuite.txt.item will be created in the working directory. If the testsuite contains errors, a lot of output may appear on stderr. It may be useful to redirect this into a file that you can use to go through and correct the errors one at a time. For example:
  ./make_item testsuite.txt item 2>errs
  The command just above attempts to create 'item' in the working directory, and stderr messages are redirected to the file 'errs'.
  make_item contains a default mapping from testsuite line types into particular fields of the [incr tsdb()] item file. The default mapping puts 'orth' into 'i-input', the field which is the input to the grammar. If your grammar targets a different testsuite line, override the default mapping with the -m/--map option.
  ./make_item --map orth-seg i-input testsuite.txt item
  The invocation above maps the orth-seg line into the input field.
  You can run make_item with -h/--help to see a summary of the options.
Copy the testsuite.txt.item file which is output by make_item to tsdb/skeletons/lab2/item.
Copy tsdb/skeletons/Relations to tsdb/skeletons/lab2/relations (notice the change from R to r).
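
If you end up re-running make_item many times while cleaning up your test suite, the invocations from the notes above can be wrapped in a small script. This is only a sketch: the function names and the default error file name are made up for illustration, and it uses nothing beyond the command forms shown in the notes (positional input and output files, the --map option, and stderr redirected to a file).

```python
import subprocess
from pathlib import Path


def make_item_command(testsuite, output, input_line=None):
    """Build the argument list for a make_item call.

    input_line is the testsuite line type to use as grammar input
    (for example "orth-seg"); None keeps the script's default mapping.
    """
    cmd = ["./make_item"]
    if input_line is not None:
        # Same form as the --map example in the notes above.
        cmd += ["--map", input_line, "i-input"]
    cmd += [str(testsuite), str(output)]
    return cmd


def run_make_item(testsuite, output, errors="errs", input_line=None):
    """Run make_item with stderr sent to a file (like `2>errs`)."""
    with open(errors, "w", encoding="utf-8") as err_file:
        result = subprocess.run(
            make_item_command(testsuite, output, input_line),
            stderr=err_file,
        )
    # Report how many lines of diagnostics there are to work through.
    n_lines = len(Path(errors).read_text(encoding="utf-8").splitlines())
    return result.returncode, n_lines
```

Calling run_make_item("testsuite.txt", "item") from the directory that holds make_item corresponds to the redirected command in the notes; an empty errs file afterwards suggests the test suite was accepted as formatted.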

The final directory structure should look like this:

grammar/data/testsuite.txt
grammar/data/make_item
grammar/data/testsuite.txt.item
grammar/tsdb/skeletons/Index.lisp            (lists the testsuites)
grammar/tsdb/skeletons/Relations             (master copy of the database schema)
grammar/tsdb/skeletons/lab2/item             (copy of ../../../data/testsuite.txt.item)
grammar/tsdb/skeletons/lab2/relations        (copy of ../Relations)
grammar/tsdb/home                            (directory to store test profiles)
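
The skeleton layout above can also be built with a short script, which is handy when you regenerate the item file often. The following is a minimal sketch, assuming you have already downloaded Index.lisp and Relations and produced an item file with make_item; the function name build_skeleton and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def build_skeleton(grammar_dir, index_src, relations_src, item_src, name="lab2"):
    """Create tsdb/home and tsdb/skeletons/<name> and copy the files in."""
    grammar_dir = Path(grammar_dir)
    skeletons = grammar_dir / "tsdb" / "skeletons"
    suite = skeletons / name

    # tsdb/home will hold test suite instances (profiles).
    (grammar_dir / "tsdb" / "home").mkdir(parents=True, exist_ok=True)
    suite.mkdir(parents=True, exist_ok=True)

    # Master copies live directly in tsdb/skeletons.
    shutil.copy(index_src, skeletons / "Index.lisp")
    shutil.copy(relations_src, skeletons / "Relations")

    # Per-suite copies: note the lowercase 'relations' and the name 'item'.
    shutil.copy(skeletons / "Relations", suite / "relations")
    shutil.copy(item_src, suite / "item")
    return suite
```

If you pass a name other than lab2, remember that Index.lisp still has to be edited by hand to list that directory; the sketch does not touch its contents.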

Create and run an initial test suite instance

Using ace and art

Make an empty profile:

	$ mkprof -s tsdb/skeletons/matrix trees/matrix.01

Compile your grammar with ace:

	$ ace -G eng.dat -g ace/config.tdl

Parse the profile with art:

	$ art -a "ace -g eng.dat -n 5" trees/matrix.01/
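
Since you will repeat these three steps every time you revise your grammar, it may help to script them. Below is a rough sketch that assembles the same three commands and runs them in order, stopping at the first failure. The function names and default values are illustrative assumptions, including the guess that the ACE configuration file is ace/config.tdl; adjust the paths to your own grammar and skeleton.

```python
import shlex
import subprocess


def profile_commands(skeleton, profile, image="eng.dat",
                     config="ace/config.tdl", max_results=5):
    """Return the mkprof, ace and art commands as argument lists."""
    # art receives the whole parser invocation as one argument.
    parser = "ace -g {} -n {}".format(image, max_results)
    return [
        ["mkprof", "-s", skeleton, profile],   # make an empty profile
        ["ace", "-G", image, "-g", config],    # compile the grammar
        ["art", "-a", parser, profile],        # parse the profile
    ]


def run_all(commands):
    """Echo and run each command; check=True aborts on the first error."""
    for cmd in commands:
        print("$", " ".join(shlex.quote(part) for part in cmd))
        subprocess.run(cmd, check=True)
```

For this lab, something like run_all(profile_commands("tsdb/skeletons/lab2", "tsdb/home/lab2")) would be the natural call, assuming mkprof, ace and art are on your PATH.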

Using PyDelphin

PyDelphin is a reimplementation of many DELPH-IN formats and technologies. It is generally well-documented, tested, and more user-friendly than the traditional software. You can write code using PyDelphin's Python API, but you can also perform many tasks using the delphin command which becomes available when you install PyDelphin (see below).

Note that you will need Python 3.6 or higher.

First, create and activate a virtual environment

	$ python3 -m venv env
	$ source env/bin/activate
	(env) $

Install PyDelphin and ensure it's installed

	(env) $ pip install pydelphin
	(env) $ pip install delphin.highlight  # if you want colored MRSs
	(env) $ delphin -V
	delphin 1.2.1

Now you can use it to create an empty profile (as before, now using PyDelphin):

	(env) $ delphin mkprof -s tsdb/skeletons/matrix trees/matrix.01
	[...]

And parse the profile with ACE (as before, now using PyDelphin):

	(env) $ delphin process -g eng.dat trees/matrix.01
	NOTE: parsed 107 / 107 sentences, avg 4738k, time 3.03249s

The delphin select command is used to execute TSQL queries on a profile. Some useful field names for searching are:

i-id item identifiers
i-input input sentences
i-wf "well-formedness" conditions (0 = ungrammatical, 1 = grammatical)
i-length number of words in input
readings how many parses the grammar found for an input
mrs the mrs result of a parse

	(env) $ delphin select 'i-id i-input where i-wf = 0' trees/matrix.01  # display ungrammatical items
	[...]
	(env) $ delphin select 'i-id i-input where i-wf = 1' trees/matrix.01  # display grammatical items
	[...]
	(env) $ delphin select 'i-id i-input where i-wf = 0 and readings > 0' trees/matrix.01  # display ungrammatical items that parsed (overgeneration)
	[...]
	(env) $ delphin select 'i-id i-input where i-wf = 1 and readings = 0' trees/matrix.01  # display grammatical items that did not parse (undergeneration)
	[...]
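
The four queries above split the items into four groups, and the summary numbers for your write up follow directly from the group sizes. The sketch below shows that bookkeeping on plain tuples of (i-id, i-wf, readings); it does not call PyDelphin, the function names are hypothetical, and the sample rows are invented for illustration.

```python
def classify(rows):
    """Sort item ids into four groups by i-wf and readings."""
    groups = {
        "covered": [],           # grammatical, parsed
        "undergeneration": [],   # grammatical, no parse
        "overgeneration": [],    # ungrammatical, parsed
        "rejected": [],          # ungrammatical, no parse
    }
    for i_id, i_wf, readings in rows:
        if i_wf == 1:
            key = "covered" if readings > 0 else "undergeneration"
        elif i_wf == 0:
            key = "overgeneration" if readings > 0 else "rejected"
        else:
            continue  # other i-wf values are outside this sketch
        groups[key].append(i_id)
    return groups


def rates(groups):
    """Coverage over grammatical items, overgeneration over ungrammatical."""
    n_good = len(groups["covered"]) + len(groups["undergeneration"])
    n_bad = len(groups["overgeneration"]) + len(groups["rejected"])
    coverage = len(groups["covered"]) / n_good if n_good else 0.0
    overgeneration = len(groups["overgeneration"]) / n_bad if n_bad else 0.0
    return coverage, overgeneration


# Invented sample: two grammatical and two ungrammatical items.
sample = [(10, 1, 1), (20, 1, 0), (30, 0, 0), (40, 0, 2)]
groups = classify(sample)
print(groups["undergeneration"], groups["overgeneration"])  # [20] [40]
print(rates(groups))  # (0.5, 0.5)
```

The item ids in the undergeneration and overgeneration groups are the ones worth discussing individually in the write up.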

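The delphin compare command compares two versions of a profile. It assumes both profiles exist; below I use trees/matrix.01 and gold/matrix.01. Each output line gives an item id followed by three counts: results found only in the first profile, results shared by both, and results found only in the second (gold) profile.

	(env) $ delphin compare trees/matrix.01 gold/matrix.01  # compare the two profiles
	10    <0, 1, 0>
	20    <1, 0, 2>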
	[...]
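
When the comparison is long, you mostly care about the items whose results changed. The following sketch parses saved output of delphin compare and lists those items. It assumes each line has the form shown above and that the three numbers are the counts unique to the first profile, shared, and unique to the second profile; check that reading against the PyDelphin documentation before relying on it. The function names are hypothetical.

```python
import re

# An item id followed by a <a, b, c> triple, with flexible spacing.
LINE = re.compile(r"^\s*(\d+)\s+<\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*>\s*$")


def parse_compare(text):
    """Map item id -> (first_only, shared, second_only)."""
    diffs = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that is not a result line
        i_id, first_only, shared, second_only = map(int, match.groups())
        diffs[i_id] = (first_only, shared, second_only)
    return diffs


def changed_items(diffs):
    """Item ids where the two profiles disagree on at least one result."""
    return sorted(
        i_id for i_id, (first_only, _, second_only) in diffs.items()
        if first_only or second_only
    )


output = "10    <0, 1, 0>\n20    <1, 0, 2>\n"
print(changed_items(parse_compare(output)))  # [20]
```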

Using the lkb and tsdb

Start the lkb

Load your starter grammar. (The script file is in your-grammar-dir/lkb/script.)

Start [incr tsdb()] (within emacs, that's M-x itsdb)

In the [incr tsdb()] podium, select Options|Database Root and input the path to tsdb/home.

In the [incr tsdb()] podium, select Options|Skeleton Root and input the path to tsdb/skeletons.

Optional: For future use, you can set these variables ahead of time in a file called .tsdbrc in your home directory. It should contain these lines, with path names edited appropriately:

(tsdb:tsdb  :home  "path-to-tsdb/home")
(tsdb:tsdb  :skeleton  "path-to-tsdb/skeletons")

In the [incr tsdb()] podium, select File|Create. You should see your test suite in the menu there. Select it, and get a test suite instance. Post to Canvas if this doesn't work.

Make sure your grammar is loaded into the LKB.

Once you have a test suite instance, select it (by clicking on it), then do Process|All Items.

Explore the results, with functions such as Browse|Results and Analyze|Competence.

Be sure to save (i.e., not overwrite or delete) this test suite instance, as you'll be asked to turn it in.

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) or a LaTeX file, and should include the following:

Documentation of the choices you made in the customization system, illustrated with examples from your test suite. Here's an example of what this should look like.
Descriptions of any properties of your language illustrated in your test suite but not covered by your starter grammar and/or the customization system.
Documentation of the coverage of your grammar over the testsuite. This should include both summary numbers, which you can get by using the Analyze | Coverage and Analyze | Overgeneration options in [incr tsdb()], and discussion of specific examples. If there are examples that are parsed incorrectly (unanalyzed grammatical examples, analyzed ungrammatical examples, or grammatical examples assigned surprising parses), reflect on why that might be.
Finally, if there are any places where the customization system seems unable to cope with the properties of your language (within the phenomena addressed in this lab), describe them here.

Submit your assignment

Please use git as version control for your work in this class, and upload your grammar to the course repository as described below.

Be sure your write up and the text-file version of your test suite are included in your grammar directory (under doc and data respectively).
Likewise, make sure to include your most current tsdb profile in the grammar directory (ideally inside tsdb/home/).
Commit them to https://github.com/bond-lab/Grammar-Engineering-Grammars
Concrete steps (first time only)
- Clone the repository
  $ git clone https://github.com/bond-lab/Grammar-Engineering-Grammars.git
- Add your grammar directly to the top directory
  $ git add grammar_name
- Commit your grammar
  $ git commit -m "first version" grammar_name
- When it is ready, upload it
  $ git push
Concrete steps (subsequently)
- commit often
- pull, push when you are ready

bond@ieee.org

Course materials borrow heavily from Linguistics 567: Knowledge Engineering for NLP at the University of Washington. Thanks to Emily Bender for letting us use them.