The goal of this lab is to make a start on the test suite that will be your development target, on the one hand, and to customize a first version of your grammar, on the other. I've selected the phenomena to cover in this lab with an eye to starting with those that are essential to creating a working grammar. You'll probably want to work on these two subtasks in parallel, though they are described separately in the instructions below.
The first task is to create positive and negative example sentences illustrating the following phenomena, to the extent that they are relevant for your language:
Before you start, read the general instructions for testsuites and the formatting instructions.
The second task is to create a starter grammar by filling out the required sections of the Grammar Matrix customization questionnaire. The goal here is to get as much coverage as you can over your test suite using only the customization system (no hand-editing of tdl files yet). In particular, you'll need to address these sections:
In the word order section, you can skip the auxiliaries by saying "no" on that question for now. When we get to auxiliaries, you may of course revise this answer.
In the lexicon section, you should define lexical types for transitive and intransitive verbs and nouns. If appropriate, you should define determiners and case-marking adpositions.
If you have case and/or agreement, you'll need to define morpheme slots and morphemes for verbs and nouns as appropriate. In many languages, the agreement morphemes on verbs also mark, say, tense. We'll ignore this for now, but return to it soon. If you want to define other affixes without giving them morphosyntactic content, you can.
Once you have created your starter grammar (or each time you create one, as you should iterate through grammar creation and testing a few times as you refine your choices), try it out on a couple of sentences interactively to see if it works:
Note that the questionnaire has a section for test sentences. If you use this, then the parse dialog will be pre-filled with your test sentences.
The final step for this lab is to use the [incr tsdb()] grammar profiling system to test the performance of your starter grammar over your test suite, and then examine the results. (You may find in doing so that you want to refine certain aspects of your starter grammar. You can do this by uploading the file "choices" which comes with your grammar into the customization system and then tweaking from there.)
./make_item testsuite.txt
Notes on make_item:
With the invocation above, the file testsuite.txt.item would be created in the working directory. If the testsuite contains errors, a lot of output may appear on stderr. It may be useful to redirect this into a file that you can use to go through and correct the errors one at a time. For example:
./make_item testsuite.txt item 2>errs
The command just above attempts to create 'item' in the working directory, and stderr messages are redirected to the file 'errs'.
make_item contains a default mapping from testsuite line types into particular fields of the [incr tsdb()] item file. The default mapping puts 'orth' into 'i-input', the field which is the input to the grammar. If your grammar targets a different testsuite line, override the default mapping with the -m/--map option.
./make_item --map orth-seg i-input testsuite.txt item
The invocation above maps the orth-seg line into the input field.
You can run make_item with -h/--help to see a summary of the options.
Copy the testsuite.txt.item file which is output by make_item to tsdb/skeletons/lab2/item.
Copy tsdb/skeletons/Relations to tsdb/skeletons/lab2/relations (notice the change from R to r).
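The skeleton setup (an item file plus a relations file under tsdb/skeletons/lab2) can also be scripted. The following is a minimal sketch using only the Python standard library; the function name install_skeleton and its parameters are hypothetical, and it assumes the master schema sits in a file called Relations directly under the skeletons directory.

```python
import shutil
from pathlib import Path


def install_skeleton(item_file, skeletons_dir, name="lab2"):
    """Copy an item file and the Relations schema into a skeleton directory.

    item_file:     path to the file produced by make_item (assumed to exist)
    skeletons_dir: path to tsdb/skeletons (assumed to contain 'Relations')
    name:          name of the skeleton subdirectory to create
    """
    skeletons = Path(skeletons_dir)
    target = skeletons / name
    target.mkdir(parents=True, exist_ok=True)
    # The item file must be named exactly 'item' inside the skeleton.
    shutil.copyfile(item_file, target / "item")
    # Note the lowercase 'relations' in the copy.
    shutil.copyfile(skeletons / "Relations", target / "relations")
    return target
```

From the grammar directory, a call such as install_skeleton("testsuite.txt.item", "tsdb/skeletons") would perform both copies in one step.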
grammar/data/testsuite
grammar/data/make_item.pl
grammar/data/testsuite.item
grammar/tsdb/skeletons/Index.lisp (lists the testsuites)
grammar/tsdb/skeletons/Relations (master copy of the database schema)
grammar/tsdb/skeletons/lab2/item (copy of ../../data/testsuite.item)
grammar/tsdb/skeletons/lab2/relations (copy of ../Relations)
grammar/tsdb/home (directory to store test profiles)
mkprof -s tsdb/skeletons/matrix trees/matrix.01
ace -G eng.dat -g ace/config.td
art -a "ace -g eng.dat -n 5" trees/matrix.01/
PyDelphin is a reimplementation of many DELPH-IN formats and technologies. It is generally well-documented, tested, and more user-friendly than the traditional software. You can write code using PyDelphin's Python API, but you can also perform many tasks using the delphin command which becomes available when you install PyDelphin (see below).
Note that you will need Python 3.6 or higher.
$ python3 -m venv env
$ source env/bin/activate
(env) $

(env) $ pip install pydelphin
(env) $ pip install delphin.highlight  # if you want colored MRSs
(env) $ delphin -V
delphin 1.2.1
(env) $ delphin mkprof -s tsdb/skeletons/matrix trees/matrix.01
[...]

(env) $ delphin process -g eng.dat trees/matrix.01
NOTE: parsed 107 / 107 sentences, avg 4738k, time 3.03249s
i-id: item identifiers
i-input: input sentences
i-wf: "well-formedness" conditions (0 = ungrammatical, 1 = grammatical)
i-length: number of words in input
readings: how many parses the grammar found for an input
mrs: the MRS result of a parse

(env) $ delphin select 'i-id i-input where i-wf = 0' trees/matrix.01  # display ungrammatical items
[...]
(env) $ delphin select 'i-id i-input where i-wf = 1' trees/matrix.01  # display grammatical items
[...]
(env) $ delphin select 'i-id i-input where i-wf = 0 and readings > 0' trees/matrix.01  # display ungrammatical items that parsed (overgeneration)
[...]
(env) $ delphin select 'i-id i-input where i-wf = 1 and readings = 0' trees/matrix.01  # display grammatical items that did not parse (undergeneration)
[...]
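The four queries above amount to a small coverage report. As a rough sketch of the arithmetic, assuming you have already collected (i-id, i-wf, readings) triples from a profile by some means, the summary could be computed as follows. The function name summarize and the sample rows are made up for illustration and do not come from a real profile.

```python
def summarize(rows):
    """Summarize coverage and overgeneration.

    rows: iterable of (i_id, i_wf, readings) tuples, where i_wf is
    1 for grammatical items and 0 for ungrammatical ones.
    """
    rows = list(rows)
    grammatical = [r for r in rows if r[1] == 1]
    ungrammatical = [r for r in rows if r[1] == 0]
    # Grammatical items with no parse: undergeneration.
    undergenerated = [r[0] for r in grammatical if r[2] == 0]
    # Ungrammatical items with at least one parse: overgeneration.
    overgenerated = [r[0] for r in ungrammatical if r[2] > 0]
    parsed = len(grammatical) - len(undergenerated)
    return {
        "coverage": parsed / len(grammatical) if grammatical else 0.0,
        "overgeneration": (len(overgenerated) / len(ungrammatical)
                           if ungrammatical else 0.0),
        "undergenerated": undergenerated,
        "overgenerated": overgenerated,
    }


# Invented sample data: two grammatical and two ungrammatical items.
sample = [(10, 1, 1), (20, 1, 0), (30, 0, 0), (40, 0, 2)]
report = summarize(sample)
print(report)
```

The lists of item identifiers are usually more useful than the ratios, since they tell you which test items to look at next.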
Compare the new profile with a gold profile (here, gold/matrix.01).
(env) $ delphin compare trees/matrix.01 gold/matrix.01  # compare test results against the gold profile
10 <0, 1, 0>
20 <1, 0, 2>
[...]
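When the comparison output gets long, it can help to filter it down to the items that actually differ. The sketch below parses lines in the format shown above. It assumes that the three numbers are, in order, the count of results found only in the test profile, the count shared by both, and the count found only in the gold profile; check this against the PyDelphin documentation for your version. The names parse_compare_line and changed_items are hypothetical.

```python
import re

# Matches lines such as "10 <0, 1, 0>".
LINE_RE = re.compile(r"^\s*(\d+)\s*<\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*>\s*$")


def parse_compare_line(line):
    """Return (i_id, test_only, shared, gold_only), or None if no match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def changed_items(lines):
    """Return ids of items whose test and gold results are not identical."""
    changed = []
    for line in lines:
        parsed = parse_compare_line(line)
        if parsed is None:
            continue  # skip "[...]" and other non-matching lines
        i_id, test_only, _shared, gold_only = parsed
        if test_only or gold_only:
            changed.append(i_id)
    return changed


output = ["10 <0, 1, 0>", "20 <1, 0, 2>", "[...]"]
print(changed_items(output))
```

Under the stated assumption, item 10 is unchanged (one shared result, nothing unique to either side), while item 20 has results on both sides that do not match.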
(tsdb:tsdb :home "path-to-tsdb/home")
(tsdb:tsdb :skeleton "path-to-tsdb/skeletons")
Your write-up should be a plain text file (not .doc, .rtf or .pdf) or LaTeX, and should include the following:
Please use git as version control for your work in this class, and upload the grammars to
doc
and data
respectively)
$ git clone https://github.com/bond-lab/Grammar-Engineering-Grammars.git
$ git add grammar_name
$ git commit -m "first version" grammar_name
$ git push
Course materials borrow heavily from Linguistics 567: Knowledge Engineering for NLP at the University of Washington. Thanks to Emily Bender for letting us use them.