HG7021: Computational Grammars
Instructor: Francis Bond. 2020, 2014, 2012.
HSSK MR4 Thursday 13:00-17:00
In this course, each student implements a computational grammar of a language (one language each).
You must have a laptop, preferably running Linux; otherwise, install a virtual
machine running Linux (I am most familiar with Ubuntu).
Course Outline
Week | Topic | Lab | Wiki | Reading |
1 | Overview, Introduction | Installing the LKB and dependencies | | Sag, Wasow and Bender (2003) Ch 1-2 |
2 | The Grammar Matrix: Motivations, technical details | Getting to know the LKB and the Grammar Matrix; Choose language | | Copestake (2002) Ch 1-3 |
3 | Testsuites, [incr tsdb()] | Testsuites/customization I: Word order, person/number/gender, pronouns, case, the rest of the NP, basic lexicon (full form) | | Copestake (2002) Ch 4, 5 (Oepen & Flickinger 1998) |
4 | Typed Feature Structures and Unification | Testsuites/customization II: Tense/aspect, agreement, other required affixes, negation, argument optionality, demonstratives | | Bender, Flickinger & Oepen (2002) |
5 | Minimal Recursion Semantics | Testsuites/customization III: Matrix yes-no questions, coordination, modification, non-verbal predicates, embedded clauses, information structure | | Copestake, Flickinger, Pollard and Sag (2005) |
6 | Valence and Semantic Composition in HPSG; Chart Parsing | Revision | Git for Grammar Engineers | Sag, Wasow and Bender (2003) Ch 4-5 |
7 | Modification, Discourse Status, Argument Optionality; Precision grammars and corpus data | Modification, demonstratives, argument optionality | | Borthen and Haugereid (2005) (Baldwin et al (2005)) |
8 | Composition and Decomposition: Demonstrative pronouns and multi-word expressions | Breaking words down and building phrases up | | Sag et al. (2002) |
9 | Clause types, illocutionary force | Polar questions, embedded clauses, non-verbal predicates | | Sag, Wasow and Bender (2003) Ch 5 |
10 | Negation, Raising, Control, Argument Composition | I can eat glass. It doesn't hurt me. | | Sag, Wasow and Bender (2003) Ch 12 |
11 | Treebanking | Treebanking and Parse ranking | FFTB | Oepen, Flickinger and Bond (2004) |
12 | Deep Semantic Transfer: The LOGON MT architecture; Information Structure and VPM | Grammar clean up; Transfer rules | VPM | Bond et al. (2011) |
13 | No Slides | Classifiers | | |
Textbooks and Tools
Tools
The Grammar Engineering Cycle
- Create/improve/degrade! a grammar
- Add sentences that test the change
to a functional test suite (or to the inline documentation)
- Check that they parse OK (or fail to parse, for negative
sentences)
- Document the changes (in the grammar and in the paper/thesis/assignment/book)
- Regression test: parse the functional test suite and a general test suite
- If there was an improvement, commit it;
otherwise go back to step 1
- Treebank the new profiles (maybe after several improvements)
  - The update function should make this cheap
- Identify and analyse your next phenomenon
- Check whether there are existing solutions
  - in other grammars
  - in the literature
- Go back to step 1
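The regression-test and commit steps of the cycle can be sketched in a few lines of Python. This is only an illustration, not how [incr tsdb()] compares profiles: the `Item` class, the function names, the three example sentences and the reading counts are all hypothetical, and the policy "commit only if something improved and nothing regressed" is one possible reading of the step above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One test-suite sentence with its expected grammaticality (hypothetical type)."""
    text: str
    grammatical: bool


def is_correct(item, readings):
    """A grammatical item should get at least one parse; an ungrammatical one none."""
    return (readings > 0) == item.grammatical


def improvements(items, before, after):
    """Items handled wrongly before the change and correctly after it."""
    return [i for i in items
            if not is_correct(i, before[i.text]) and is_correct(i, after[i.text])]


def regressions(items, before, after):
    """Items handled correctly before the change and wrongly after it."""
    return [i for i in items
            if is_correct(i, before[i.text]) and not is_correct(i, after[i.text])]


def should_commit(items, before, after):
    """Assumed policy: commit only if something improved and nothing regressed."""
    return bool(improvements(items, before, after)) and not regressions(items, before, after)


# Made-up data: number of readings per sentence before and after a grammar change.
items = [
    Item("dogs bark", True),
    Item("dogs barks", False),      # negative (ungrammatical) test item
    Item("the dog barks", True),
]
before = {"dogs bark": 1, "dogs barks": 1, "the dog barks": 0}
after = {"dogs bark": 1, "dogs barks": 0, "the dog barks": 1}

improved = improvements(items, before, after)
broken = regressions(items, before, after)
commit = should_commit(items, before, after)
print(len(improved), "improved;", len(broken), "regressed; commit:", commit)
```

In practice the reading counts would come from parsing the functional and general test suites with the old and new versions of the grammar.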
Readings
- Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim and Stephan Oepen (2004)
Road-testing the English Resource Grammar over the British National Corpus.
Proceedings of LREC 2004, Lisbon, Portugal.
- Bender, Emily M., Dan Flickinger and Stephan Oepen (2002)
The Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development of
Cross-Linguistically Consistent Broad-Coverage Precision Grammars.
In Carroll, John, Nelleke Oostdijk, and Richard Sutcliffe, eds.
Proceedings of the Workshop on Grammar Engineering and Evaluation at
the 19th International Conference on Computational Linguistics. Taipei, Taiwan. pp. 8-14.
- Emily M. Bender, Scott Drellishak, Antske Fokkens, Michael Wayne Goodman, Daniel P. Mills,
Laurie Poulson and Safiyyah Saleem (2010)
Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System.
Proceedings of the ACL 2010 System Demonstrations. pp. 1-6.
- Francis Bond, Stephan Oepen, Eric Nichols, Dan Flickinger, Erik Velldal and Petter Haugereid (2011)
Deep Open-Source Machine Translation
in Machine Translation, 25(2), pages 87--105.
- Borthen and Haugereid (2005) Representing referential properties of nominals
Research on Language and Computation 3(2):221-246
- Copestake, Flickinger, Pollard and Sag (2005)
Minimal Recursion Semantics
- Stephan Oepen and Dan Flickinger (1998)
Towards Systematic Grammar Profiling:
Test Suite Technology Ten Years after. Journal of Computer
Speech and Language 12:411-435.
- Stephan Oepen, Dan Flickinger and Francis Bond (2004)
Towards Holistic Grammar Engineering and Testing --- Grafting Treebank Maintenance into the Grammar Revision Cycle.
In Beyond Shallow Analyses --- Formalisms and Statistical Modelling for Deep
Analysis (Workshop at IJCNLP-2004), Hainan Island.
- Sag, Ivan A., Timothy Baldwin, Francis Bond, Ann Copestake and Dan Flickinger (2002)
Multiword Expressions: A Pain in the Neck for NLP.
In Alexander Gelbukh, editor, Computational Linguistics and
Intelligent Text Processing: Third International Conference:
CICLing-2002, 1-15, Springer-Verlag, Heidelberg/Berlin.
Assessment
Due 2020-04-28
- 20% Testsuite (with internal documentation)
  - See the testsuite specifications for instructions.
  - Your test suite should have around 60 entries, at least half of which should be ungrammatical.
- 40% Grammar
  - The grammar should be uploaded to your GitHub repository
  - It should be working (I should be able to load and test it)
  - It should contain a skeleton created from the testsuite
  - It should have at least one treebank
  - It should have a filter to translate to another grammar
- 40% Documentation of the grammar, of up to 20 pages, following the LMS guidelines
  - Describe briefly the phenomena covered by the grammar
  - Point out where the grammar fails to model the language
  - Suggest ways the grammar should be extended
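The size and balance requirements for the test suite (around 60 entries, at least half ungrammatical) can be checked mechanically. The sketch below assumes that each entry carries a line of the form `Judgment: g` or `Judgment: u`; that format, the sample entries, and the tolerance of 10 used to interpret "around 60" are assumptions made for illustration, so check the testsuite specifications for the actual format.

```python
def read_judgments(text):
    """Collect the value of every 'Judgment:' line (assumed entry format)."""
    judgments = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Judgment:"):
            judgments.append(line.split(":", 1)[1].strip())
    return judgments


def check_suite(text, target=60, tolerance=10):
    """Report entry counts and whether the size and balance requirements hold.

    'tolerance' is an assumed interpretation of "around 60 entries".
    """
    judgments = read_judgments(text)
    total = len(judgments)
    ungrammatical = sum(1 for j in judgments if j == "u")
    return {
        "total": total,
        "ungrammatical": ungrammatical,
        "size_ok": abs(total - target) <= tolerance,
        "balance_ok": total > 0 and 2 * ungrammatical >= total,
    }


# Two made-up entries, just to exercise the functions.
SAMPLE = """\
Source: author
Vetted: f
Judgment: g
Phenomena: wo
dogs bark

Source: author
Vetted: f
Judgment: u
Phenomena: agr
dogs barks
"""

report = check_suite(SAMPLE)
print(report)
```

For a real test suite, read the file contents into a string and pass them to `check_suite` in place of `SAMPLE`.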
Submission Instructions
From week 5 (when we have stopped making changes via the Matrix),
please upload your grammar to GitHub.
- There are general instructions here
- Please put your testsuite under data/
    data/testsuite
- Please put your skeletons under tsdb/skeletons/
    tsdb/skeletons/testsuite/
    tsdb/skeletons/corpus/
- Please put your write-up under docs
    docs/writeup.tex
- Please also commit a PDF version of your write-up
    docs/writeup.pdf
I recommend you add the following to .gitattributes:
*.odt binary
*.pdf binary
This tells Git to treat these files as binary, so it does not try to diff or merge them as text.
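Before pushing, the layout requested above can be checked with a short script. This is a minimal sketch: the `EXPECTED` list is one reading of the instructions (it omits tsdb/skeletons/corpus/, which may not apply to every grammar), and the function name is hypothetical.

```python
import tempfile
from pathlib import Path

# Paths taken from the submission instructions, relative to the repository root.
EXPECTED = [
    "data/testsuite",
    "tsdb/skeletons/testsuite",
    "docs/writeup.tex",
    "docs/writeup.pdf",
]


def missing_paths(repo_root):
    """Return the expected paths (files or directories) that do not exist yet."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


# Demonstration on a throwaway directory that contains only the write-up source.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "writeup.tex").write_text("% write-up goes here\n")
    still_missing = missing_paths(root)

print(still_missing)
```

To check a real repository, call `missing_paths` with the path to your clone; an empty list means every expected path is present.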
Concrete steps (first time only)
- Create a repository
- Clone the repository
$ git clone https://github.com/YOURNAME/GRAMMAR.git
- Add your grammar directly to the top directory
$ git add grammar_name
- Commit your grammar
$ git commit -m "first version"
- When it is ready, upload it
$ git push
Concrete steps (subsequently)
- commit often
- pull, push when you are ready
Acknowledgment
Course materials borrow heavily from Linguistics 567:
Knowledge Engineering for NLP at the University of Washington.
Thanks to Emily Bender for letting us use them.
Chart parsing slides by Ewan Klein, from the University of
Edinburgh's Introduction to Computational Linguistics (2006/07).
These slides are hosted on GitHub
at https://github.com/bond-lab/Grammar-Engineering.