HG7021: Computational Grammars
Instructor: Francis Bond. 2020, 2014, 2012.
HSSK MR4 Thursday 13:00-17:00
In this course each student implements a computational grammar of one language.
You must have a laptop, preferably running Linux; otherwise, install a virtual
  machine running Linux (I am most familiar with Ubuntu).
Course Outline
| Week | Topic | Lab | Wiki | Reading |
| 1 | Overview, Introduction | Installing the LKB and dependencies | | Sag, Wasow and Bender (2003) Ch 1-2 |
| 2 | The Grammar Matrix: Motivations, technical details | Getting to know the LKB and the Grammar Matrix; Choose language | | Copestake (2002) Ch 1-3 |
| 3 | Testsuites, [incr tsdb()] | Testsuites/customization I: Word order, person/number/gender, pronouns, case, the rest of the NP, basic lexicon (full form) | | Copestake (2002) Ch 4, 5 (Oepen & Flickinger 1998) |
| 4 | Typed Feature Structures and Unification | Testsuites/customization II: Tense/aspect, agreement, other required affixes, negation, argument optionality, demonstratives | | Bender, Flickinger & Oepen (2002) |
| 5 | Minimal Recursion Semantics | Testsuites/customization III: Matrix yes-no questions, coordination, modification, non-verbal predicates, embedded clauses, information structure | | Copestake, Flickinger, Pollard and Sag (2005) |
| 6 | Valence and Semantic Composition in HPSG; Chart Parsing | Revision | Git for Grammar Engineers | Sag, Wasow and Bender (2003) Ch 4-5 |
| 7 | Modification, Discourse Status, Argument Optionality; Precision grammars and corpus data | Modification, demonstratives, argument optionality | | Borthen and Haugereid (2005) (Baldwin et al. (2005)) |
| 8 | Composition and Decomposition: Demonstrative pronouns and multi-word expressions | Breaking words down and building phrases up | | Sag et al. (2002) |
| 9 | Clause types, illocutionary force | Polar questions, embedded clauses, non-verbal predicates | | Sag, Wasow and Bender (2003) Ch 5 |
| 10 | Negation, Raising, Control, Argument Composition | I can eat glass. It doesn't hurt me. | | Sag, Wasow and Bender (2003) Ch 12 |
| 11 | Treebanking | Treebanking and Parse ranking | FFTB | Oepen, Flickinger and Bond (2004) |
| 12 | Deep Semantic Transfer: The LOGON MT architecture; Information Structure and VPM | Grammar clean up; Transfer rules | VPM | Bond et al. (2011) |
| 13 | No Slides | Classifiers | | |
Textbooks and Tools
Tools
The Grammar Engineering Cycle
  1. Create/improve/degrade! a grammar
      - Add sentences to test the change to a functional test suite (or the inline documentation)
      - Check they parse ok (or fail to parse for negative sentences)
      - Document the changes (in grammar and paper/thesis/assignment/book)
  2. Regression test: Parse the functional test suite and a general test suite
      - If there was an improvement, commit it; else back to step 1
  3. Treebank the new profiles (maybe after several improvements)
      - The update function should make this cheap
  4. Identify and analyse your next phenomenon
      - Look if there are existing solutions
          - in other grammars
          - in the literature
  5. Back to step 1
 
Readings
  - Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim and Stephan Oepen (2004).
    Road-testing the English Resource Grammar over the British National Corpus.
    Proceedings of LREC 2004, Lisbon, Portugal.
  - Bender, Emily M., Dan Flickinger and Stephan Oepen (2002).
    The Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development of
    Cross-Linguistically Consistent Broad-Coverage Precision Grammars.
    In Carroll, John, Nelleke Oostdijk, and Richard Sutcliffe, eds.,
    Proceedings of the Workshop on Grammar Engineering and Evaluation at
    the 19th International Conference on Computational Linguistics. Taipei, Taiwan. pp. 8-14.
  - Bender, Emily M., Scott Drellishak, Antske Fokkens, Michael Wayne Goodman, Daniel P. Mills,
    Laurie Poulson and Safiyyah Saleem (2010).
    Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System.
    Proceedings of the ACL 2010 System Demonstrations. pp. 1-6.
  - Bond, Francis, Stephan Oepen, Eric Nichols, Dan Flickinger, Erik Velldal and Petter Haugereid (2011).
    Deep Open-Source Machine Translation.
    Machine Translation 25(2):87-105.
  - Borthen and Haugereid (2005).
    Representing referential properties of nominals.
    Research on Language and Computation 3(2):221-246.
  - Copestake, Flickinger, Pollard and Sag (2005).
    Minimal Recursion Semantics.
  - Oepen, Stephan and Dan Flickinger (1998).
    Towards Systematic Grammar Profiling: Test Suite Technology Ten Years After.
    Journal of Computer Speech and Language 12:411-435.
  - Oepen, Stephan, Dan Flickinger and Francis Bond (2004).
    Towards Holistic Grammar Engineering and Testing --- Grafting Treebank Maintenance into the Grammar Revision Cycle.
    In Beyond Shallow Analyses --- Formalisms and Statistical Modelling for Deep
    Analysis (Workshop at IJCNLP-2004), Hainan Island.
  - Sag, Ivan A., Timothy Baldwin, Francis Bond, Ann Copestake and Dan Flickinger (2002).
    Multiword Expressions: A Pain in the Neck for NLP.
    In Alexander Gelbukh, editor, Computational Linguistics and
    Intelligent Text Processing: Third International Conference:
    CICLing-2002, pp. 1-15, Springer-Verlag, Heidelberg/Berlin.
 
 
Assessment
Due 2020-04-28
  - 20% Testsuite (with internal documentation).
      - See the testsuite specifications for instructions.
      - Your test suite should have around 60 entries, at least half of which should be ungrammatical.
  - 40% Grammar
      - the grammar should be uploaded to your GitHub repository
      - it should be working (I should be able to load and test it)
      - it should contain a skeleton created from the testsuite
      - it should have at least one treebank
      - it should have a filter to translate to another grammar
  - 40% Documentation of the Grammar of up to 20 pages following the LMS guidelines
      - Describe briefly the phenomena covered by the grammar
      - Point out where the grammar fails to model the language
      - Suggest ways the grammar should be extended
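
The size requirement for the test suite (around 60 entries, at least half ungrammatical) is easy to check mechanically. The following is only a sketch: it assumes, hypothetically, that each entry carries a line of the form `Judgment: g` or `Judgment: u`. The authoritative format is whatever the testsuite specifications say, so adjust the patterns to match. The two-entry toy file stands in for a real test suite.

```shell
#!/bin/sh
# Sketch: count entries and ungrammatical entries in a test suite file.
# ASSUMPTION: every entry has a "Judgment: g" or "Judgment: u" line;
# change the grep patterns if the course format differs.
set -e

# A two-entry toy file stands in for your real test suite.
suite="$(mktemp)"
cat > "$suite" <<'EOF'
Judgment: g
the dog barks

Judgment: u
dog the barks
EOF

# grep -c exits non-zero when the count is 0, hence "|| true".
total=$(grep -c '^Judgment:' "$suite" || true)
bad=$(grep -c '^Judgment: *u' "$suite" || true)
echo "entries: $total, ungrammatical: $bad"

# "At least half ungrammatical" means 2 * bad >= total.
if [ $((bad * 2)) -ge "$total" ]; then
    echo "ungrammatical share: ok"
else
    echo "ungrammatical share: too low"
fi
```

To use it on a real file, replace the toy `suite` with the path to your own test suite.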
    
 
 
Submission Instructions
From week 5 (when we have stopped making changes via the Matrix),
  please upload your grammar to GitHub.

    - There are general instructions here
    - Please put your testsuite under data/
        - data/testsuite
    - Please put your skeletons under tsdb/skeletons/
        - tsdb/skeletons/testsuite/
        - tsdb/skeletons/corpus/
    - Please put your write up under docs
        - docs/writeup.tex
    - Please also commit a pdf version of your writeup
        - docs/writeup.pdf
   
I recommend you add the following to .gitattributes:
  
    *.odt binary
    *.pdf binary
  
  This tells git to treat these files as binary, so it will not try to produce text diffs or merges for them.
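
The directory layout and the .gitattributes entries above can be tried out safely in a scratch repository. This is a minimal sketch, not part of the required workflow: the temporary directory is a throwaway, and `git check-attr` is used only to confirm how git will treat a PDF at the expected path.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory so no real repository is touched.
set -e
cd "$(mktemp -d)"
git init -q .

# Directories requested above (writeup.tex and writeup.pdf go in docs/).
mkdir -p data tsdb/skeletons/testsuite tsdb/skeletons/corpus docs

# Mark binary formats so git does not try to diff or merge them as text.
printf '%s\n' '*.odt binary' '*.pdf binary' > .gitattributes

# Ask git how it will treat a PDF at the expected path.
# The path does not need to exist yet; the report should say "binary: set".
git check-attr binary -- docs/writeup.pdf
```

In your real repository only the `mkdir` and `printf` lines are relevant; remember to `git add .gitattributes` so the setting is shared with anyone who clones the grammar.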
  Concrete steps (first time only)
  
    - Create a repository
    - Clone the repository
        $ git clone https://github.com/YOURNAME/GRAMMAR.git
    - Add your grammar directly to the top directory
        $ cd GRAMMAR
        $ git add grammar_name
    - Commit your grammar
        $ git commit -m "first version"
    - When it is ready upload it
        $ git push -u origin HEAD

  Concrete steps (subsequently)

    - commit often
    - pull, push when you are ready
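
The routine above (commit often; pull, then push when ready) can be rehearsed offline. The sketch below makes several assumptions that are not part of the course instructions: a local bare repository (`remote.git`) stands in for GitHub, `notes.txt` and the commit messages are placeholders, and `git pull --rebase` is just one reasonable way to pull before pushing.

```shell
#!/bin/sh
# Sketch: rehearse the commit / pull / push routine without touching GitHub.
set -e
work="$(mktemp -d)"

# ASSUMPTION: a local bare repository plays the role of the GitHub remote.
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/GRAMMAR"
cd "$work/GRAMMAR"
git config user.name "Student"
git config user.email "student@example.com"

# First time only: add the grammar, commit it, push and set the upstream.
mkdir grammar_name
echo "first draft" > grammar_name/notes.txt   # placeholder file
git add grammar_name
git commit -q -m "first version"
git push -q -u origin HEAD

# Subsequently: commit often ...
echo "a small improvement" >> grammar_name/notes.txt
git add grammar_name
git commit -q -m "describe the improvement"

# ... then pull and push when you are ready.
git pull -q --rebase
git push -q
git log --oneline
```

With a real GitHub repository, skip the `git init --bare` line and clone your own URL instead; the remaining commands are unchanged.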
     
 
  
 
  
Acknowledgment
Course materials borrow heavily from Linguistics 567: Knowledge Engineering
for NLP at the University of Washington. Thanks to Emily Bender for letting
us use them.
Chart parsing slides by Ewan Klein, from the University of
  Edinburgh's Introduction to Computational Linguistics (2006/07).
  
    These slides are hosted on GitHub
    at https://github.com/bond-lab/Grammar-Engineering.