Computational Grammars: MT

Running the translation system

The first step is to get the translation system running from your language to English (xxx2eng). Here are step-by-step instructions:

  1. Download the English grammar. Unpack it with tar xzf eng.tgz. Or use the ERG (more fun).
  2. Set up your grammar:
  • Start two separate emacsen. Put one on the left of your screen (this will be the "source" emacs). Put one on the right of your screen ("target" emacs).
  • Start the LKB in each. Make sure the "source" LKB Top menu is on the left of the screen and the "target" one is on the right.
  • Load your grammar into the "source" LKB.
  • Load the English grammar into the "target" LKB.
  • In the "target" LKB, select Options | Expand menu.
  • In the "target" LKB, select Generate | Start server.
  • In the "source" emacs/lkb parse the equivalent of the English sentence Dogs sleep.
  • From the pop-up menu on the tree that comes up, select "Rephrase." You should see a transfer output window and then the English grammar should output "Dogs sleep." in a realizations window.
  • Observe what happens: Do you get generation outputs? Some error in the emacs buffer in the "target" emacs?
  • If you get an error, you'll need to compare the MRSs to see what the difference is. I expect that for Dogs sleep you won't need any transfer rules (depending on what you called your predicates), and thus any errors should be addressed through harmonization (aka cleaning up your MRS) and/or work on your semi.vpm file.

    Comparing MRSs

    To compare the MRSs, you can look at the MRS from the English grammar directly, but this can be a bit misleading, since you really want to look at the input to the generator (i.e., the transfer output). To do this, you can select "Generate | Display Input MRS" or "Generate | Display Internal MRS" from the "target" LKB Top menu.

    1. Generate | Display Internal MRS
    2. Parse the expected output
    3. Choose Indexed MRS from the pop-up menu

    There are a number of things that could be wrong:

    1. Missing RELS or HCONS (broken diff-list append).
    2. Misspelled PRED values (look carefully at the underscores).
    3. Misspelled/differently spelled feature values (e.g. sing instead of sg).
    4. Misspelled/differently spelled feature names (e.g., PERS instead of PER).
    5. Incompatible variable properties (features and values).
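The checklist above can be sketched as a small comparison script. This is only an illustration: ToyMrs and compare_mrs are hypothetical names, not part of the LKB, and the values are meant to be copied by hand from the two Indexed MRS displays. It also uses plain string equality, which is stricter than the unification the generator actually performs, so treat its output as hints about where to look.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMrs:
    """Hand-copied summary of an MRS (hypothetical, not an LKB structure)."""
    preds: list                                  # PRED values found in RELS
    hcons: int                                   # number of qeq constraints
    props: dict = field(default_factory=dict)    # PRED -> {feature: value} on its ARG0


def compare_mrs(transfer_out, expected):
    """Return human-readable differences between two toy MRSs."""
    problems = []
    missing = sorted(set(expected.preds) - set(transfer_out.preds))
    extra = sorted(set(transfer_out.preds) - set(expected.preds))
    if missing:
        problems.append(f"PREDs only in expected output: {missing}")
    if extra:
        problems.append(f"PREDs only in transfer output: {extra}")
    if transfer_out.hcons != expected.hcons:
        problems.append(
            f"HCONS count differs: {transfer_out.hcons} vs {expected.hcons}")
    # Compare variable properties, keyed by the PRED that introduces the variable,
    # because variable names (x4, e2, ...) differ between the two MRSs.
    for pred in sorted(set(transfer_out.props) | set(expected.props)):
        got = transfer_out.props.get(pred, {})
        want = expected.props.get(pred, {})
        for feat in sorted(set(got) | set(want)):
            if got.get(feat) != want.get(feat):
                problems.append(
                    f"{pred} {feat}: {got.get(feat)!r} vs {want.get(feat)!r}")
    return problems


# Made-up example: a differently spelled PRED and a PERS/PER feature-name mismatch.
transfer_out = ToyMrs(
    preds=["_dog_n_rel", "exist_q_rel", "_sleep_v_1_rel"],
    hcons=1,
    props={"_dog_n_rel": {"PERS": "3", "NUM": "pl"}},
)
expected = ToyMrs(
    preds=["_dog_n_rel", "exist_q_rel", "_sleep_v_rel"],
    hcons=1,
    props={"_dog_n_rel": {"PER": "3", "NUM": "pl"}},
)

for line in compare_mrs(transfer_out, expected):
    print(line)
```

A difference reported here is not necessarily fatal, since an underspecified value can still unify with a more specific one; an empty list only means the hand-copied summaries match.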

    Variable property mapping

    You may have noticed that you get many variants in generation if you start with a form that is underspecified for, e.g., aspect or evidentiality. We can get a handle on this by using variable property mapping to supply default values in the unmarked case (either in monolingual generation or in the MT scenario). The basic strategy is to take any underspecified values in variable properties and translate them, via the VPM, to a value that conflicts with every more specific value your grammar can produce.

    The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. Note that once you start using a VPM file, only the variable properties (features on indices) that are handled in the file are preserved.

    1. You should already have a semi.vpm file provided by the customization system. Open it up and see which variable properties are there, and then look in your grammar to see what is missing. In general, we'd expect to see all of the features of the types event and ref-ind represented in a mature semi.vpm file.
    2. You need to tell the LKB to load the semi.vpm file by uncommenting the following line in lkb/script:
      (mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
      
    3. This line needs to be moved higher in the script file; specifically, it needs to come before the code block that loads the trigger rules.
    4. You'll also need to add this line to lkb/mrsglobals.lsp:
      (setf *variable-type-mapping* :semi)
      
    5. If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand side) of the VPM to a single PERNUM feature on the internal (left-hand side). See the example under "Properties: An Example" on the DELPH-IN wiki page.
    6. If your grammar encodes aspectual distinctions, you'll need to add an ASPECT section, modeled on tense. This should allow you to specify and use a default value of ASPECT.
    7. If you have any other features you have added on indices, you will need to provide VPM entries for them as well.
    8. If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise.
    9. You can do a similar trick for other kinds of generation ambiguity relating to variable properties.

    Test your semi.vpm file by parsing and then generating. You should see fewer strings coming out.
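Why the default value reduces the number of generated strings can be seen in a toy model. The sketch below is not how the LKB implements VPM; the aspect hierarchy, the three verb forms, and all function names are assumptions made up for illustration. It models only two ideas from the steps above: an underspecified value is rewritten to no-aspect on the way out of the grammar, and no-aspect then fails to unify with the more specific values.

```python
# Hypothetical aspect hierarchy: 'aspect' is the underspecified supertype.
SUBTYPES = {"aspect": {"perfective", "imperfective", "no-aspect"}}

# Aspect value that each (made-up) verb form carries in the grammar.
FORMS = {
    "unmarked form": "aspect",
    "perfective form": "perfective",
    "imperfective form": "imperfective",
}


def compatible(a, b):
    """True if two aspect values unify in this toy hierarchy."""
    return a == b or b in SUBTYPES.get(a, set()) or a in SUBTYPES.get(b, set())


def to_external(internal_aspect):
    """Internal to external: specific values pass through unchanged;
    anything else becomes the default (the role of a catch-all rule)."""
    specific = {"perfective", "imperfective"}
    return internal_aspect if internal_aspect in specific else "no-aspect"


def to_internal(external_aspect):
    """External to internal: a missing ASPECT on an event becomes no-aspect."""
    return "no-aspect" if external_aspect is None else external_aspect


def realizations(input_aspect):
    """Verb forms whose aspect value unifies with the generator input."""
    return sorted(f for f, v in FORMS.items() if compatible(input_aspect, v))


# Without the mapping, the underspecified value matches every form.
print(realizations("aspect"))
# With the mapping, parsing the unmarked form yields no-aspect externally,
# and generating from that matches only the unmarked form.
print(realizations(to_internal(to_external("aspect"))))
```

In this toy setting the first call lists all three forms and the second only the unmarked one, which mirrors the drop in outputs you should see after reloading the grammar with a working semi.vpm.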


    First MT

    Preliminaries

    This week, we'll be using the LOGON MT setup, which doesn't respect ICONS. I hope to also try the ACE setup, which does. (But I believe you'll still need the LOGON/LKB version in order to debug transfer.)

    Running the translation system

    The first step is to get the translation system running for English to Frisian (eng2frr). Here are step-by-step instructions:

    Update semi.vpm, if necessary

    The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki.

    1. If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand side) of the VPM to a single PERNUM feature on the internal (left-hand side). See the example under "Properties: An Example" on the DELPH-IN wiki page. (There is also an example in the semi.vpm file in the eng grammar.)
    2. If your grammar encodes aspectual distinctions, you'll need to add an ASPECT section, modeled on tense. This should allow you to specify a default value of ASPECT as well.
    3. If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise. In the semi.vpm file, at the bottom of your section on aspect, add:
      * >> no-aspect
      no-aspect << [e]
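The PERNUM mapping in step 1 pairs one internal value with two external ones. The following toy table shows the shape of that correspondence; the value names (1sg, sg, and so on) and the function names are assumptions, so substitute whatever your grammar and the eng grammar actually use. It does not reproduce VPM file syntax, for which see the wiki example.

```python
# Each row pairs one internal PERNUM value with an external (PER, NUM) pair.
# All value names here are hypothetical.
PERNUM_ROWS = [
    ("1sg", ("1", "sg")),
    ("2sg", ("2", "sg")),
    ("3sg", ("3", "sg")),
    ("1pl", ("1", "pl")),
    ("2pl", ("2", "pl")),
    ("3pl", ("3", "pl")),
]


def pernum_to_external(pernum):
    """Internal to external: split PERNUM into (PER, NUM).
    Returns None when no row matches; what the LKB does then is not modelled."""
    for internal, external in PERNUM_ROWS:
        if internal == pernum:
            return external
    return None


def external_to_pernum(per, num):
    """External to internal: combine PER and NUM into one PERNUM value."""
    for internal, external in PERNUM_ROWS:
        if external == (per, num):
            return internal
    return None


print(pernum_to_external("3sg"))
print(external_to_pernum("1", "pl"))
```

Writing the rows out this way makes it easy to check that every PERNUM value in your grammar has a counterpart, which is the main thing to verify in the real semi.vpm section.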
      

    Create a transfer grammar

    Once you have Dogs sleep translating, it's time to try a broader range of the MMT sentences.

    Note that you will be modifying the English and Italian grammars for this part of the lab. You will need to add mt-mrs.tdl, mtr.tdl and acm.tdl. Of those, acm.tdl should be the most interesting. You'll want to edit the file acm.mtr to create instances of the transfer rules that you need for your grammar. It will be simplest to edit this file in one grammar (say the English one) and create a symbolic link to it in the other grammar, so that you have one transfer grammar for your language.

    1. Try translating all of the MMT sentences from English to your language and Italian to your language.
    2. For each one that doesn't go through, compare the input MRS to the MRS your expected output is giving.
    3. Do any harmonization that is warranted.
    4. For the remaining differences, look to see if one of the existing transfer rule types in acm.tdl will do the trick. If so, create an instance of that transfer rule type in acm.mtr, e.g.:
      pro-drop := pronoun-delete-mtr.
      
    5. If you need a different transfer rule, ask Petter or me about what you need, and we'll work out how to formulate it.
    6. Reload the "source" grammar and try translating again.
    7. Rinse and repeat.

    Running the translation system

    If you would like to try translating with ACE instead (included in Ubuntu+LKB 17, 64-bit version), you can try out these instructions, compiled by Sanghoun Song.

    Attempt to translate into your language

    Comparing MRSs

    To compare the MRSs, you can look at the MRS from the English grammar directly, but this can be a bit misleading, since you really want to look at the input to the generator (i.e., the transfer output). To do this, you can select "Generate | Display Input MRS" or "Generate | Display Internal MRS" from the "target" LKB Top menu.

    1. Generate | Display Internal MRS
    2. Parse the expected output
    3. Choose Indexed MRS from the pop-up menu

    There are a number of things that could be wrong:

    1. Missing RELS or HCONS (broken diff-list append).
    2. Misspelled PRED values (look carefully at the underscores).
    3. Misspelled/differently spelled feature values (e.g. sing instead of sg).
    4. Misspelled/differently spelled feature names (e.g., PERS instead of PER).
    5. Incompatible variable properties (features and values).

    Update semi.vpm, if necessary

    The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki.

    1. If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand side) of the VPM to a single PERNUM feature on the internal (left-hand side). See the example under "Properties: An Example" on the DELPH-IN wiki page. (There is also an example in the semi.vpm file in the eng grammar.)
    2. If your grammar encodes aspectual distinctions, you'll need to add an ASPECT section, modeled on tense. This should allow you to specify a default value of ASPECT as well. Note that the English and Frisian grammars don't encode tense or aspect, so this is strictly for the MT demo.
    3. If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise.



    bond@ieee.org

    Course materials borrow heavily from Linguistics 567: Knowledge Engineering for NLP at the University of Washington. Thanks to Emily Bender for letting us use them.