Form and Function in Machine Translation

Lori Levin
State-of-the-art machine translation systems do not consistently handle simple syntactic patterns such as active and passive voice, subject-verb agreement, and filler-gap constructions. Syntax-based statistical machine translation has made progress in capturing some of these patterns, but syntax alone is not enough. This talk considers a construction-based approach to form and function in MT, recognizing that there is no one-to-one correspondence between forms such as "of" and "the" and functions such as expressing possession and definiteness.

The work is conducted in the frameworks of Semantically Informed MT (SIMT) and Linguistic Core MT (LCMT). Both are based on inventories of communicative functions from linguistic typology. They enable us to add interlingua-style semantic information to statistical MT while avoiding some of the shortcomings of old-style interlingua systems.

An example will be given from the SIMT framework on modality in Urdu-English MT (NIST 2009 task). Functional information about modality was added to a syntax-based MT system by relabeling VP nodes in English parse trees with more specific categories such as VP-Require, resulting in a small increase in BLEU score.

The SIMT framework is joint work with the Human Language Technology Center of Excellence at Johns Hopkins University. Linguistic Core MT is joint work with colleagues at Carnegie Mellon University, University of Southern California, University of Texas at Austin, and MIT, with funding from the US Army Research Lab.

References:
• Kathrin Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, and Christine Piatko, “A Modality Lexicon and its use in Automatic Tagging”, LREC 2010, Malta.
• Kathy Baker, Chris Callison-Burch, Bonnie Dorr, Nathaniel Filardo, Scott Miller, Christine Piatko, and Lori Levin, “Semantically-Informed Machine Translation: A Tree-Grafting Approach”, AMTA 2010, Denver.
• Kathy Baker, Steven Bethard, Michael Bloodgood, Ralf Brown, Chris Callison-Burch, Glen Coppersmith, Bonnie Dorr, Wes Filardo, Kendall Giles, Anni Irvine, Mike Kayser, Lori Levin, Justin Martineau, Jim Mayfield, Scott Miller, Aaron Phillips, Andrew Philpot, Christine Piatko, Lane Schwartz, and David Zajic, “Semantically Informed Machine Translation (SIMT)”, Final report of the 2009 Summer Camp for Applied Language Exploration, Human Language Technology Center of Excellence, Johns Hopkins University.
• The HLTCOE Technical Report #4 - SIMT SCALE 2009 - Modality Annotation Guidelines.
• Mona Diab, Bonnie Dorr, Lori Levin, Teruko Mitamura, Rebecca Passoneau, Owen Rambow, and Lance Ramshaw, Language Understanding Annotation Corpus, LDC Catalog No.: LDC2009T10, ISBN: 1-58563-513-8, Release Date: Mar 17, 2009.
• Mona Diab, Lori Levin, Teruko Mitamura, Owen Rambow, Vinod Parth, and Weiwei Guo, “Committed Belief Annotation and Tagging”, LAW III (Linguistic Annotation Workshop), ACL, Singapore, 2009.
• Lori Levin, Alison Alvarez, Jeff Good, and Robert Frederking, “Automatic Learning of Grammatical Encoding”, in A. Zaenen (ed.), Architectures, Rules, and Preferences: Variations on Themes by Joan W. Bresnan, CSLI Lecture Notes.
• Lori Levin, Jeff Good, Alison Alvarez, and Robert Frederking, “Parallel Reverse Treebanks for the Discovery of Morpho-Syntactic Markings”, in the proceedings of Treebanks and Linguistic Theory, 2006, Prague.
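
The VP relabeling mentioned in the abstract (turning VP into a more specific category such as VP-Require) can be illustrated with a minimal sketch. This is not the SIMT implementation: the three-entry lexicon, the function names, and the rule "a VP is relabeled when one of its immediate children is a trigger word" are all assumptions made for illustration, and the example sentence is invented.

```python
# Toy modality lexicon: trigger word -> modality label.
# Hypothetical entries for illustration only, not the SIMT modality lexicon.
TOY_LEXICON = {"must": "Require", "may": "Permit", "wants": "Want"}


def parse_tree(text):
    """Parse a bracketed tree into nested lists [label, child, ...]; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                  # consume "("
            node = [tokens[pos]]      # node label
            pos += 1
            while tokens[pos] != ")":
                node.append(read())
            pos += 1                  # consume ")"
            return node
        word = tokens[pos]            # a leaf (surface word)
        pos += 1
        return word

    return read()


def relabel_vps(node, lexicon=TOY_LEXICON):
    """Return a copy of the tree where a VP with a trigger-word child becomes VP-<Modality>."""
    if isinstance(node, str):
        return node
    label = node[0]
    children = [relabel_vps(child, lexicon) for child in node[1:]]
    if label == "VP":
        for child in children:
            # A preterminal looks like [POS, word].
            if isinstance(child, list) and len(child) == 2 and isinstance(child[1], str):
                modality = lexicon.get(child[1].lower())
                if modality:
                    label = f"VP-{modality}"
                    break
    return [label] + children


def to_string(node):
    """Serialize the nested-list tree back to bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_string(c) for c in node[1:]]) + ")"


tree = parse_tree("(S (NP (PRP You)) (VP (MD must) (VP (VB leave))))")
print(to_string(relabel_vps(tree)))
# (S (NP (PRP You)) (VP-Require (MD must) (VP (VB leave))))
```

Only the VP that directly dominates the trigger word is relabeled here; the inner VP keeps its plain label. A syntax-based system trained on trees relabeled in this way would see VP-Require as a distinct category from VP, which is the general idea the abstract describes.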
