Toward more linguistically informed translation models
Speaker:
Adam Lopez
Abstract:
Modern translation systems model translation as simple substitution and permutation of word tokens, sometimes informed by syntax. Formally, these models are probabilistic relations on regular or context-free sets, a poor fit for many of the world's languages (for example, the cross-serial dependencies of Swiss German are provably beyond context-free power). Computational linguists have developed more expressive mathematical models of language that exhibit high empirical coverage of annotated language data, correctly predict a variety of important linguistic phenomena in many languages, explicitly model semantics, and can be processed with efficient algorithms. I will discuss some ways in which such models can be used in machine translation, focusing particularly on combinatory categorial grammar (CCG).
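
Background for attendees unfamiliar with CCG: the minimal sketch below shows what a CCG derivation looks like in practice, using NLTK's nltk.ccg module. The toy lexicon and example sentence are illustrative assumptions, not material from the talk.

    # Minimal CCG derivation sketch with NLTK (toy lexicon; illustrative only).
    from nltk.ccg import chart, lexicon

    # Primitive categories are declared after ':-'; complex categories are
    # built from them with forward (/) and backward (\) slashes.
    # Backslashes are doubled for Python string escaping.
    lex = lexicon.fromstring('''
        :- S, NP, N
        Det :: NP/N
        TV :: (S\\NP)/NP
        the => Det
        cat => N
        dog => N
        chased => TV
    ''')

    # DefaultRuleSet covers the core CCG combinators: function application,
    # composition, and type-raising.
    parser = chart.CCGChartParser(lex, chart.DefaultRuleSet)

    for parse in parser.parse('the cat chased the dog'.split()):
        # e.g. NP/N applies to N giving NP; (S\NP)/NP applies to NP giving S\NP
        chart.printCCGDerivation(parse)
        break  # one derivation is enough here

The point of the sketch is only that CCG pushes syntactic behavior into individual lexical categories, leaving a small set of combinatory rules to do the rest.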