Bayesian non-parametric models for parsing and translation

Speaker:
Trevor Cohn
Abstract:
Context-free grammars have long been popular for modelling natural language syntax and translation between human languages. However, their underlying independence assumptions are far too stringent for accurate data modelling. Considerable research effort has focussed on using linguistic intuitions to enrich CFGs, resulting in state-of-the-art parsing performance. In this talk, I take a different approach: learning an enriched grammar directly from the data without resorting to linguistic knowledge. Instead, the grammar is an emergent property, found by unsupervised inference in a Bayesian model of tree-substitution grammar (TSG; a.k.a. DOP). Bayesian methods provide an elegant and theoretically principled way to model TSGs, incorporating a prior over the grammar and integrating over uncertain events. I'll present non-parametric Bayesian models for two related tasks: 1) learning a TSG for syntactic parsing and 2) learning a synchronous TSG for machine translation. The models learn compact and simple grammars, uncovering latent linguistic structures, and in doing so outperform competitive baselines. This is joint work with Phil Blunsom and Sharon Goldwater.
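The prior described in the abstract is typically built on a Dirichlet process, under which the probability of reusing a grammar fragment grows with how often it has been used before. The sketch below illustrates that posterior-predictive rule in isolation; the fragment strings and the base-measure probability are hypothetical placeholders, not the talk's actual model or data.

```python
from collections import Counter

def crp_predictive(counts, total, item, alpha, base_prob):
    """Posterior predictive probability of `item` under a Dirichlet process:
    interpolates its empirical count with the base distribution, where
    `alpha` controls the tendency to create novel fragments."""
    return (counts[item] + alpha * base_prob) / (total + alpha)

# Toy cache of previously sampled TSG fragments (hypothetical examples).
counts = Counter({"(S NP VP)": 5, "(NP DT NN)": 3})
total = sum(counts.values())

# A frequently cached fragment is preferred over a novel one with the
# same base-measure probability -- the "rich get richer" dynamic that
# biases inference toward compact, reusable grammars.
p_seen = crp_predictive(counts, total, "(S NP VP)", alpha=1.0, base_prob=0.1)
p_new = crp_predictive(counts, total, "(VP V NP)", alpha=1.0, base_prob=0.1)
```

Here `p_seen` dominates `p_new` purely because the first fragment has been reused before, even though both receive the same base-measure support.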
Length:
01:31:08
Date:
12/04/2010
views: 1101

Attachments (video, slides, etc.):
- 42M (530 downloads)
- 140M (544 downloads)
- 564M (507 downloads)