Generating Czech Word Forms in MT: From System Combination to Black Art

Speaker:
Ondrej Bojar
Abstract:
When translating from English to Czech, target-side vocabulary is the critical part. Transfer-based systems like TectoMT are able to generate completely novel forms but do not (yet) reach the phrase-based benchmark. Training phrase-based systems to generate new forms is not straightforward, but I will describe two promising techniques: two-step translation and a new idea, the black art of "reverse self-training". The best results can be expected from system combination techniques. I will describe my experiments with the Aachen MT output combination, which I applied and adapted for English-to-Czech translation. So far I have combined only UFAL systems (TectoMT and three different configurations of Moses), so I am looking for someone interested in reimplementing one missing bit and allowing the department to win this year's WMT.
Length:
01:24:29
Date:
21/02/2011
