Generating Czech Word Forms in MT: From System Combination to Black Art
Ondrej Bojar
When translating from English to Czech, target-side vocabulary is the critical issue. Transfer-based systems like TectoMT can generate completely novel word forms but do not (yet) reach the phrase-based benchmark. Training phrase-based systems to generate new forms is less straightforward, but I will describe two promising techniques: two-step translation and a new idea, the black art of "reverse self-training". The best results can be expected from system combination techniques. I will describe my experiments with the Aachen MT output combination, which I applied and adapted for English-to-Czech translation. So far I have combined only UFAL systems (TectoMT and three different configurations of Moses), so I am looking for someone interested in reimplementing one missing piece, which could allow the department to win this year's WMT.