Multilingual Coreference Resolution with Harmonized Annotation
Speakers:
Miloslav Konopík and Ondřej Pražák (Faculty of Applied Sciences, University of West Bohemia, Plzeň)
Abstract:
We will describe an end-to-end coreference resolution system and our experiments on CorefUD, a recently created multilingual corpus. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joint models: one for the Slavic languages and one for all the languages together. We rely on an end-to-end deep learning model adapted to the CorefUD corpus. We discuss the difficulties we faced, mainly those arising from differences between the corpora of the individual languages. Next, we focus on the problem of predicting singleton relations. Finally, we discuss the benefits of harmonized annotations. We will show that joint models help significantly for languages with less training data.