Virtual Monday seminar

Daniel Zeman, Milan Straka

Enhanced and Deep Universal Dependencies for Many Languages

Many linguistic theories and annotation frameworks contain a deep-syntactic and/or semantic layer. While many of these frameworks have been applied to more than one language, none of them comes anywhere near the number of languages covered in Universal Dependencies (UD). In my talk, I will first discuss Enhanced Universal Dependencies (EUD), a set of semantically oriented enhancements that have been proposed within the UD framework (but which are still available only for a small number of languages). I will also present some preliminary observations from the current shared task on parsing into EUD. In the second part, I will present some additional enhancements, called Deep UD, which extend beyond the official UD guidelines. I will focus on two aspects: how these enhancements can be useful for natural language understanding, and to what extent they can be obtained semi-automatically from the surface annotation for many typologically different languages.

Sesame Unsupervised Learning from Raw Texts and Its Applications

In the last two years, many traditional NLP tasks have seen substantial improvements thanks to advanced unsupervised pretraining from raw texts, for example ELMo or the Transformer-based BERT. I will illustrate these improvements using our results in POS tagging, lemmatization, syntactic parsing, semantic parsing and named entity recognition. Furthermore, these pretraining techniques are also effective in multilingual settings, where they allow both massively multilingual models (I will present a model performing POS tagging, lemmatization and syntactic parsing of 75 languages) and zero-shot cross-lingual transfer (for example, running a question answering system in Czech by training only on English data). Finally, I will mention recent improvements to the original BERT architecture.
