From the Jungle to a Park: Harmonizing Annotations Across Languages

Speaker:
Dan Zeman
Abstract:
In this talk I will describe my work towards a universal representation of morphology and dependency syntax in treebanks of various languages. Such harmonization is not only advantageous for linguists who use corpora; it is also a prerequisite for cross-language parser adaptation techniques such as delexicalized parsing. I will present Interset, an interlingua-like tool for translating morphosyntactic representations between tagsets, and I will show how the features from Interset are used in a recent framework called Universal Dependencies. I will also present some experiments with delexicalized parsing on harmonized data. Finally, I will discuss the extent to which various morphological features matter in the context of statistical dependency parsing. [This is a slightly updated version of my talk at SPMRL/IWPT in July in Bilbao.]
Length:
01:25:13
Date:
14/12/2015
