Francis Morton Tyers (Indiana University Bloomington)
Abstract:
This talk takes a broad view of the development of treebanks based on the Universal Dependencies annotation scheme for Mesoamerican languages. For the purposes of this talk, Mesoamerica includes Mexico, Guatemala, Belize, El Salvador and Honduras. The Mesoamerican languages belong to many groups, including Uto-Aztecan, Mayan and Oto-Manguean, as well as language isolates such as Huave. Many of the languages share features, including a prevalence of verb-initial word order, head marking in possessive constructions, a system of relational nouns, and code mixing with Spanish. The talk discusses annotation guidelines for these phenomena, as well as the performance of multilingual models on languages that come from groups, or exhibit features, not found in the models' training data.