Corpus Arbore À La Carte

Speaker:
Vladimir Petkevic
Abstract:
While all treebanks are rich and useful resources, they tend to reflect a specific approach to linguistic analysis and may distract users without a background in theoretical linguistics or those subscribing to a different theory. Yet, despite appearances, a comparison of different theory-specific representations reveals a substantial overlap of content. Rather than attempting to accommodate such different views in a single design that avoids theoretical bias, we aim to build a syntactically annotated corpus of Czech that allows for various modes of interpretation according to the preferences of the user, including the standard pattern taught in Czech primary and secondary schools. The three-level corpus (consisting of the orthographic, morphological and syntactic levels) will support various external representations of the same internal data being queried and retrieved: dependency or constituent structure, surface or deep syntax, parallel access to data from all three levels, and visualization of various phenomena such as agreement, analytical predicates and collocations. The user will be equipped with a query language and tools to customize the interface in many ways, with underspecification as one of the main options. Taking the analytical level of the Prague Dependency Treebank (PDT) as the point of departure, the author(s) will focus on the key distinctions between their approach and the concept of PDT.
Length:
01:18:05
Date:
14/03/2011
views: 829
