Grammar-licensed Treebank of Czech

Speakers:
Tomáš Jelínek, Milena Hnátková, Vladimír Petkevič, Alexandr Rosen, Hana Skoumalová, Přemysl Vítovec, Jiří Znamenáček
Abstract:
We describe the main features of a treebank of Czech licensed by an HPSG-style grammar. The treebank is first parsed by a stochastic parser and then converted to phrase-structure trees, which are in turn checked by the formal grammar. During the conversion to phrase structures, the information on terminal nodes is transformed into a three-dimensional structure: every node is described by its morphological, syntactic and lexical properties. For example, the relative pronoun 'který' has a different part of speech on each of the three levels of description: it is morphologically an adjective, syntactically a noun and lexically a pronoun.
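The three-level description of a terminal node can be pictured with a small sketch. This is only an illustration of the idea, not the treebank's actual annotation format: the class name `TerminalNode`, its field names and the plain-string part-of-speech labels are assumptions made for this example, and 'dům' is added here merely as a contrasting word whose three levels coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalNode:
    """Hypothetical terminal node with one part of speech per level."""
    form: str
    morph_pos: str  # how the word inflects
    synt_pos: str   # how the word behaves in the phrase structure
    lex_pos: str    # the word's lexical class

    def levels_agree(self) -> bool:
        # True when all three levels assign the same part of speech.
        return self.morph_pos == self.synt_pos == self.lex_pos


# The example from the abstract: three different values on three levels.
ktery = TerminalNode("který", morph_pos="adjective",
                     synt_pos="noun", lex_pos="pronoun")

# An ordinary noun, added for contrast (illustrative only).
dum = TerminalNode("dům", morph_pos="noun",
                   synt_pos="noun", lex_pos="noun")

for node in (ktery, dum):
    print(node.form, node.levels_agree())
```

Keeping the three values in separate fields, rather than in one combined tag, lets each level be queried or checked independently.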

The grammar works together with the VALLEX valency lexicon. Lexical rules derive surface valency frames for the various morphological forms of each verb (infinitive, indicative, l-participle, passive participle, etc.). The actual data are then matched against these surface frames. If the match succeeds, the resulting annotation is enriched with information derived from the parsed data and from the lexicon.
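As a rough sketch of how a lexical rule might derive a surface frame and how parsed data might be matched against it, consider the toy code below. Everything in it is an assumption made for illustration: the `Slot` type, the `passive_rule` and `match` functions, the case abbreviations and the greedy matching strategy are hypothetical, and they do not reproduce the VALLEX format or the grammar's actual rules. The rule encodes only the familiar passive alternation, in which the accusative patient becomes the nominative subject and the agent becomes an optional instrumental.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical valency slot: a role realised by a morphological case."""
    role: str               # e.g. "ACT" (actor), "PAT" (patient)
    case: str               # e.g. "nom", "acc", "ins"
    obligatory: bool = True


Frame = Tuple[Slot, ...]


def passive_rule(frame: Frame) -> Frame:
    """Toy lexical rule: derive a passive surface frame from an active one."""
    derived = []
    for slot in frame:
        if slot.role == "ACT" and slot.case == "nom":
            # The agent becomes an optional instrumental.
            derived.append(replace(slot, case="ins", obligatory=False))
        elif slot.role == "PAT" and slot.case == "acc":
            # The patient becomes the nominative subject.
            derived.append(replace(slot, case="nom"))
        else:
            derived.append(slot)
    return tuple(derived)


def match(frame: Frame, observed: Sequence[str]) -> Optional[Dict[str, str]]:
    """Greedily match observed dependent cases against a surface frame.

    Returns a role-to-case mapping, or None if an obligatory slot is
    unfilled or an observed dependent is left over.
    """
    remaining = list(observed)
    filled: Dict[str, str] = {}
    for slot in frame:
        if slot.case in remaining:
            remaining.remove(slot.case)
            filled[slot.role] = slot.case
        elif slot.obligatory:
            return None
    return filled if not remaining else None


active = (Slot("ACT", "nom"), Slot("PAT", "acc"))
passive = passive_rule(active)

print(match(passive, ["nom", "ins"]))  # passive clause with an expressed agent
print(match(passive, ["acc"]))         # an accusative does not fit the passive frame
```

A successful match returns the role assignment, which corresponds loosely to the step where the annotation is enriched with lexicon information; a failed match returns `None`.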

Length:
01:26:02
Date:
02/05/2016