HiČKoK: History of Czech in Corpus Continuum

Speakers:
Daniel Zeman (ÚFAL MFF UK) and Jiří Pergler (ÚJČ AV ČR)
Abstract:
We will present the ongoing TAČR project on morphological annotation of texts from all historical stages of the Czech language. The project aims to connect text corpora of different periods, which have so far been built independently at different institutes, and to enrich them with lemmatization and uniform morphological annotation following the Universal Dependencies standard. Manually annotated datasets will then be used to train models capable of annotating other historical texts. After an initial overview, we will focus on some issues in designing a uniform description of a changing language, especially in its oldest period (14th-15th centuries).
Length:
01:25:25
Date:
09/12/2024