ROMi, A Corpus Of Language Expressions Of Roma Children And Youth In Czech

K. Šormová, Z. Bedřichová, J. Hana, A. Rosen
The lecture will introduce ROMi, a corpus of language expressions of Roma children and youth in Czech, along with the error annotation of Czech. In the first part of the lecture (in English), we will cover the error annotation scheme designed specifically for both the ROMi and Czesl corpora; we will also discuss the feat error annotation tool. In the second part of the lecture (in Czech), we will analyze individual texts by Roma pupils. We will present the corpus itself, discuss how it can be put to use, and examine texts by Roma children, especially essays by Roma children attending elementary training schools and recorded conversations with those children. The recordings have been processed in cooperation with UFAL. Attached are a few examples of such texts: Would you be able to guess the authors’ age, grade, and the type of elementary school they attend? Can you tell whether the author is of Roma origin or not?

