Multilingual Spontaneous Speech Corpora: Compilation And Annotation
Antonio Moreno Sandoval
In this talk I will discuss issues in the compilation and annotation of spoken
corpora. The languages covered will be Spanish, Chinese and Japanese, and
also Spanish child language. Although we use a unified methodology based on the C-ORAL-ROM project, specific strategies have been adopted for
transcribing and tagging each corpus. This approach allows cross-lingual
comparison while showing the distinctive features of each corpus. I will focus on the problems encountered and how we handle them.