TEITOK: Merging Digital Humanities and Corpus Linguistics

Speaker:
Maarten Janssen
Abstract:
Corpora nowadays form a core part of linguistics, and for historical linguistics they should form an even more solid cornerstone, given that there are no native speakers to rely on. However, the tools for linguistic corpora do not apply well to historical corpora: not only are automatic tools considerably less accurate, but corpus tools also discard much of the information that documents from the digital humanities (DH) contain, such as formatting and writing order. TEITOK is a corpus tool that attempts to bridge this gap by providing a full platform for TEI/XML-based corpora that can respond to the needs of the DH community and combine them with linguistic annotation. This makes it possible to have meticulously transcribed documents (historical, dialectal, spoken, etc.) that are at the same time fully searchable and exploitable using NLP techniques.
Length:
01:11:47
Date:
27/05/2019
