TEITOK: Merging Digital Humanities and Corpus Linguistics

Speaker:
Maarten Janssen
Abstract:
Corpora nowadays form a core part of linguistics, and for historical linguistics they should be an even more solid cornerstone, given that there are no native speakers to rely on. However, the tools for linguistic corpora do not apply well to historical corpora: not only are automatic tools considerably less accurate, but corpus tools also discard much of the information that documents from the digital humanities contain, such as formatting and writing order. TEITOK is a corpus tool that attempts to bridge this gap by providing a full platform for TEI/XML-based corpora that can respond to the needs of the DH community and combine them with linguistic annotation. This makes it possible to have meticulously transcribed documents, whether historical, dialectal, spoken, etc., that are at the same time fully searchable and exploitable using NLP techniques.
Length:
01:11:47
Date:
27/05/2019
Attachments: (video, slides, etc.)
99 MB (249 downloads)
798 MB (245 downloads)
148 MB (253 downloads)
214 MB (275 downloads)
438 MB (220 downloads)