#3 Pre-processing

Speaker:
Ondřej Bojar
Abstract:
The third MT Talk is devoted to basic pre-processing steps: issues of normalization and tokenization. The talk is complemented by a warm-up coding exercise: a Unicode lowercaser and deaccenter.
Length:
00:08:23
Date:
21/01/2015
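The warm-up exercise mentioned in the abstract can be approached in several ways. The following is a minimal Python sketch, not the solution presented in the talk. It assumes that "deaccenting" means removing combining diacritical marks after Unicode canonical decomposition (NFD). The function names `lowercase` and `deaccent` are hypothetical; only the standard-library `unicodedata` module is used.

```python
import unicodedata


def lowercase(text: str) -> str:
    """Lowercase using Python's Unicode-aware str.lower()."""
    return text.lower()


def deaccent(text: str) -> str:
    """Strip combining diacritics after canonical decomposition (NFD).

    Assumption: only marks that NFD splits off are removed; letters with no
    decomposition (e.g. Polish "ł", Danish "ø") are left unchanged.
    """
    # Split precomposed letters such as "ř" into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every character with a nonzero combining class (the diacritics).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains so the output is in NFC.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ["Ondřej", "ŽLUŤOUČKÝ KŮŇ", "Crème Brûlée"]:
        print(word, "->", deaccent(lowercase(word)))
    # Ondřej -> ondrej
    # ŽLUŤOUČKÝ KŮŇ -> zlutoucky kun
    # Crème Brûlée -> creme brulee
```

Because this sketch relies only on canonical decomposition, letters such as "ł" or "ø" pass through unchanged; handling them would need an explicit mapping table.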
