#3 Pre-processing

Speaker:
Ondřej Bojar
Abstract:
The third MT Talk is devoted to basic pre-processing steps: normalization and tokenization. The talk is complemented by a warm-up coding exercise: a Unicode lowercaser and deaccenter.
Length:
00:08:23
Date:
21/01/2015
views: 1165
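
The warm-up exercise named in the abstract, a Unicode lowercaser and deaccenter, could be sketched roughly as follows. This is a minimal illustration and not the talk's reference solution: the function name `lowercase_and_deaccent` is hypothetical, and the approach (canonical decomposition followed by dropping combining marks) is an assumption about one reasonable way to solve it.

```python
import unicodedata


def lowercase_and_deaccent(text: str) -> str:
    """Lowercase the text and strip diacritics (hypothetical helper, not from the talk)."""
    # NFD splits an accented letter into a base letter plus combining marks,
    # e.g. "ř" becomes "r" followed by U+030C (combining caron).
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop every combining mark; combining() returns 0 for non-combining characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", stripped)


print(lowercase_and_deaccent("Ondřej Bojar"))
```

Letters that have no canonical decomposition, such as "ł" or "ø", pass through this sketch unchanged; handling them would need an explicit mapping table.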
