Multilingual pre-trained encoders: How far can we get with multilingual data?

Speaker:
Jindřich Libovický (ÚFAL MFF UK)
Abstract:
Multilingual encoders pre-trained only on monolingual data show surprising cross-lingual abilities. One thing that makes monolingually trained multilingual encoders attractive is that they do not require explicit cross-lingual alignment using parallel data. Avoiding parallel data might have the advantage of not imposing the culture of the highest-resourced language on the model. But is that really so? In the talk, we will discuss several ways of improving cross-lingual alignment using monolingual data only. Further, we will present two case studies on how the decision to use or not use parallel data affects how the models capture culture-related aspects of meaning, using an (almost) unsupervised interpretability method.
Length:
00:55:15
Date:
18/12/2023
Views:
332

Attachments (video, slides, etc.):
92.0 MB
333 downloads