Multilingual pre-trained encoders: How far can we get with multilingual data?
Speaker:
Jindřich Libovický (ÚFAL MFF UK)
Abstract:
Multilingual encoders pre-trained on monolingual data alone show surprising cross-lingual abilities. One thing that makes such monolingually trained multilingual encoders attractive is that they do not require explicit cross-lingual alignment using parallel data. Avoiding parallel data might have the advantage of not imposing the culture of the highest-resourced language on the model. But is that really so? In the talk, we will discuss several ways of improving cross-lingual alignment using monolingual data only. Further, we will present two case studies, based on an (almost) unsupervised interpretability method, showing how the decision whether or not to use parallel data affects the way the models capture culture-related aspects of meaning.