The Role of Pseudo-Parallel Data in Unsupervised Machine Translation

Speaker:
Ivana Kvapilíková (ÚFAL MFF UK)
Abstract:
Unsupervised machine translation (UMT) has gained considerable recognition for its capacity to produce translations without relying on parallel corpora. We investigate the role of training on pseudo-parallel data in advancing UMT. Pseudo-parallel data is obtained from two monolingual corpora by matching equivalent or similar sentences across them. It is a valuable resource, but the benefits of training on it vary across language pairs. We analyze the limitations of using pseudo-parallel data, including noise, domain mismatch, and data scarcity. Addressing these challenges is vital for enhancing the robustness and real-world applicability of UMT systems.
Length:
00:54:27
Date:
13/11/2023
