The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline
Scherrer, Yves; Miletić, Aleksandra; Kuparinen, Olli (2023-12-01)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-2023121210731
Description
Peer reviewed
Abstract
The Helsinki-NLP team participated in the NADI 2023 shared tasks on Arabic dialect translation with seven submissions. We used statistical machine translation (SMT) and neural machine translation (NMT) methods and explored character- and subword-based data preprocessing. Our submissions placed second in both tracks. In the open track, our best submission is a character-level SMT system with additional Modern Standard Arabic language models. In the closed track, our best BLEU scores were obtained with the leave-as-is baseline, a simple copy of the input, narrowly followed by SMT systems. In both tracks, fine-tuning existing multilingual models such as AraT5 or ByT5 did not outperform SMT.
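As an illustration of two ideas mentioned in the abstract, the following is a minimal Python sketch of a leave-as-is baseline and of character-level preprocessing. It is not the authors' code: the function names and the `_` word-boundary marker are assumptions made for this example, and the actual systems may segment text differently.

```python
# Illustrative sketch only; names and the boundary marker are hypothetical.

WORD_BOUNDARY = "_"  # assumed marker for original spaces, not taken from the paper


def leave_as_is(sources):
    """Leave-as-is baseline: the 'translation' is an unchanged copy of the input."""
    return list(sources)


def to_char_level(sentence):
    """Split a sentence into space-separated characters.

    Original spaces are replaced by WORD_BOUNDARY so that word boundaries
    can be restored after translation.
    """
    return " ".join(WORD_BOUNDARY if ch == " " else ch for ch in sentence.strip())


def from_char_level(tokens):
    """Invert to_char_level: drop separator spaces, then restore word boundaries."""
    return tokens.replace(" ", "").replace(WORD_BOUNDARY, " ")


if __name__ == "__main__":
    dialect_sentences = ["كيف حالك", "ab cd"]
    print(leave_as_is(dialect_sentences))
    print(to_char_level("ab cd"))
```

The round trip is only lossless when the input does not already contain the marker character; a real pipeline would need to pick a marker that cannot occur in the data or escape it.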
Collections
- TUNICRIS publications [23470]