The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline
Scherrer, Yves; Miletić, Aleksandra; Kuparinen, Olli (2023-12-01)
Scherrer, Yves
Miletić, Aleksandra
Kuparinen, Olli
Editor(s) of the work
Sawaf, Hassan
El-Beltagy, Samhaa
Zaghouani, Wajdi
Magdy, Walid
Abdelali, Ahmed
Tomeh, Nadi
Abu Farha, Ibrahim
Habash, Nizar
Khalifa, Salam
Keleg, Amr
Haddad, Hatem
Zitouni, Imed
Mrini, Khalil
Almatham, Rawan
ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
01.12.2023
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2023121210731
Description
Peer reviewed
Abstract
The Helsinki-NLP team participated in the NADI 2023 shared tasks on Arabic dialect translation with seven submissions. We used statistical machine translation (SMT) and neural machine translation (NMT) methods and explored character- and subword-based data preprocessing. Our submissions placed second in both tracks. In the open track, our best submission is a character-level SMT system with additional Modern Standard Arabic language models. In the closed track, our best BLEU scores were obtained with the leave-as-is baseline, a simple copy of the input, narrowly followed by SMT systems. In both tracks, fine-tuning existing multilingual models such as AraT5 or ByT5 did not outperform SMT.
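As an illustration of two ideas named in the abstract, the sketch below shows one common way to prepare text for a character-level translation system and what a leave-as-is baseline amounts to. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `to_chars`, `from_chars`, and `leave_as_is`, and the `_` word-boundary placeholder, are hypothetical choices made for this example.

```python
# Hypothetical placeholder marking original word boundaries.
# Assumption: the input text does not itself contain this symbol.
SPACE = "_"


def to_chars(sentence: str) -> str:
    """Turn a sentence into space-separated characters.

    Original spaces become the SPACE placeholder, so a word-based
    toolkit can treat every character as a token.
    """
    return " ".join(SPACE if ch == " " else ch for ch in sentence.strip())


def from_chars(tokens: str) -> str:
    """Invert to_chars: join characters and restore word boundaries."""
    return "".join(" " if tok == SPACE else tok for tok in tokens.split(" "))


def leave_as_is(source_sentences):
    """Leave-as-is baseline: the 'translation' is a copy of the input."""
    return list(source_sentences)


if __name__ == "__main__":
    segmented = to_chars("ab c")
    print(segmented)
    print(from_chars(segmented))
    print(leave_as_is(["first sentence", "second sentence"]))
```

The round trip through `to_chars` and `from_chars` is lossless only as long as the placeholder does not occur in the data; a real setup would pick a symbol guaranteed to be absent from the corpus.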
Collections
- TUNICRIS publications [19020]