Collecting Linguistic Resources for Assessing Children's Pronunciation of Nordic Languages
Olstad, Anne Marte Haug; Smolander, Anna; Strömbergsson, Sofia; Ylinen, Sari; Lehtonen, Minna; Kurimo, Mikko; Getman, Yaroslav; Grosz, Támas; Cao, Xinwei; Svendsen, Torbjørn; Salvi, Giampiero (2024)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202501091254
Description
Peer reviewed
Abstract
This paper reports on the experience of collecting several corpora of Nordic languages spoken by children. The aim of the data collection is to provide annotated data for developing and evaluating computer-assisted pronunciation assessment systems, both for non-native children learning a Nordic language (L2) and for L1 children with speech sound disorder (SSD). The paper presents the challenges encountered in recording and annotating data for Finnish, Swedish and Norwegian, as well as the ethical considerations related to making this data publicly available. We hope that sharing this experience will encourage others to collect similar data for other languages. Of the different data collections, we were able to make the Norwegian corpus publicly available, in the hope that it will serve as a reference in pronunciation assessment research.
Collections
- TUNICRIS publications [20689]