Speech-to-text Transcription and Output Parsing for Air Traffic Control Applications

Viljanen, Aleksi (2024)

 

Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2024-11-11
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202410179338
Abstract
Effective communication is essential in air traffic control (ATC), where precise interactions between pilots and air traffic control officers (ATCOs) influence safety and efficiency. ATC communication utilizes a specialized form of English with distinct grammatical structures and vocabulary aimed at minimizing ambiguity. Despite these measures, language-related incidents and near misses continue to occur, highlighting the potential advantages of automatic speech recognition (ASR) systems tailored to this domain. However, traditional ASR systems face challenges with ATC communication due to the specialized language and the limited availability of high-quality training data.

This thesis explores the feasibility of developing a high-accuracy ASR system specifically tailored for ATC communication using open-source tools. The primary objective is to accurately transcribe spoken ATC communications into text and convert this output into a machine-readable format suitable for enhancing ATCO training simulations. To accomplish this, a neural acoustic model based on Coqui’s implementation of Deep Speech was fine-tuned with a custom dataset that captures the linguistic characteristics of ATC communication. Additionally, a KenLM n-gram language model was trained to handle domain-specific vocabulary and grammatical structures. A custom parser utilizing ANTLR was developed to perform syntactic and lexical analysis, transforming raw transcriptions into structured data by identifying entities such as call signs, waypoints, and other specialized terms.
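The parsing step described above, turning a raw transcription into structured data, can be illustrated with a minimal Python sketch. It does not reproduce the thesis's ANTLR grammar: the hand-written token matcher, the toy grammar (airline designator, spoken digits, a climb or descend command, and a flight level), the function names, and the example utterance are all assumptions made here for illustration.

```python
# Hypothetical vocabulary for this sketch; the thesis's real lexicon is not shown in the abstract.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",  # radiotelephony pronunciation of 9
}
COMMANDS = {"climb", "descend"}


def words_to_number(words):
    """Join spoken digit words into a digit string, e.g. ['four', 'five'] -> '45'."""
    return "".join(DIGITS[w] for w in words)


def parse_utterance(text):
    """Parse '<airline> <digits> <command> flight level <digits>' into a dict.

    Raises ValueError when the utterance does not fit this toy grammar.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("empty utterance")

    # Call sign: one airline word followed by one or more spoken digits.
    airline, rest = tokens[0], tokens[1:]
    n_digits = 0
    while n_digits < len(rest) and rest[n_digits] in DIGITS:
        n_digits += 1
    if n_digits == 0:
        raise ValueError("call sign has no flight number")
    callsign = f"{airline.upper()} {words_to_number(rest[:n_digits])}"

    # Instruction: a command word, the literal words 'flight level', then digits.
    rest = rest[n_digits:]
    if len(rest) < 4 or rest[0] not in COMMANDS or rest[1:3] != ["flight", "level"]:
        raise ValueError("unsupported instruction")
    level_words = rest[3:]
    if not all(w in DIGITS for w in level_words):
        raise ValueError("flight level must be spoken digits")

    return {
        "callsign": callsign,
        "command": rest[0],
        "flight_level": int(words_to_number(level_words)),
    }


result = parse_utterance("finnair four five six climb flight level three two zero")
print(result)
```

A grammar-based tool such as ANTLR separates the lexer and parser rules from the application code, which is likely to scale better than this hand-written matcher once the phraseology grows beyond a few patterns.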

The system was evaluated using a custom test set representing typical ATC utterances, achieving a word error rate of 1.8% and a character error rate of 1.6%. These results indicate the feasibility of developing a reliable ATC transcription system. The research demonstrates that combining specialized acoustic and language models with a custom parser can effectively address the unique challenges of ATC communication. The findings suggest that such a system has the potential to enhance ATCO training simulations and improve operational efficiency in aviation.
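For readers unfamiliar with the two metrics reported above, the following Python sketch shows how word error rate and character error rate are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. This is the standard definition, not the thesis's evaluation code, and the function names and sample strings are assumptions made here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields an empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference/hypothesis pair; one of six reference words is dropped.
ref = "descend flight level one zero zero"
hyp = "descend flight level one zero"
print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

Because both rates are normalised by the reference length, a hypothesis with many insertions can score above 100%.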
Collections
  • Master's theses [41809]
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
