Speech-to-text Transcription and Output Parsing for Air Traffic Control Applications
Viljanen, Aleksi (2024)
Tietotekniikan DI-ohjelma - Master's Programme in Information Technology
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2024-11-11
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202410179338
Abstract
Effective communication is essential in air traffic control (ATC), where precise interactions between pilots and air traffic control officers (ATCOs) directly affect safety and efficiency. ATC communication uses a specialized form of English with distinct grammatical structures and vocabulary intended to minimize ambiguity. Despite these measures, language-related incidents and near misses continue to occur, highlighting the potential benefits of automatic speech recognition (ASR) systems tailored to this domain. However, conventional ASR systems struggle with ATC communication because of its specialized language and the limited availability of high-quality training data.
This thesis explores the feasibility of developing a high-accuracy ASR system tailored to ATC communication using open-source tools. The primary objective is to accurately transcribe spoken ATC communications into text and convert this output into a machine-readable format suitable for enhancing ATCO training simulations. To accomplish this, a neural acoustic model based on Coqui’s implementation of Deep Speech was fine-tuned on a custom dataset that captures the linguistic characteristics of ATC communication. Additionally, a KenLM n-gram language model was trained to handle domain-specific vocabulary and grammatical structures. A custom parser built with ANTLR performs lexical and syntactic analysis, transforming raw transcriptions into structured data by identifying entities such as call signs, waypoints, and other specialized terms.
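As a rough illustration of the final step, turning a raw transcript into structured data, the following Python sketch extracts a call sign, a command, and a flight level from a transcript. It is not the thesis parser: the thesis uses an ANTLR grammar, whereas this stand-in uses a hand-written token scanner. The vocabulary tables (`DIGITS`, `AIRLINES`, `COMMANDS`), the `Clearance` fields, and the example phrases are hypothetical, chosen only to show how spoken digits and telephony designators can be mapped to machine-readable fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical vocabulary for illustration only; not the thesis grammar.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9", "niner": "9",
}
AIRLINES = {"finnair": "FIN", "speedbird": "BAW"}  # telephony designator -> ICAO code
COMMANDS = {"climb", "descend"}


@dataclass
class Clearance:
    """Hypothetical structured output for one transcribed instruction."""
    callsign: Optional[str] = None
    command: Optional[str] = None
    flight_level: Optional[int] = None
    unparsed: List[str] = field(default_factory=list)


def read_digits(tokens: List[str], i: int) -> Tuple[str, int]:
    """Collect consecutive spoken digits starting at index i; return (digits, next index)."""
    digits = ""
    while i < len(tokens) and tokens[i] in DIGITS:
        digits += DIGITS[tokens[i]]
        i += 1
    return digits, i


def parse(transcript: str) -> Clearance:
    """Scan a lowercase, whitespace-separated transcript and fill a Clearance."""
    tokens = transcript.lower().split()
    result = Clearance()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in AIRLINES and result.callsign is None:
            # Call sign: airline designator followed by a spoken flight number.
            number, i = read_digits(tokens, i + 1)
            result.callsign = AIRLINES[tok] + number
        elif tok in COMMANDS:
            result.command = tok
            i += 1
        elif tok == "flight" and i + 1 < len(tokens) and tokens[i + 1] == "level":
            # "flight level" followed by spoken digits, e.g. "three four zero" -> 340.
            level, i = read_digits(tokens, i + 2)
            result.flight_level = int(level) if level else None
        else:
            # Tokens outside this toy vocabulary are kept rather than rejected.
            result.unparsed.append(tok)
            i += 1
    return result
```

For example, `parse("finnair one two three climb flight level three four zero")` yields call sign `FIN123`, command `climb`, and flight level `340`. A grammar generated with a tool such as ANTLR expresses the same rules declaratively and reports syntax errors, instead of silently collecting unknown tokens in `unparsed` as this sketch does.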
The system was evaluated on a custom test set of typical ATC utterances, achieving a word error rate (WER) of 1.8% and a character error rate (CER) of 1.6%. These results indicate that a reliable ATC transcription system is feasible. The research demonstrates that combining specialized acoustic and language models with a custom parser can effectively address the unique challenges of ATC communication. The findings suggest that such a system could enhance ATCO training simulations and improve operational efficiency in aviation.
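Word and character error rates are conventionally computed from the Levenshtein alignment between a reference transcript and the system output: WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions and N is the number of words in the reference; CER applies the same formula to characters. The abstract does not state how the test set was tokenized or aggregated, so the following Python sketch assumes whitespace tokenization for WER, counts spaces as characters for CER, and scores a single utterance.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref[i-1]
                curr[j - 1] + 1,         # insertion of hyp[j-1]
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumption: words are whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumption: spaces count as characters."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For instance, transcribing the reference "climb flight level three four zero" as "climb flight level three five zero" is one substitution over six reference words, a WER of about 16.7%. Corpus-level figures are usually obtained by summing edit counts over all utterances and dividing by the total reference length, rather than by averaging per-utterance rates.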