Automatic Assessment Of Parkinson’s Disease Using Spontaneous Speech
Ylinen, Hanna (2023)
Bachelor's Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2023-03-22
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202303172976
Abstract
Parkinson's disease is a neurodegenerative disease with a range of symptoms, including speech impairments. These can be detected with digital signal processing, because speech signals carry paralinguistic information, that is, information beyond the linguistic content. In this work, Parkinson's disease is recognized from speech signals using machine learning methods, following the steps of a typical paralinguistic speech processing study. The main goal is to evaluate how well different feature extraction methods and machine learning models recognize Parkinson's disease from spontaneous speech.
The literature review part of this work presents the stages of a typical paralinguistic speech processing pipeline and evaluates related studies. According to these studies, the speech signals of people with Parkinson's disease contain recognizable features that can be used to assess the disease. Additionally, a multitude of feature sets and classification models have been applied in the studies.
In the experimental part of this work, MFCC and eGeMAPS features are used to extract useful information from audio signals. The features serve as input to the three machine learning models used in this study: support vector machine, random forest, and convolutional neural network. These models are used to identify Parkinson's disease from the monologues of the PC-GITA corpus. The PC-GITA data used in this study consists of spontaneous speech samples, each around one minute long, from one hundred speakers, comprising healthy speakers and people with diagnosed Parkinson’s disease.
The results were evaluated with speaker-independent cross-validation, in which each speaker in turn serves as the test data for the machine learning model and the remaining speakers serve as the training data. The final accuracy of a model was obtained by averaging the accuracies over all one hundred folds, one per speaker.
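The evaluation scheme described above can be illustrated with a minimal sketch. It is not the code used in the thesis: the nearest-centroid classifier is only a stand-in for the SVM, random forest, and CNN models, and the function names, feature vectors, labels ("HC", "PD"), and speaker identifiers are hypothetical toy values chosen for illustration.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (assumption): pick the label whose class mean is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [f for f, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label], x))


def leave_one_speaker_out_accuracy(features, labels, speakers):
    """Hold out each speaker once, train on the rest, and average the fold accuracies."""
    fold_accuracies = []
    for held_out in sorted(set(speakers)):
        train_idx = [i for i, s in enumerate(speakers) if s != held_out]
        test_idx = [i for i, s in enumerate(speakers) if s == held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]
            for i in test_idx
        )
        fold_accuracies.append(correct / len(test_idx))
    return mean(fold_accuracies)


# Hypothetical toy data: one 2-dimensional feature vector per speaker.
features = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9], [0.15, 0.15], [0.95, 0.95]]
labels = ["HC", "HC", "PD", "PD", "HC", "PD"]
speakers = ["s1", "s2", "s3", "s4", "s5", "s6"]

accuracy = leave_one_speaker_out_accuracy(features, labels, speakers)
print(f"Mean accuracy over {len(set(speakers))} folds: {accuracy:.2%}")
```

Splitting by speaker rather than by sample ensures that no recording from the test speaker is seen during training, so the reported accuracy reflects performance on unseen speakers.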
The results of this work indicate that Parkinson's disease can be recognized from speech using machine learning methods. For MFCC features, the convolutional neural network produced the best classification accuracy, 67.40% (Parkinson’s patient versus healthy speaker), while for eGeMAPS features the random forest produced an accuracy of 75.00%. The low accuracies are explained by the complexity of spontaneous speech and the chosen machine learning methods.
Collections
- Bachelor's theses [8800]