Mobile Phone Speaker Audio Quality Classification with Convolutional Neural Networks
Sipiä, Laura (2021)
Tietotekniikan DI-ohjelma - Master's Programme in Information Technology
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2021-05-24
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202105034314
Abstract
Many processes that were previously carried out by humans have been automated, and quality evaluation is one of them. Machine learning is currently one of the most popular techniques, and solutions based on it are used in application areas such as object detection and image classification. Research and development has been done on quality evaluation using machine learning based solutions. Despite the great interest in machine learning, very few solutions exist so far for audio quality classification with machine learning.
This thesis examines the possibility of using a machine learning based solution for audio quality classification of mobile phone speakers. The thesis is carried out as a comparison study of three models with different design principles. The overall performance of each model is evaluated using 5-fold cross-validation. The generalization ability of each model is evaluated with leave-one-group-out cross-validation. The models are compared to each other based on the training metrics (loss and accuracy), the time spent on training and the averaged performance in the two aforementioned cross-validation methods.
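The two evaluation schemes named above can be illustrated with a minimal sketch. It is not the thesis implementation: the helper names, the toy labels and the choice of the phone model as the grouping variable are assumptions made for illustration, and practical setups usually shuffle or stratify the samples before forming the folds.

```python
from collections import defaultdict


def k_fold_splits(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_group_out_splits(groups):
    """Yield (held_out_group, train_idx, test_idx), holding out one group at a time."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test = by_group[g]
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield g, train, test


# Hypothetical phone-model label for each of ten recordings (assumption).
phone_models = ["A", "A", "B", "B", "B", "C", "C", "A", "C", "B"]

folds = list(k_fold_splits(len(phone_models), k=5))
logo = list(leave_one_group_out_splits(phone_models))
```

In the leave-one-group-out case, every recording of the held-out group is absent from training, so the test score reflects performance on a group the model has never seen.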
All three classifiers produced good results, exceeding expectations. Two of the models in the comparison study attained an AUC value above 0.98, and the third attained an AUC value above 0.95. The generalization ability was measured by comparing the performance metrics, F2-score and AUC value, between the 5-fold cross-validation and the leave-one-group-out cross-validation. With the best-performing model, the differences between the results were less than 0.005 for the F2-score and less than 0.001 for the AUC value. This means that the classification model could classify audio samples with nearly equal performance regardless of the phone model. The results of this research show that a machine learning based model can be used for mobile phone speaker audio quality classification.
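As a rough illustration of the two reported metrics, the sketch below computes an F2-score (the F-beta score with beta = 2, which weights recall more heavily than precision) and an AUC value for a toy binary problem. The labels, scores, the 0.5 threshold and the choice of which class counts as positive are assumptions for illustration only and are not taken from the thesis.

```python
def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = positive class, scores = classifier outputs.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f2 = f_beta(y_true, y_pred, beta=2.0)  # 10/15, about 0.667
auc = roc_auc(y_true, scores)          # 8/9, about 0.889
```

Comparing such values between the 5-fold and the leave-one-group-out runs, as the abstract describes, indicates how much performance drops on an unseen group.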