Adaptation of Pre-trained Deep Neural Networks for Sound Event Detection Facilitating Smart Homecare
Shin, Mikyong Deborah (2023)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2023-06-07
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202305316375
Abstract
As foreseen by numerous researchers, the elderly population is expected to grow by over 30% in the global population by 2050, which has created an urgent need for cost-efficient and effective automated sound recognition systems that support the well-being of older people living independently in their homecare environments. Consequently, recent research on sound event classification and detection has increasingly focused on adapting the pre-trained model YAMNet, because it was trained on the large-scale AudioSet dataset and can classify 521 sound event classes. Despite this potential, our early investigation showed that the main problem in using YAMNet predictions is the difficulty of finding the YAMNet classes associated with the target events predefined in public benchmark acoustic datasets. This study aimed to investigate this class-mapping complication in order to adapt the pre-trained YAMNet model into a sound event detection (SED) system with temporal information for monitoring abnormalities in residential homecare environments. A new methodology, Y-MCC, was developed based on the Matthews correlation coefficient (MCC) to resolve the original YAMNet class map and produce new class maps according to MCC thresholds. The Y-MCC system demonstrated the feasibility of the SED system by achieving a best overall micro-averaged F1 score of 59.46% on the SINS dataset, and class-wise F1 scores of 100% for ‘sheep’ and 96.8% for ‘brushing teeth’ in ESC-50, 94.7% for ‘vacuum cleaner’ in SINS, and 58.5% for ‘water tap running’ in the TUT-SED 2016 Home dataset. These results indicate the potential of the Y-MCC method for facilitating automated sound event monitoring systems in smart homecare applications.
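The idea of deriving a class map from MCC thresholds can be illustrated with a minimal Python sketch. It is not the thesis's Y-MCC implementation, whose details are not given in this abstract. It assumes that each target class and each YAMNet class has been reduced to a binary per-frame activity sequence, and that a YAMNet class is mapped to a target class when their MCC, computed as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), meets a chosen threshold. The function names `mcc` and `build_class_map`, the toy sequences, and the threshold values are all hypothetical.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for two equal-length binary sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def build_class_map(target_labels, source_activations, threshold):
    """Map each target class to the source classes whose MCC meets the threshold."""
    return {
        target: [
            source
            for source, y_pred in source_activations.items()
            if mcc(y_true, y_pred) >= threshold
        ]
        for target, y_true in target_labels.items()
    }


# Toy, made-up frame-level data (1 = active, 0 = inactive); illustration only.
target_labels = {"vacuum cleaner": [1, 1, 0, 0, 1, 0]}
source_activations = {
    "Vacuum cleaner": [1, 1, 0, 0, 1, 0],
    "Speech": [0, 0, 1, 1, 0, 1],
    "Mechanical fan": [1, 0, 0, 0, 1, 1],
}

loose_map = build_class_map(target_labels, source_activations, threshold=0.3)
strict_map = build_class_map(target_labels, source_activations, threshold=0.5)
print(loose_map)
print(strict_map)
```

In this toy setup, raising the threshold shrinks the set of source classes mapped to a target, which is the sense in which different thresholds produce different class maps.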