Learning vocal mode classifiers from heterogeneous data sources
Zhao, Shuyang; Heittola, Toni; Virtanen, Tuomas (2017)
IEEE Computer Society
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201802081197
Description
Peer reviewed
Abstract
This paper targets a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary source. Previous studies on sound classification are commonly based on cross-validation within a single dataset, without considering cases where the training and testing data are recorded under mismatched conditions. Experiments on a new dataset, TUT-vocal-2016, revealed a large difference between the homogeneous and heterogeneous recognition scenarios. In the homogeneous recognition scenario, the classification accuracy using cross-validation on TUT-vocal-2016 was 95.5%. In the heterogeneous recognition scenario, where seven existing datasets were used as training material and TUT-vocal-2016 was used for testing, the classification accuracy was only 69.6%. Several feature normalization methods were tested to improve the performance in the heterogeneous recognition scenario. The best performance (96.8%) was obtained using the proposed subdataset-wise normalization.
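The abstract names subdataset-wise normalization as the best-performing method but does not define it. A common reading is that feature statistics are computed and applied separately within each source dataset, so that recording-condition differences between datasets are reduced before classification. The sketch below illustrates that interpretation only; the function name, the z-score statistics, and the dataset_ids argument are assumptions, not the paper's implementation.

import numpy as np

def subdataset_wise_normalization(features, dataset_ids):
    # Z-score normalize features separately within each source dataset.
    # features    : (n_samples, n_features) array of acoustic features
    # dataset_ids : length-n_samples array identifying the source dataset
    #               of each sample
    features = np.asarray(features, dtype=float)
    dataset_ids = np.asarray(dataset_ids)
    normalized = np.empty_like(features)
    for dataset in np.unique(dataset_ids):
        mask = dataset_ids == dataset
        mean = features[mask].mean(axis=0)
        std = features[mask].std(axis=0) + 1e-8  # avoid division by zero
        normalized[mask] = (features[mask] - mean) / std
    return normalized

Under this reading, audio from an unseen source would presumably be normalized with statistics estimated on that source's own data before classification, which is what would make the scheme applicable across mismatched recording conditions.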
Collections
- TUNICRIS-julkaisut [19288]