Using sequential information in polyphonic sound event detection
Huang, Guangpu; Heittola, Toni; Virtanen, Tuomas (2018-11-02)
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202003042529
Description
Peer reviewed
Abstract
Detecting the class and the start and end times of sound events in real-world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempt to bridge this gap by explicitly including sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use sequential loss to train neural networks. Our experiments on a corpus of real-world recordings show that the N-grams can smooth the spiky output of a state-of-the-art neural network system and improve both the frame-wise and the event-wise metrics.
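The abstract gives no implementation details; the following is a minimal sketch, assuming binary frame-wise event activities and a per-class bigram (first-order N-gram) transition model, of how co-occurrence statistics estimated from reference annotations could be used to smooth spiky frame-wise network outputs. The function names estimate_bigram and smooth_with_bigram are hypothetical and not taken from the paper.

import numpy as np

def estimate_bigram(annotations, n_classes, eps=1e-6):
    # Estimate per-class bigram probabilities P(s_t | s_{t-1}) for binary
    # event activity (0 = inactive, 1 = active) from frame-level reference
    # annotations of shape (n_frames, n_classes). Assumed data layout.
    trans = np.full((n_classes, 2, 2), eps)
    prev, curr = annotations[:-1].astype(int), annotations[1:].astype(int)
    for c in range(n_classes):
        for a in (0, 1):
            for b in (0, 1):
                trans[c, a, b] += np.sum((prev[:, c] == a) & (curr[:, c] == b))
    return trans / trans.sum(axis=2, keepdims=True)

def smooth_with_bigram(frame_probs, trans):
    # Viterbi-style smoothing of frame-wise activity probabilities
    # (shape (n_frames, n_classes), values in [0, 1]) using the per-class
    # bigram model; returns binary activity decisions of the same shape.
    n_frames, n_classes = frame_probs.shape
    out = np.zeros_like(frame_probs, dtype=int)
    log_trans = np.log(trans)
    for c in range(n_classes):
        emit = np.stack([np.log(1.0 - frame_probs[:, c] + 1e-9),
                         np.log(frame_probs[:, c] + 1e-9)], axis=1)
        delta = emit[0].copy()
        psi = np.zeros((n_frames, 2), dtype=int)
        for t in range(1, n_frames):
            scores = delta[:, None] + log_trans[c]      # indexed [prev, curr]
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + emit[t]
        state = int(delta.argmax())
        for t in range(n_frames - 1, -1, -1):
            out[t, c] = state
            state = psi[t, state]
    return out

A transition-probability pass of this kind penalises isolated single-frame activations and deactivations, which is one plausible way the event co-occurrence statistics described in point 2 could improve event-wise metrics; the paper's actual N-gram formulation may differ.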
Collections
- TUNICRIS publications [20689]