Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair
Wang, Shanshan; Naithani, Gaurav; Politis, Archontis; Virtanen, Tuomas (2021)
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tuni-202203032316
Description
Peer reviewed
Abstract
Time-frequency masking or spectrum prediction computed with short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining the low latency during inference that real-time speech enhancement and assisted-hearing applications require. To assess our approach across different model types and datasets, we evaluate it with a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
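To illustrate the general idea of an asymmetric analysis-synthesis window pair, here is a minimal sketch. It is one common construction, not necessarily the one used in the paper. It assumes a long analysis window of K samples, which gives fine frequency resolution, and a synthesis window that is nonzero only on the last 2*M samples of the frame, where M is the hop size. The synthesis window is chosen so that the product of the two windows is a short Hann window that overlap-adds to one at hop M. The sampling rate (16 kHz), K = 1024, M = 64 and the helper names `periodic_hann`, `asymmetric_window_pair` and `analysis_synthesis` are hypothetical, chosen for illustration only. With these values the synthesis support is 128 samples (8 ms).

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window of length n: 0.5 - 0.5*cos(2*pi*k/n)."""
    k = np.arange(n)
    return 0.5 - 0.5 * np.cos(2 * np.pi * k / n)


def asymmetric_window_pair(K, M):
    """Hypothetical asymmetric analysis/synthesis window pair.

    K: analysis window length (long, for fine frequency resolution).
    M: hop size; the synthesis window is nonzero only on the last 2*M samples.
    """
    assert K > 2 * M
    w_a = np.empty(K)
    # Long rising part: first half of a sqrt-Hann window of length 2*(K - M).
    w_a[:K - M] = np.sqrt(periodic_hann(2 * (K - M))[:K - M])
    # Short falling part: second half of a sqrt-Hann window of length 2*M.
    w_a[K - M:] = np.sqrt(periodic_hann(2 * M)[M:])
    # Desired product w_a * w_s on the last 2*M samples: a Hann window of
    # length 2*M, whose copies shifted by M sum to exactly 1.
    product = periodic_hann(2 * M)
    w_s = np.zeros(K)
    w_s[K - 2 * M:] = product / w_a[K - 2 * M:]
    return w_a, w_s


def analysis_synthesis(x, w_a, w_s, M, process=lambda X: X):
    """STFT analysis with w_a, optional spectral processing, and
    overlap-add synthesis with w_s (hop M)."""
    K = len(w_a)
    y = np.zeros(len(x))
    for start in range(0, len(x) - K + 1, M):
        X = np.fft.rfft(x[start:start + K] * w_a)
        frame = np.fft.irfft(process(X), n=K)
        y[start:start + K] += frame * w_s
    return y


# Usage with hypothetical parameters: 16 kHz, 64 ms analysis, 4 ms hop.
fs, K, M = 16000, 1024, 64
w_a, w_s = asymmetric_window_pair(K, M)
synthesis_ms = 1000 * 2 * M / fs  # length of the synthesis support in ms

rng = np.random.default_rng(0)
x = rng.standard_normal(fs)
y = analysis_synthesis(x, w_a, w_s, M)  # identity processing
# Samples covered by two overlapping synthesis windows are reconstructed.
covered = slice(K - M, (len(x) - K) // M * M + K - M)
```

With identity processing the covered samples of `y` match `x`. Because each frame contributes output only through its last 2*M samples, the output in this kind of scheme depends on the short synthesis support rather than on the full analysis length. This is why a long analysis window can coexist with a low algorithmic latency.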
Collections
- TUNICRIS publications