Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair
Wang, Shanshan; Naithani, Gaurav; Politis, Archontis; Virtanen, Tuomas (2021)
IEEE
2021
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202203032316
Description
Peer reviewed
Abstract
Time-frequency masking and spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining, during inference, the low latency suitable for real-time speech enhancement or assisted hearing applications. To assess our approach across model types and datasets, we evaluate it with a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
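The trade-off between frequency resolution and latency mentioned in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the 8 kHz sample rate and the 32 ms analysis window are assumed values chosen for illustration, and the helper names are hypothetical. It also assumes that the algorithmic latency is governed by the length of the synthesis window, so only the 8 ms figure comes from the abstract.

```python
def window_samples(duration_ms: float, sample_rate_hz: int) -> int:
    """Length of a window in samples for a given duration."""
    return round(duration_ms * sample_rate_hz / 1000)


def bin_spacing_hz(window_len: int, sample_rate_hz: int) -> float:
    """Spacing between DFT bins when the DFT size equals the window length."""
    return sample_rate_hz / window_len


SAMPLE_RATE_HZ = 8000   # assumption: not stated in the abstract
ANALYSIS_MS = 32.0      # assumption: a longer analysis window
SYNTHESIS_MS = 8.0      # matches the 8 ms algorithmic latency in the abstract

analysis_len = window_samples(ANALYSIS_MS, SAMPLE_RATE_HZ)
synthesis_len = window_samples(SYNTHESIS_MS, SAMPLE_RATE_HZ)

# A longer analysis window gives finer frequency bins, while the short
# synthesis window keeps the assumed latency at 8 ms.
analysis_spacing = bin_spacing_hz(analysis_len, SAMPLE_RATE_HZ)
synthesis_spacing = bin_spacing_hz(synthesis_len, SAMPLE_RATE_HZ)

print(f"analysis window:  {analysis_len} samples, {analysis_spacing} Hz per bin")
print(f"synthesis window: {synthesis_len} samples, {synthesis_spacing} Hz per bin")
```

Under these assumed values, a symmetric 8 ms window pair would be limited to the coarser bin spacing of the short window, whereas the asymmetric pair analyses the signal with the finer spacing of the long window.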
Collections
- TUNICRIS publications [18603]