Joint speaker separation and recognition using non-negative matrix deconvolution with adaptive dictionary

Drgas, Szymon; Virtanen, Tuomas (2021-11)

 

Computer Speech and Language
101223
doi:10.1016/j.csl.2021.101223
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202106296094

Description

Peer reviewed
Abstract
In this article, we propose a new method for joint co-channel speaker separation and recognition called adaptive-dictionary non-negative matrix deconvolution (DANMD). This method is an extension of non-negative matrix deconvolution (NMD), which models a spectrogram matrix as a linear combination of dictionary elements (atoms). We propose a dictionary that is a linear combination of a speaker-independent component and components representing speaker variability. The dictionary is parametric, and all atoms depend on a small number of parameters. The speaker-independent component and the components representing speaker variability are learned from recordings of tens or hundreds of speakers. We show that the proposed method can be applied to the single-channel speech separation task, in which two speakers of unknown identity are to be separated. In a scenario where the unknown speakers’ recordings are in the training dataset together with recordings of many other speakers, we show that the proposed method outperforms stacked NMD (NMD with a dictionary that contains the atoms of all speakers in the dataset) in terms of signal-to-distortion ratio (SDR). DANMD was also tested in a scenario where recordings of the recognized speakers were not in the training dataset. In this case, it yielded clearly positive SDRs. The proposed model was also tested on a co-channel speaker identification task, in which the parameters of the adapted model serve as the basis for deciding the identity of the speakers in the mixture. In this case, the accuracy was 81.2, compared with 84.1 for stacked NMD. Although the speaker recognition accuracy is lower for the new approach, we find its primary value in the improved SDR.
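The model described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`adapted_dictionary`, `nmd_reconstruct`), the array shapes, and the choice to keep every component and weight non-negative are assumptions made for illustration only. The sketch builds a speaker-adapted dictionary as a speaker-independent component plus a weighted sum of variability components, then forms a spectrogram by convolving each atom with its activations over time, as in NMD.

```python
import numpy as np


def adapted_dictionary(mean_atoms, variability, speaker_params):
    """Build a speaker-adapted dictionary (hypothetical helper).

    mean_atoms:     (K, F, T) speaker-independent atoms
    variability:    (P, K, F, T) components representing speaker variability
    speaker_params: (P,) small set of per-speaker weights

    Assumption: all inputs are non-negative, so the result is non-negative.
    """
    # Weighted sum over the P variability components, added to the mean.
    return mean_atoms + np.tensordot(speaker_params, variability, axes=1)


def nmd_reconstruct(atoms, activations):
    """Convolutive NMD model: V[f, n] = sum_k sum_tau atoms[k, f, tau] * H[k, n - tau].

    atoms:       (K, F, T) dictionary of K atoms, each spanning T frames
    activations: (K, N) non-negative activations over N frames
    """
    K, F, T = atoms.shape
    _, N = activations.shape
    V = np.zeros((F, N))
    for tau in range(min(T, N)):
        # Delay the activations by tau frames, zero-filling the start.
        shifted = np.zeros_like(activations, dtype=float)
        shifted[:, tau:] = activations[:, : N - tau]
        V += atoms[:, :, tau].T @ shifted
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, F, T, P, N = 4, 16, 3, 2, 20  # illustrative sizes only
    mean_atoms = rng.random((K, F, T))
    variability = rng.random((P, K, F, T))
    # Two speakers, each described by its own small parameter vector.
    dict_a = adapted_dictionary(mean_atoms, variability, np.array([0.2, 0.7]))
    dict_b = adapted_dictionary(mean_atoms, variability, np.array([0.9, 0.1]))
    # The co-channel mixture is modeled as the sum of both speakers' spectrograms.
    mixture = nmd_reconstruct(dict_a, rng.random((K, N))) + nmd_reconstruct(
        dict_b, rng.random((K, N))
    )
    print(mixture.shape)
```

In this sketch, only the short vector `speaker_params` differs between speakers, which mirrors the abstract's point that all atoms depend on a small number of parameters. Estimating those parameters and the activations from a mixture is not shown here.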
Collections
  • TUNICRIS-julkaisut [20536]
Kalevantie 5
P.O. Box 617
33014 Tampere University
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
