On Negative Sampling for Contrastive Audio-Text Retrieval
Xie, Huang; Räsänen, Okko; Virtanen, Tuomas (2023)
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2023121810986
Description
Peer reviewed
Abstract
This paper investigates negative sampling for contrastive learning in the context of audio-text retrieval. A negative sampling strategy selects negatives (either audio clips or textual descriptions) from a pool of candidates for a positive audio-text pair. We explore sampling strategies based on model-estimated within-modality and cross-modality relevance scores for audio and text samples. Keeping the training setting of the retrieval system from [1] fixed, we study eight sampling strategies, including hard and semi-hard negative sampling. Experimental results show that retrieval performance varies dramatically across strategies. In particular, selecting semi-hard negatives with cross-modality scores improves performance in both text-to-audio and audio-to-text retrieval. We also show that feature collapse occurs when sampling hard negatives with cross-modality scores.
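As a rough illustration of the difference between hard and semi-hard negative sampling with cross-modality scores, the sketch below picks one negative caption per audio clip from a similarity matrix. It is not the authors' implementation. The function name `select_negatives`, the layout of the score matrix, and the definition of "semi-hard" used here (the highest-scoring negative that still scores below the positive, as commonly used in triplet-loss work) are assumptions made for this example; the paper's exact definitions may differ.

```python
import numpy as np


def select_negatives(scores, strategy="semi-hard"):
    """Pick one negative text index per audio clip (hypothetical helper).

    scores[i, j] is an assumed cross-modality similarity between audio i
    and text j; the diagonal holds the positive pairs. Expects N >= 2.
    """
    pos = np.diag(scores)                      # score of each positive pair
    cand = scores.astype(float).copy()
    np.fill_diagonal(cand, -np.inf)            # never select the positive itself

    hardest = cand.argmax(axis=1)              # most similar negative per row
    if strategy == "hard":
        return hardest

    if strategy == "semi-hard":
        # Keep only negatives that score below the positive, then take the
        # closest one among them.
        masked = np.where(cand < pos[:, None], cand, -np.inf)
        idx = masked.argmax(axis=1)
        # Assumption: if no negative scores below the positive, fall back
        # to the hardest negative for that row.
        no_semi_hard = np.isinf(masked).all(axis=1)
        idx[no_semi_hard] = hardest[no_semi_hard]
        return idx

    raise ValueError(f"unknown strategy: {strategy}")


# Toy similarity matrix for three audio-text pairs (made-up values).
toy_scores = np.array([
    [0.90, 0.95, 0.50],
    [0.20, 0.80, 0.70],
    [0.10, 0.30, 0.20],
])
hard_idx = select_negatives(toy_scores, "hard")
semi_hard_idx = select_negatives(toy_scores, "semi-hard")
```

In the first row of the toy matrix, the hard strategy selects a caption that the model scores above the true caption, while the semi-hard strategy skips it in favour of one scoring below the positive.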
Collections
- TUNICRIS publications [20250]