Guided Captioning of Audio
Martin Morato, Irene; Afolaranmi, James; Mesaros, Annamaria (2024)
URI
https://dcase.community/documents/workshop2024/proceedings/DCASE2024Workshop_Martin-Morato_26.pdf
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202501311852
Description
Peer reviewed
Abstract
This work introduces a guided captioning system that aims to produce captions focused on different audio content, depending on a guiding text. We show that keyword guidance results in more diverse captions, even though the usual captioning metrics do not reflect this. We design a system that can be trained using keywords automatically extracted from reference annotations and that is given one keyword at test time. When trained with 5 keywords, the system produces captions that contain the exact guidance keyword 70% of the time, and it generates over 3600 unique sentences for the Clotho dataset. In contrast, a baseline without any keywords produces 700 unique captions on the same test set.
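The two quantities reported above can be computed with a short script: how often a caption contains its guidance keyword, and how many distinct captions a system produces. The following is a minimal sketch, not the authors' evaluation code. The function names are hypothetical. Two choices are assumptions: the keyword match is whole-word and case-insensitive, and captions are lowercased, whitespace-collapsed, and stripped of a trailing period before unique ones are counted.

```python
import re


def contains_keyword(caption: str, keyword: str) -> bool:
    """Assumed 'exact' match: keyword appears as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    return re.search(pattern, caption.lower()) is not None


def keyword_hit_rate(captions: list[str], keywords: list[str]) -> float:
    """Fraction of generated captions that contain their own guidance keyword."""
    if len(captions) != len(keywords):
        raise ValueError("captions and keywords must be aligned one-to-one")
    if not captions:
        return 0.0
    hits = sum(contains_keyword(c, k) for c, k in zip(captions, keywords))
    return hits / len(captions)


def normalize(caption: str) -> str:
    """Assumed normalization: lowercase, drop a trailing period, collapse spaces."""
    return " ".join(caption.lower().strip().rstrip(".").split())


def unique_caption_count(captions: list[str]) -> int:
    """Number of distinct captions after normalization."""
    return len({normalize(c) for c in captions})


# Toy usage with made-up captions (not data from the paper).
captions = [
    "A dog barks loudly.",
    "Birds chirp near a river.",
    "A dog barks loudly",
    "Rain falls on a roof.",
]
keywords = ["dog", "river", "cat", "rain"]

print(keyword_hit_rate(captions, keywords))  # 3 of 4 captions contain their keyword
print(unique_caption_count(captions))        # the two "dog" captions collapse into one
```

The whole-word rule means "rain" does not match "raining". A looser, substring-based criterion would report a higher hit rate, so the matching rule should be fixed before different systems are compared.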
Collections
- TUNICRIS-julkaisut