Does Paid Crowdsourcing Still Pay Off? Sifting Through Annotator Noise in Crowdsourced Audio Labels
Harju, Manu; Martin Morato, Irene; Mesaros, Annamaria (2024-10)
Permanent link to this publication:
https://urn.fi/URN:NBN:fi:tuni-202501311845
Description
Peer reviewed
Abstract
Paid crowdsourcing has emerged as a popular method for annotating diverse data types such as images, text, and audio. However, as the platforms have grown in popularity, the number of careless annotators has also increased, leading to an influx of spam workers who answer at random and render the platforms unusable. This paper documents our attempt to annotate the DESED dataset using Amazon's Mechanical Turk, which failed to produce any useful data after two attempts. Our observations reveal that while the number of workers performing the tasks has increased since 2021, the quality of the obtained labels has declined considerably. After successful trials annotating audio data in 2021 and 2022, in 2024 the same annotation setup and user interface predominantly attracted spammers. Given the consistent task setup and similarity to previous attempts, it remains unclear whether the workers are inherently subpar or are intentionally exploiting the platform. The bottom line is that, despite spending a considerable amount of money, we obtained no usable data.
Collections
- TUNICRIS publications [19294]