Enhanced Data-Recalibration: Utilizing Validation Data to Mitigate Instance-Dependent Noise in Classification
Bakhshi Germi, Saeed; Rahtu, Esa (2022-05-15)
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202208226616
Description
Peer reviewed
Abstract
This paper proposes a practical approach to dealing with instance-dependent noise in classification. Supervised learning with noisy labels is one of the major research topics in the deep learning community. While earlier works typically assume class-conditional, instance-independent noise, recent works provide theoretical and empirical evidence that noise in real-world cases is instance-dependent. Current state-of-the-art methods for dealing with instance-dependent noise focus on data-recalibration strategies that iteratively correct labels while training the network. While some methods provide theoretical analysis to prove that each iteration results in a cleaner dataset and a better-performing network, their limiting assumptions and their dependence on knowledge about the noise for hyperparameter tuning often undermine these claims. The proposed method is a two-stage data-recalibration algorithm that utilizes validation data to correct noisy labels and refine the model iteratively. The algorithm trains the network on the latest cleansed training set to obtain better performance on a small, clean validation set, and uses the best-performing model to cleanse the training set for the next iteration. The intuition behind the method is that a network with decent performance on the clean validation set can serve as an oracle network to generate less noisy labels for the training set. While there is no theoretical guarantee attached, the method’s effectiveness is demonstrated with extensive experiments on synthetic and real-world benchmark datasets. The empirical evaluation suggests that the proposed method outperforms current state-of-the-art works. The implementation is available at https://github.com/Sbakhshigermi/EDR.
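The train, validate, and relabel loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a toy nearest-centroid classifier in place of a neural network, and it assumes the simplest possible cleansing rule, namely replacing every training label with the best model's prediction. The names `fit_centroids`, `predict`, and `recalibrate` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Toy stand-in for network training: one mean vector per class."""
    centroids = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        mask = y == c
        if mask.any():  # leave the centroid at zero if a class is empty
            centroids[c] = X[mask].mean(axis=0)
    return centroids

def predict(centroids, X):
    """Assign each sample to the class of its nearest centroid."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def recalibrate(X_train, y_noisy, X_val, y_val, n_classes, n_iters=5):
    """Iteratively train on the current labels, keep the model that scores
    best on the clean validation set, and relabel the training set with it.
    The relabel-everything rule is an assumption made for this sketch."""
    labels = y_noisy.copy()
    best_acc, best_model = -1.0, None
    for _ in range(n_iters):
        model = fit_centroids(X_train, labels, n_classes)
        acc = (predict(model, X_val) == y_val).mean()
        if acc > best_acc:
            best_acc, best_model = acc, model
        new_labels = predict(best_model, X_train)
        if np.array_equal(new_labels, labels):
            break  # labels have stopped changing
        labels = new_labels
    return labels, best_model, best_acc
```

The clean validation set is used only for model selection here, never for training, which mirrors the role the abstract gives it as the yardstick for choosing the oracle model.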
Collections
- TUNICRIS publications [20161]