Lightweight Multitask Learning for Robust JND Prediction using Latent Space and Reconstructed Frames
Nami, Sanaz; Pakdaman, Farhad; Hashemi, Mahmoud Reza; Shirmohammadi, Shervin; Gabbouj, Moncef (2024)
IEEE Transactions on Circuits and Systems for Video Technology
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202405075489
Description
Peer reviewed
Abstract
<p>The Just Noticeable Difference (JND) refers to the smallest distortion in an image or video that can be perceived by the Human Visual System (HVS), and is widely used in optimizing image/video compression. However, accurate JND modeling is very challenging due to its content dependence and the complex nature of the HVS. Recent solutions train deep learning-based JND prediction models, mainly based on a Quantization Parameter (QP) value representing a single JND level, and train separate models to predict each JND level. We point out that a single QP distance is insufficient to properly train a network with millions of parameters for a complex, content-dependent task. Inspired by recent advances in learned compression and multitask learning, we propose to address this problem by (1) learning to reconstruct the JND-quality frames jointly with the QP prediction, and (2) jointly learning several JND levels to augment the learning performance. We propose a novel solution where, first, an effective feature backbone is trained by learning to reconstruct JND-quality frames from the raw frames. Second, JND prediction models are trained based on features extracted from the latent space (i.e., the compressed domain), or from reconstructed JND-quality frames. Third, a multi-JND model is designed, which jointly learns three JND levels, further reducing the prediction error. Extensive experimental results demonstrate that our multi-JND method outperforms the state-of-the-art and achieves an average JND<sub>1</sub> prediction error of only 1.57 in QP, and 0.72 dB in PSNR. Moreover, the multitask learning approach and compressed-domain prediction facilitate lightweight inference by significantly reducing the complexity and the number of parameters.</p>
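The multitask design described in the abstract can be sketched in code. The following is a minimal, hypothetical NumPy illustration (not the authors' implementation, whose architecture and training details are not given here): a shared backbone maps a frame to a latent code, one head reconstructs the JND-quality frame, and a single multi-JND head jointly regresses the QP values of three JND levels from the same latent features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration only)
D_IN, D_LAT = 64, 16  # flattened-frame size and latent size

# Shared backbone: raw frame -> latent space (stand-in for a learned encoder)
W_enc = rng.standard_normal((D_IN, D_LAT)) * 0.1

# Reconstruction head: latent -> JND-quality frame (auxiliary task)
W_dec = rng.standard_normal((D_LAT, D_IN)) * 0.1

# Multi-JND head: latent -> QPs for JND1, JND2, JND3 (joint prediction)
W_jnd = rng.standard_normal((D_LAT, 3)) * 0.1

def forward(frame):
    """Return (reconstructed JND-quality frame, predicted QPs of 3 JND levels)."""
    latent = np.tanh(frame @ W_enc)  # compressed-domain features
    recon = latent @ W_dec           # reconstruction branch
    qps = latent @ W_jnd             # joint multi-JND QP regression branch
    return recon, qps

frame = rng.standard_normal(D_IN)
recon, qps = forward(frame)
print(recon.shape, qps.shape)  # (64,) (3,)
```

Because both heads read the same latent code, the reconstruction loss shapes features that the QP heads reuse, and predicting three JND levels jointly shares supervision across tasks; at inference, the compressed-domain path only needs the encoder and the small multi-JND head, which is what keeps the model lightweight.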
Collections
- TUNICRIS publications [20143]