End-to-End Transformer for Compressed Video Quality Enhancement

Yu, Li; Chang, Wenshuai; Wu, Shiyu; Gabbouj, Moncef (2023-11-29)

 

IEEE Transactions on Broadcasting
doi:10.1109/TBC.2023.3332015
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-2023122011072

Description

Peer reviewed
Abstract
Convolutional neural networks (CNNs) have achieved excellent results in the compressed video quality enhancement task in recent years. State-of-the-art methods explore the spatio-temporal information of adjacent frames mainly through deformable convolution. However, CNN-based methods can only exploit local information and therefore do not explore global information. Moreover, current methods enhance video quality at a single scale, ignoring multi-scale information, which corresponds to information at different receptive fields and is crucial for correlation modeling. Therefore, in this work, we propose a Transformer-based compressed video quality enhancement (TVQE) method, consisting of a Transformer-based Spatio-Temporal feature Fusion (TSTF) module and a Multi-scale Channel-wise Attention based Quality Enhancement (MCQE) module. The proposed TSTF module learns both local and global features for correlation modeling; its window-based Transformer and encoder-decoder structure greatly improve execution efficiency. The proposed MCQE module calculates multi-scale channel attention, which aggregates the temporal information between channels in the feature map at multiple scales, achieving efficient fusion of inter-frame information. Extensive experiments on the JCT-VC test sequences show that the proposed method increases PSNR by up to 0.98 dB when QP = 37. Meanwhile, compared to competing methods at 720p resolution, the inference speed is improved by up to 9.4% and the number of FLOPs is reduced by up to 84.4%. Moreover, the proposed method achieves a BD-rate reduction of up to 23.04%.
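The PSNR increase quoted in the abstract is the difference between the PSNR of the enhanced frame and the PSNR of the compressed frame, both measured against the uncompressed original. The following is a minimal sketch of that arithmetic in Python, assuming 8-bit pixel values and frames given as nested lists. The function names `psnr` and `delta_psnr` and the toy frames are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized frames (nested lists of pixels).

    peak=255.0 assumes 8-bit samples; this is an assumption, not a detail
    taken from the abstract.
    """
    ref = [p for row in reference for p in row]
    dis = [p for row in distorted for p in row]
    if not ref or len(ref) != len(dis):
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, dis)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def delta_psnr(reference, compressed, enhanced, peak=255.0):
    """PSNR gain in dB of the enhanced frame over the compressed frame."""
    return psnr(reference, enhanced, peak) - psnr(reference, compressed, peak)


# Toy 2x2 frames (hypothetical values, for illustration only).
raw = [[100, 110], [120, 130]]
compressed = [[96, 114], [116, 134]]  # every pixel off by 4, so MSE = 16
enhanced = [[98, 112], [118, 132]]    # every pixel off by 2, so MSE = 4

gain = delta_psnr(raw, compressed, enhanced)
print(f"PSNR gain: {gain:.2f} dB")
```

In this toy case the mean squared error drops by a factor of four, so the gain equals 10·log10(4), roughly 6.02 dB. Gains on real video sequences, such as the 0.98 dB reported in the abstract, are typically averaged over all frames of each sequence.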
Collections
  • TUNICRIS-julkaisut [22385]
Kalevantie 5
P.O. Box 617
33014 Tampere University
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
