Tailored AVX2 Transform Kernels for Versatile Video Coding
Siivonen, Kari; Sainio, Joose; Mercat, Alexandre; Vanne, Jarno (2023-10-31)
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2023111710038
Description
Peer reviewed
Abstract
Transform coding tools play an integral part in video codecs due to their substantial impact on coding efficiency. The latest video coding standard, Versatile Video Coding (VVC), makes the most of these tools by introducing new DST7, DCT8, and non-square transforms alongside the conventional DCT2 transform. This paper proposes optimized AVX2 kernels for all these transforms to speed up VVC coding. Unlike existing solutions, our kernels are specially tailored for each VVC transform type and block size. Accelerating our open-source uvg266 VVC encoder with the proposed kernels yields up to a 1.1× speedup under the all-intra (AI) coding condition without any coding overhead. Our implementations make forward DCT2 and DST7/DCT8 transforms 4.0× and 6.7× as fast as their respective scalar implementations in the VTM reference encoder. They also outpace the AVX2 kernels of the practical VVenC encoder by factors of 3.0× and 2.8×. With inverse transforms, the respective speedups reach up to 5.3×, 11.1×, 3.4×, and 3.0×.
Collections
- TUNICRIS publications [20263]