High-Level Synthesis Implementation of HEVC Motion Estimation on FPGA
Miettinen, Jere (2020)
Miettinen, Jere
2020
Sähkötekniikan DI-tutkinto-ohjelma - Degree Programme in Electrical Engineering, MSc (Tech)
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Hyväksymispäivämäärä
2020-05-22
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:tuni-202004273939
https://urn.fi/URN:NBN:fi:tuni-202004273939
Tiivistelmä
The need for transmitting high quality videos fast and effectively has increased in the recent years. Main reason for that is the increase in resolutions and frame rates, and the growing use of mobile devices and streaming. High Efficiency Video Coding (HEVC) is the latest video coding standard designed to respond those needs. HEVC achieves better compression compared to previous standards without compromising the video quality.
High-Level Synthesis (HLS) tools bring automation to the complex design processes and the designer can focus more on the algorithm functionality. The HLS design flow is on a higher abstraction layer compared to the traditional hardware design flows. Programming language such as C can be used instead of one of the hardware description languages (HDL) such as VHDL or Verilog. HLS was chosen for this Thesis, instead of traditional register transfer level (RTL) design, for faster and easier development.
Kvazaar is an open source HEVC video encoder developed at Tampere University. The encoding is done by removing temporal or spatial data redundancy. Motion estimation (ME) aims to reduce the temporal data redundancy. ME can be done using one of the various block matching algorithms (BMA), such as full search (FS) or hexagon-based search (HEXBS). The main goal of this Thesis was to evaluate Kvazaar’s ME algorithms and then implement a ME accelerator on hardware using HLS. The accelerator was aimed for a Field Programmable Gate Array (FPGA) circuit.
ME is one of the most complex parts of the encoding and takes a significant amount of time of the whole encoding process and it is a good candidate for HW acceleration. FS algorithm was chosen for hardware acceleration in this Thesis and the hardware implementation was done using Catapult-C HLS tool. The accelerated algorithm was synthesized to Arria 10 FPGA platform and integrated as a part of Kvazaar’s encoding process.
The synthesized Accelerator works on 150 MHz frequency and takes 18% of the available logic resources on Arria 10. It uses 6% of the available M20K memory elements and 6% of the platform’s registers. The Accelerator achieved ×66.26 speedup compared to the software only algorithm. Once integrated to the Kvazaar’s encoding process the speedup was still ×1.94. The drop in the speedup can be explained with the throughput limitations of the PCIe bus used in the communication between Kvazaar and the Arria 10 platform.
High-Level Synthesis (HLS) tools bring automation to the complex design processes and the designer can focus more on the algorithm functionality. The HLS design flow is on a higher abstraction layer compared to the traditional hardware design flows. Programming language such as C can be used instead of one of the hardware description languages (HDL) such as VHDL or Verilog. HLS was chosen for this Thesis, instead of traditional register transfer level (RTL) design, for faster and easier development.
Kvazaar is an open source HEVC video encoder developed at Tampere University. The encoding is done by removing temporal or spatial data redundancy. Motion estimation (ME) aims to reduce the temporal data redundancy. ME can be done using one of the various block matching algorithms (BMA), such as full search (FS) or hexagon-based search (HEXBS). The main goal of this Thesis was to evaluate Kvazaar’s ME algorithms and then implement a ME accelerator on hardware using HLS. The accelerator was aimed for a Field Programmable Gate Array (FPGA) circuit.
ME is one of the most complex parts of the encoding and takes a significant amount of time of the whole encoding process and it is a good candidate for HW acceleration. FS algorithm was chosen for hardware acceleration in this Thesis and the hardware implementation was done using Catapult-C HLS tool. The accelerated algorithm was synthesized to Arria 10 FPGA platform and integrated as a part of Kvazaar’s encoding process.
The synthesized Accelerator works on 150 MHz frequency and takes 18% of the available logic resources on Arria 10. It uses 6% of the available M20K memory elements and 6% of the platform’s registers. The Accelerator achieved ×66.26 speedup compared to the software only algorithm. Once integrated to the Kvazaar’s encoding process the speedup was still ×1.94. The drop in the speedup can be explained with the throughput limitations of the PCIe bus used in the communication between Kvazaar and the Arria 10 platform.