Design and Implementation of Configurable Motion Estimation Architecture for Video Encoding
Vanne, Jarno Johannes (2011)
Vanne, Jarno Johannes
Tampere University of Technology
2011
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tty-2011102714860
Abstract
A key factor behind the success of video products and services is video compression, which makes digital video practical on current communication networks and storage devices. However, video compression involves complex encoding algorithms whose real-time execution is particularly challenging in portable devices because of strict constraints on cost, size, and power consumption. In addition, modern devices are expected to be flexible enough to support a plethora of video coding standards.
This Thesis focuses on motion estimation (ME), the main coding tool for removing the temporal redundancy of video scenes, which typically accounts for 50-90% of the video encoding complexity. For these two reasons, hundreds of algorithms and architectures have been proposed for ME, which the video coding standards leave non-standardized. However, most current ME architectures are limited to a single video coding standard or ME algorithm, whereas existing flexible architectures suffer from large silicon area, limited processing speed, an unsustainable power budget, or overly restricted ME parameters.
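As a baseline for the terms used above, the following is a minimal sketch of block-matching ME: for a block of the current frame, the encoder searches a window of the reference frame for the displacement (motion vector) that minimizes the sum of absolute differences (SAD). The sketch uses exhaustive full search on NumPy arrays; the function names, the 16 x 16 block size, and the ±8 pixel search range are illustrative assumptions, and this is not the architecture described in the Thesis.

```python
import numpy as np


def sad(block: np.ndarray, candidate: np.ndarray) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    # Widen to int32 so that uint8 subtraction cannot wrap around.
    return int(np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum())


def full_search(cur: np.ndarray, ref: np.ndarray, bx: int, by: int,
                n: int = 16, search_range: int = 8):
    """Exhaustive integer ME for the n x n block at (bx, by) in `cur`.

    Tests every displacement (dx, dy) in [-search_range, search_range]^2 that
    keeps the candidate block inside `ref` and returns ((dx, dy), best_sad).
    """
    block = cur[by:by + n, bx:bx + n]
    h, w = ref.shape
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue  # candidate falls outside the reference frame
            cost = sad(block, ref[y:y + n, x:x + n])
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

For example, if `cur = np.roll(ref, shift=(2, -3), axis=(0, 1))`, the block at (16, 16) is found at motion vector (3, -2) with a SAD of 0. Even this small window already costs (2·8 + 1)² = 289 candidates of 256 pixel differences each per block, which is why the fast search algorithms named below matter.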
The main contribution of this Thesis is a novel ME architecture that overcomes all the crucial limitations of contemporary approaches. The designed and implemented hardware architecture is compatible with all inter coding modes of the H.261/3, MPEG-1/2, MPEG-4 Visual, H.264/AVC, and VC-1 standards. In addition, it can perform rate-constrained integer ME (IME) with various fast ME algorithms such as BBGDS, CDS, DS, HEXBS, and TSS. It also reduces the complexity of the subsequent fractional ME (FME) by conducting mode decision jointly with IME. The flexibility of the architecture is based on parametrizable search strategy control and an associated separable search path generation, through which different ME algorithms and inter coding modes are easy to implement and efficient to execute. The architecture is composed of three carefully optimized and seamlessly coupled components: a control unit, a memory system, and a distortion computation unit.
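To illustrate what a parametrizable search strategy means in general terms, the sketch below runs one generic search loop whose behaviour is selected only by a table of pattern parameters, scored with a rate-constrained cost J = SAD + λ·R(mv - pmv). It is a software analogy under stated assumptions, not the Thesis's control unit or search path generator: the pattern table (large and small diamond for DS, hexagon then small diamond for HEXBS), the signed Exp-Golomb bit count used as the rate estimate, the callback `sad_fn`, and all other names are hypothetical. TSS, CDS, and BBGDS would need further parameters that are not shown.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]

# Hypothetical pattern table: a coarse pattern repeated while the best point
# keeps moving, then a single refinement pass with a fine pattern.
PATTERNS: Dict[str, Dict[str, List[MV]]] = {
    "DS": {  # diamond search: large diamond, then small diamond
        "coarse": [(0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)],
        "fine": [(0, -1), (1, 0), (0, 1), (-1, 0)],
    },
    "HEXBS": {  # hexagon-based search: large hexagon, then small diamond
        "coarse": [(-2, 0), (-1, -2), (1, -2), (2, 0), (1, 2), (-1, 2)],
        "fine": [(0, -1), (1, 0), (0, 1), (-1, 0)],
    },
}


def se_bits(v: int) -> int:
    """Bit length of the signed Exp-Golomb code se(v), used as a rate estimate."""
    k = 2 * v - 1 if v > 0 else -2 * v  # map the signed value to codeNum
    return 2 * (k + 1).bit_length() - 1


def rd_cost(sad_fn: Callable[[MV], int], mv: MV, pmv: MV, lam: int) -> int:
    """Rate-constrained cost J = SAD(mv) + lam * R(mv - pmv)."""
    bits = se_bits(mv[0] - pmv[0]) + se_bits(mv[1] - pmv[1])
    return sad_fn(mv) + lam * bits


def pattern_search(sad_fn: Callable[[MV], int], strategy: str,
                   pmv: MV = (0, 0), lam: int = 0,
                   start: MV = (0, 0), max_steps: int = 64) -> Tuple[MV, int]:
    """Generic pattern search whose behaviour comes only from PATTERNS."""
    pattern = PATTERNS[strategy]

    def cost(mv: MV) -> int:
        return rd_cost(sad_fn, mv, pmv, lam)

    best, best_cost = start, cost(start)
    # Coarse stage: move the center while some pattern point lowers the cost.
    for _ in range(max_steps):
        cands = [(best[0] + dx, best[1] + dy) for dx, dy in pattern["coarse"]]
        cand = min(cands, key=cost)
        cand_cost = cost(cand)
        if cand_cost >= best_cost:
            break
        best, best_cost = cand, cand_cost
    # Fine stage: one pass of the small pattern around the final center.
    center = best
    for dx, dy in pattern["fine"]:
        mv = (center[0] + dx, center[1] + dy)
        c = cost(mv)
        if c < best_cost:
            best, best_cost = mv, c
    return best, best_cost
```

On a synthetic distortion surface such as `lambda mv: 10 * (abs(mv[0] - 5) + abs(mv[1] + 3))`, both `pattern_search(surface, "DS")` and `pattern_search(surface, "HEXBS")` return ((5, -3), 0). Switching algorithms only swaps table entries, which mirrors, in spirit, how a parameter-driven control can reuse one datapath for several search strategies.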
The results show that the architecture can process real-time (30 fps) single-reference-frame ME up to 1080p format (1920 x 1080 pixels) with the H.261/3, MPEG-1/2, MPEG-4 Visual, and VC-1 standards. When processing CIF, D1, and 1080p formats at 30 fps, the architecture requires only 22.3-25.1 kgates, 20.5 KB of memory for a 123 x 123 pixel search range, and 3-184 mW of power in a 0.13 µm CMOS standard cell technology. Supporting ME for H.264/AVC 1080p video at 30 fps requires duplicating the architecture, for which the respective figures are 49.7 kgates, 41.0 KB of memory, and 364 mW of power. The performance comparison shows that the designed architecture uses 39-89% fewer logic gates than the existing approaches.
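The real-time targets can be turned into a per-macroblock processing budget. The sketch below assumes 16 x 16 macroblocks and frames padded up to whole macroblocks (so 1080 lines are coded as 68 macroblock rows); the clock frequency is a hypothetical input, since the abstract does not state one, and D1 is left out because its exact frame size (720 x 480 or 720 x 576) is not given here.

```python
def macroblocks_per_frame(width: int, height: int, mb: int = 16) -> int:
    """Number of 16 x 16 macroblocks, padding partial rows/columns up."""
    return ((width + mb - 1) // mb) * ((height + mb - 1) // mb)


def macroblocks_per_second(width: int, height: int, fps: int = 30) -> int:
    """Macroblock rate the ME engine must sustain in real time."""
    return macroblocks_per_frame(width, height) * fps


def cycle_budget_per_mb(clock_hz: int, width: int, height: int, fps: int = 30) -> int:
    """Whole clock cycles available per macroblock at a hypothetical clock."""
    return clock_hz // macroblocks_per_second(width, height, fps)
```

CIF (352 x 288) gives 396 macroblocks per frame, or 11,880 per second at 30 fps, while 1080p gives 8,160 per frame and 244,800 per second, roughly 20.6 times more work. At a hypothetical 200 MHz clock, the 1080p budget is only 816 cycles per macroblock for the complete integer search of that macroblock.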
Collections
- Doctoral dissertations [4864]