Parallel memory architectures for video coding
Tanskanen, J. (2004)
Tanskanen, J.
Tampere University of Technology
2004
Department of Information Technology (Tietotekniikan osasto)
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-200810021086
Abstract
Most current processor architectures have word-addressable internal memories and wide data paths, which are used efficiently only when data is aligned to word boundaries. In video coding, however, operands are typically 8 to 16 bits wide, so such an architecture is poorly utilized. Common solutions are to modify the data path so that multiple subwords can be processed in parallel and to provide dedicated instructions for data alignment. Internal parallel memory architectures with versatile memory access properties, by contrast, have not been widely used.
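The alignment problem described above can be illustrated with a minimal sketch. It is not taken from the thesis: the 4-byte word width, the function names, and the low-order interleaved module assignment are assumptions chosen for illustration. The sketch compares how many memory accesses are needed to fetch four consecutive 8-bit operands from an arbitrary byte offset.

```python
WORD_BYTES = 4  # assumed word width: four 8-bit subwords per word


def word_addressable_read(mem, start):
    """Fetch WORD_BYTES consecutive bytes from a memory that can only
    deliver whole, aligned words. Returns (data, word_accesses)."""
    first_word = start // WORD_BYTES
    offset = start % WORD_BYTES
    # An unaligned request straddles two words, so two accesses are needed.
    accesses = 1 if offset == 0 else 2
    fetched = []
    for w in range(first_word, first_word + accesses):
        fetched.extend(mem[w * WORD_BYTES:(w + 1) * WORD_BYTES])
    # The shift/merge step that alignment instructions would perform.
    return fetched[offset:offset + WORD_BYTES], accesses


def parallel_memory_read(mem, start, n_modules=WORD_BYTES):
    """Same fetch from n_modules byte-wide modules, assuming low-order
    interleaving: byte address a is stored in module a % n_modules at
    row a // n_modules. Returns (data, parallel_accesses)."""
    used = set()
    data = []
    for a in range(start, start + n_modules):
        module, row = a % n_modules, a // n_modules
        # Consecutive addresses fall into distinct modules, so there is
        # no conflict and all bytes can be read in one parallel access.
        assert module not in used
        used.add(module)
        data.append(mem[row * n_modules + module])
    return data, 1


mem = list(range(16))                   # toy memory: byte i holds value i
print(word_addressable_read(mem, 5))    # unaligned: two word accesses
print(parallel_memory_read(mem, 5))     # one conflict-free access
```

Both functions return the same four bytes; only the access count differs. A real parallel memory also needs address computation and a data permutation network, which this sketch leaves out.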
This thesis provides new insight into the design of internal on-chip data memory architectures for standards-based video compression. The results can be employed in both programmable and hardware-oriented solutions. The work shows that an internal parallel data memory can be a viable design choice for subword-parallel and SIMD-parallel processors running video coding applications.
Based on an analysis of the key functions of a video codec, a conventional word-addressable architecture needs an average cycle count 1.44-1.98 times higher and an average instruction count 1.22-1.62 times higher than the proposed parallel memory architecture. With a modulo-addressable parallel memory, the external memory bandwidth can be reduced by a factor of about 1.6 while preserving efficient memory access in block matching operations.
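One way modulo addressing can reduce external memory traffic in block matching is sketched below. This is an assumed reading, not the design evaluated in the thesis: the class `ModuloWindow`, the 32-column window, and the 16-column step are hypothetical, and the ratio the sketch produces is not the thesis's factor of 1.6. The idea is that frame column `x` is stored in buffer slot `x % width`, so when the search window slides to the next block, the overlapping columns stay in place and only the new columns are fetched.

```python
class ModuloWindow:
    """Toy search-window buffer addressed modulo its width.

    Tracks only which frame column each slot holds and how many
    columns have been fetched from (hypothetical) external memory."""

    def __init__(self, width):
        self.width = width
        self.slots = [None] * width  # slot i -> frame column stored there
        self.loads = 0               # columns fetched from external memory

    def slide_to(self, left):
        """Make frame columns left .. left + width - 1 resident,
        fetching only those that are not already in the buffer."""
        for x in range(left, left + self.width):
            slot = x % self.width
            if self.slots[slot] != x:
                self.slots[slot] = x  # overwrite the oldest column
                self.loads += 1


# Assumed sizes: 16-column blocks with a 32-column search window.
window = ModuloWindow(32)
window.slide_to(0)    # first block: the whole window is fetched
window.slide_to(16)   # next block: only the 16 new columns are fetched
print(window.loads)   # 48, versus 64 if the window were reloaded in full
```

Because data is never moved inside the buffer, the addresses of resident columns wrap around, which is why the memory itself must support modulo addressing for the window to be read efficiently.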
The memory access benefits of a parallel memory are application-specific and must be weighed against the complexity of the design task. Compared with a conventional word-addressable memory, the parallel memories studied required a larger silicon area (by a factor of 1.14-1.93), consumed more power per memory access (1.30-2.77), and had a longer total memory access delay (1.16-2.36).
The results improve the understanding of the design trade-offs related to video codecs. Furthermore, they provide implementors with data on the gate counts, area, power consumption, cycle times, and other performance figures of parallel memory solutions.
Collections
- Doctoral dissertations [4862]