Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding
Nikara, J. (2004)
Nikara, J.
Tampere University of Technology
2004
Tietotekniikan osasto - Department of Information Technology
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:tty-200810021025
https://urn.fi/URN:NBN:fi:tty-200810021025
Tiivistelmä
This Thesis considers the design of application-specific parallel structures for digital signal processing. Due to wideness of the subject, the discussion has been restricted to the studies of the discrete cosine transform and variable length decoding.
New area-efficient parallel structures, which process data in a sequential form at data rate, are developed for the discrete cosine transform. The development of the structures begins with the derivation of novel regular fast algorithms. The algorithms lend themselves for vertical mapping resulting in modular cascaded structures that can be freely pipelined due to the loop-free structure. In order to prove the feasibility and estimate the performance, the unified transform kernel for discrete cosine transform and its inverse is implemented on a standard cell CMOS technology with a data path synthesis. Finally, the comparison to a state-of-the-art design reveals up to 15% smaller estimated area than in the reference design.
For the variable length decoding, a novel multiple-symbol decoding scheme is proposed. The critical path of the resulting decoder is minimized by introducing a new multiplexed add unit. In order to prove the feasibility and determine the limiting factors of the scheme, the decoder has been implemented on an FPGA technology. When applied to MPEG-2 standard benchmark scenes, on average 4.8 codewords are decoded per cycle resulting in the throughput of 106 million symbols per second. Although, a straightforward and fair comparison of variable length decoders is extremely difficult due to different implementation approaches, the performance of the decoder can be considered promising with 16-100% better throughput at 2-3.6 times lower frequencies than the reference designs on the same FPGA technology.
In both the case studies, the discrete cosine transform and variable length decoding, the modularity and achievable high speed operation provide flexibility for the design re-use in the current and future applications.
New area-efficient parallel structures, which process data in a sequential form at data rate, are developed for the discrete cosine transform. The development of the structures begins with the derivation of novel regular fast algorithms. The algorithms lend themselves for vertical mapping resulting in modular cascaded structures that can be freely pipelined due to the loop-free structure. In order to prove the feasibility and estimate the performance, the unified transform kernel for discrete cosine transform and its inverse is implemented on a standard cell CMOS technology with a data path synthesis. Finally, the comparison to a state-of-the-art design reveals up to 15% smaller estimated area than in the reference design.
For the variable length decoding, a novel multiple-symbol decoding scheme is proposed. The critical path of the resulting decoder is minimized by introducing a new multiplexed add unit. In order to prove the feasibility and determine the limiting factors of the scheme, the decoder has been implemented on an FPGA technology. When applied to MPEG-2 standard benchmark scenes, on average 4.8 codewords are decoded per cycle resulting in the throughput of 106 million symbols per second. Although, a straightforward and fair comparison of variable length decoders is extremely difficult due to different implementation approaches, the performance of the decoder can be considered promising with 16-100% better throughput at 2-3.6 times lower frequencies than the reference designs on the same FPGA technology.
In both the case studies, the discrete cosine transform and variable length decoding, the modularity and achievable high speed operation provide flexibility for the design re-use in the current and future applications.
Kokoelmat
- Väitöskirjat [4843]