Discrete sine and cosine transforms on parallel processors
Pant, Nirajan (2015)
Pant, Nirajan
2015
Master's Degree Programme in Information Technology
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2015-08-12
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201508061507
Abstract
The starting point of this master's thesis is the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) algorithms for signal processing. Depending on the number system used, DCT/DST implementations can be categorized as fixed-point or floating-point. Floating-point numbers have a large dynamic range and can represent both very large and very small values. However, floating-point operations require more clock cycles than fixed-point operations. Specialized hardware can accelerate floating-point operations, but it also increases hardware cost. For general applications, a fixed-point number system is therefore a good choice, provided that sufficient accuracy is guaranteed.
In this thesis, existing floating-point C code for the type-1 DCT and DST is first converted into fixed-point code. A fractional fixed-point representation is used for the conversion to obtain the maximum possible accuracy. The Q15 format provides the highest precision for signed 16-bit fixed-point numbers, but it requires the values to be normalized to the range [-1, 1). The conversion introduces some error in the output, which is quantified by the signal-to-noise ratio (SNR). After the fixed-point DCT/DST code is designed, its performance is evaluated on various Tensilica processor configurations. The configurations are generated for Tensilica's Diamond Standard Processor cores in the Tensilica Xtensa environment. The clock cycle counts of both the fixed-point and the floating-point DCT/DST code are recorded on four different configurations.
The results show that the SNR of the fixed-point DCT/DST is between 35 and 76 dB, depending on the transform size, which suggests that the fixed-point code is accurate enough. The fixed-point DCT/DST also runs approximately 3 to 6 times faster than the floating-point code on Tensilica processor cores in terms of clock cycles. Furthermore, Tensilica's Diamond Standard 570T parallel processor configuration provides the best performance for the designed fixed-point code among all configurations used. Overall, the fixed-point DCT/DST code offers a large performance improvement over the floating-point code, provided that the floating-point code has no added hardware support.