Analysis of fixed-point and floating-point quantization in fast Fourier transform
Ghalib, Ataul (2013)
Master's Degree Programme in Information Technology
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2013-06-05
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-201306171263
Abstract
Digital signal processors (DSPs) can be categorized by the number system used for arithmetic calculations: they are either fixed-point or floating-point processors, and some support both number systems. Quantization is a necessary step in digital signal processing, whether it is applied to the input signal to convert it from the analogue to the digital domain or to an intermediate digital signal to keep it within the representable range and avoid overflow. It always introduces some error and therefore reduces the signal-to-noise ratio (SNR). The two number systems do not introduce the same amount of error when quantization is applied, so when a DSP algorithm such as the fast Fourier transform (FFT) is implemented on a DSP processor, quantization analysis needs to be performed for both fixed-point and floating-point number systems in order to obtain an optimized SNR.
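The quantization error and its effect on SNR described above can be illustrated with a minimal sketch in Python. This is not the thesis's quantization model; the rounding scheme, bit widths, and signal are illustrative assumptions: a signal in [-1, 1) is rounded to a uniform fixed-point grid with a chosen number of fractional bits, and the SNR of the quantized signal is measured against the original.

```python
import numpy as np

def quantize_fixed(x, frac_bits):
    """Round x to a uniform fixed-point grid with `frac_bits` fractional bits
    (illustrative rounding quantizer, no saturation needed for |x| < 1)."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def snr_db(reference, quantized):
    """SNR in dB: signal power of the reference over the quantization-noise power."""
    noise = reference - quantized
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)       # stimulus kept in the representable range
for bits in (8, 12, 16):
    print(f"{bits} fractional bits -> SNR {snr_db(x, quantize_fixed(x, bits)):.1f} dB")
```

For a uniform rounding quantizer the noise power is q²/12 with step q = 2^(-frac_bits), so each extra fractional bit improves the SNR by roughly 6 dB, which the printed values reflect.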
In this thesis, we present such a quantization analysis on a double-precision floating-point FFT model and optimize the fixed-point and floating-point quantization of the reference FFT model so that both yield the same SNR. For this purpose, fixed-point and floating-point quantization models are generated and placed in the reference FFT model, and experiments are performed with randomly generated complex stimulus. The results show that the floating-point quantized FFT model generally achieves a better SNR than the fixed-point quantized FFT model, but with a small number of exponent bits and a large number of fractional bits the floating-point and fixed-point results are almost the same.
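The kind of comparison the abstract describes can be sketched in Python. This is a simplified stand-in, not the thesis's reference FFT model: it quantizes only the final double-precision FFT output (rather than every internal stage), and the two quantizers and the 12-bit widths are illustrative assumptions. The fixed-point model rounds to a grid with a fixed number of fractional bits; the floating-point model rounds the mantissa to a chosen width while leaving the exponent unconstrained.

```python
import numpy as np

def quantize_fixed(x, frac_bits):
    """Fixed-point model: uniform grid with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def quantize_float(x, mant_bits):
    """Reduced-precision float model: round the mantissa to `mant_bits` bits,
    keeping the sign and exponent (exponent range assumed unbounded)."""
    m, e = np.frexp(x)                       # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 2.0 ** mant_bits) / 2.0 ** mant_bits
    return np.ldexp(m, e)

def quantize_complex(z, quant, bits):
    """Apply a real-valued quantizer to real and imaginary parts separately."""
    return quant(z.real, bits) + 1j * quant(z.imag, bits)

def snr_db(ref, test):
    err = ref - test
    return 10.0 * np.log10(np.sum(np.abs(ref) ** 2) / np.sum(np.abs(err) ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 1024) + 1j * rng.uniform(-1, 1, 1024)  # random complex stimulus
ref = np.fft.fft(x)                          # double-precision reference FFT
fx = quantize_complex(ref, quantize_fixed, 12)
fl = quantize_complex(ref, quantize_float, 12)
print(f"fixed-point, 12 fractional bits: {snr_db(ref, fx):.1f} dB")
print(f"float model, 12 mantissa bits  : {snr_db(ref, fl):.1f} dB")
```

Which model wins here depends on the chosen bit widths and on the dynamic range of the FFT output, which is the trade-off the thesis explores stage by stage in the reference model.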