Quantization-aware pruning for a CNN-based radio receiver model
Henri Kasurinen (2021)
Henri, Kasurinen
2021
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of approval: 2021-07-27
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202106155910
Abstract
Machine Learning (ML) has become a vital part of our world as Convolutional Neural Networks (CNNs) have enabled super-human performance in a multitude of tasks. The downside is that the computational complexity and memory footprint of CNNs have grown along with their performance to billions of operations, making them impractical to deploy on resource-restricted hardware. Solutions to this problem can be found in CNN compression methods such as pruning and quantization. CNN quantization replaces parts of the original network with approximations, trading off accuracy for a smaller computational cost. CNN pruning, on the other hand, removes parts of the network to reduce its computational complexity.
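As a concrete illustration of these two techniques (this is not code from the thesis), the sketch below applies magnitude pruning and uniform 8-bit quantization to a random weight matrix with NumPy; the layer shape, the 90% sparsity level, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for one layer's weights

# Pruning: remove (zero out) the smallest-magnitude weights, here 90% of them.
threshold = np.quantile(np.abs(weights), 0.90)
mask = np.abs(weights) >= threshold
pruned = weights * mask

# Quantization: approximate the surviving weights on a uniform symmetric 8-bit grid.
scale = np.abs(pruned).max() / 127.0
quantized = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale  # the approximation used at inference
```

The pruned network stores only the surviving weights (plus the sparsity pattern), and the quantized network stores 8-bit integers plus one scale per tensor instead of 32-bit floats, which is where the memory and compute savings come from.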
This thesis studies the compression of a CNN-based radio receiver called Deep Learning Receiver (DeepRx), which faces the aforementioned issues of high computational complexity and a resource-restricted operating environment. We reviewed the literature on CNN compression and decided to focus on the combination of pruning and quantization. Based on this review, we developed a quantization-aware pruning algorithm called Quantization-Aware Multi-Stage Pruning (QAMP). Our method achieved a 97.25% pruning ratio on the DeepRx receiver with only a small drop in accuracy. It must be noted that the DeepRx model was initially over-parametrized, which explains the large percentage-wise reduction in size. Our results are nevertheless encouraging and motivate further development.
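The abstract does not spell out QAMP's procedure, so the following is only a hypothetical sketch of the general idea behind quantization-aware, multi-stage magnitude pruning: sparsity is raised in stages, and each stage's pruning decision is made on simulated-quantized weights so that pruning sees the values the deployed model would actually use. Every name, parameter, and step here is an illustrative assumption, not the thesis's implementation (in particular, the fine-tuning between stages is omitted).

```python
import numpy as np

def fake_quant(w, bits=8):
    # Simulated (fake) quantization: round-trip through an integer grid so
    # that pruning decisions are based on the quantized approximation.
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()), 1e-8) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def multi_stage_prune(w, target_sparsity=0.9725, stages=5):
    # Raise sparsity in equal steps; a real pipeline would fine-tune the
    # surviving weights between stages to recover accuracy (omitted here).
    mask = np.ones_like(w, dtype=bool)
    for stage in range(1, stages + 1):
        sparsity = target_sparsity * stage / stages
        wq = fake_quant(w * mask)
        threshold = np.quantile(np.abs(wq), sparsity)
        mask = np.abs(wq) > threshold  # keep only large quantized magnitudes
    return w * mask, mask

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)
pruned, mask = multi_stage_prune(weights)
```

The 97.25% target mirrors the pruning ratio reported in the abstract, but the staging schedule and the magnitude criterion are generic choices made for the sake of the sketch.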