Model compression methods for convolutional neural networks
Lunnikivi, Henri (2019)
Degree Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of acceptance
2019-11-26
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-201911226216
Abstract
Deep learning has been found to be an effective solution to many problems in the
field of computer vision, and convolutional neural networks have been a particularly
successful model in this domain. Such networks extract feature maps
from an image, then use the feature maps to determine to which of the preset
categories the image belongs. Convolutional neural networks can be trained on a
powerful machine, and then deployed onto a target device for inference. Computing
inference has become feasible on mobile phones and IoT edge devices. However, these
devices come with constraints such as reduced processing resources, smaller memory
caches, and decreased memory bandwidth. To make inference practical on these
devices, the effectiveness of various model compression methods is evaluated
quantitatively by applying them to a simple convolutional
neural network for optical vehicle classification. Convolutional layers are separated
into component vectors to reduce inference time on CPU, GPU, and an embedded
target; a sketch of this separation follows the abstract. Fully connected layers are
pruned and retuned in combination with regularization and dropout, and the pruned
layers are compressed using a sparse matrix format, also sketched below. All
optimizations are tested on three platforms with varying capabilities.
Separation of convolutional layers improves the latency of the whole model by 3.00x
on a CPU platform. Using a sparse format on a pruned model with a large fully
connected layer improves the latency of the whole model by 2.01x on a desktop with
a GPU and by 1.82x on the embedded platform. On average, pruning allows a 39.1x
reduction in total model size at the cost of a 1.67 percentage-point reduction in
accuracy, when dropout is used to control overfitting. This allows a vehicle
classifier to fit in 180 kB of memory with a reasonable reduction in accuracy.
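
The separation of convolutional layers into component vectors can be illustrated
with a short sketch. The abstract does not show code, so the snippet below is a
minimal illustration assuming PyTorch; the class name SeparableConv2d, the channel
counts, and the kernel width are hypothetical and not taken from the thesis.

import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    # Approximates a k x k convolution with a k x 1 convolution followed by
    # a 1 x k convolution, cutting per-position multiply-adds from O(k^2)
    # to O(2k). The width of the intermediate output acts as the rank of
    # the approximation.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.vertical = nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2))

    def forward(self, x):
        return self.horizontal(self.vertical(x))

# Usage: a batch of one 32 x 32 RGB image keeps its spatial size.
x = torch.randn(1, 3, 32, 32)
layer = SeparableConv2d(3, 16, k=3)
print(layer(x).shape)  # torch.Size([1, 16, 32, 32])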
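The pruning and sparse-storage step can be sketched in the same hedged spirit.
The threshold, the matrix shape, and the use of SciPy's CSR format below are
illustrative assumptions; the thesis' actual pruning schedule and sparse format
may differ.

import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 1024)).astype(np.float32)  # dense FC weights

# Magnitude pruning: zero every weight whose absolute value falls below a
# threshold (illustrative value; tuned per layer in practice).
threshold = 1.0
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Compressed sparse row (CSR) storage keeps only the nonzero values plus
# their column indices and row pointers, which is what shrinks the model.
W_sparse = csr_matrix(W_pruned)

# Inference uses a sparse matrix-vector product in place of the dense one.
x = rng.standard_normal(1024).astype(np.float32)
y = W_sparse @ x

print(f"density after pruning: {W_sparse.nnz / W.size:.1%}")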