Utilizing deep learning on embedded devices
Oranen, Leevi (2021)
Master's Programme in Bioengineering
Faculty of Medicine and Health Technology
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval: 2021-09-02
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tuni-202108176575
Abstract
The aim of this thesis was to review the tools needed to develop deep learning applications for embedded systems and how this can be done in practice. The most important and widely used frameworks for building and training machine learning models (TensorFlow, Keras, PyTorch, Caffe and MATLAB) were reviewed. Another aim was to study and compare the development experience and performance of four different single-board computers.
The machine learning tools studied were comparable in model training performance, but each has its own strengths and weaknesses. Keras is easy to use and beginner-friendly, but its customizability is limited. PyTorch, on the other hand, is highly customizable but requires a deeper understanding of machine learning. TensorFlow works well with TensorFlow Lite, which enables model optimization for mobile and embedded devices.
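One of the main optimizations TensorFlow Lite offers for embedded targets is integer quantization, which stores values as 8-bit integers related to floats by the affine mapping real_value = scale * (q - zero_point). The sketch below illustrates only that mapping in plain Python; it is not TensorFlow Lite's code path, and the helper names and example weights are hypothetical.

```python
# Minimal sketch (an illustration, not TensorFlow Lite's implementation) of
# affine 8-bit quantization: real_value ~= scale * (q - zero_point).

def quantization_params(values, qmin=-128, qmax=127):
    """Choose (scale, zero_point) so the float range of `values` maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 so zero is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:  # all values are zero
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped 8-bit integers."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map 8-bit integers back to approximate floats."""
    return [scale * (q - zero_point) for q in qvalues]

weights = [-0.8, -0.1, 0.0, 0.25, 1.2]  # hypothetical layer weights
scale, zero_point = quantization_params(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q)
print([round(r, 3) for r in restored])
```

Each weight now occupies one byte instead of four, and for values inside the calibrated range the round-trip error is at most half of `scale`.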
When choosing a tool, device compatibility plays a key role. If the same program has to be rewritten separately for each device, development becomes very expensive for a company. Compatibility can be improved by favoring devices that support the most common standards, such as those of the Khronos Group.
The tested devices were the Google Coral, NVIDIA Jetson Nano, NXP S32V234 and Raspberry Pi 4. The test application used in this thesis was emotion detection, which consists of two stages: first, a face is located in the image, and the face is then cropped and fed into the emotion detection model. The devices were compared using three metrics: the time taken to detect the face, the time taken to identify the emotion, and the number of frames processed per second (FPS).
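The two-stage pipeline and the three metrics can be sketched as follows. This is a minimal illustration, not the code used in the thesis: `detect_faces` and `classify_emotion` are hypothetical callables standing in for a real face detector and emotion model, frames are assumed to be plain lists of pixel rows, and emotion time is measured per frame across all faces found in it.

```python
import time

def crop(frame, box):
    """Crop a frame stored as a list of pixel rows; box is (x, y, width, height)."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in frame[y:y + h]]

def run_pipeline(frames, detect_faces, classify_emotion):
    """Run face detection + emotion classification and collect the three metrics."""
    detect_times, emotion_times, labels = [], [], []
    n_frames = 0
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        boxes = detect_faces(frame)            # stage 1: locate faces
        t1 = time.perf_counter()
        for box in boxes:                      # stage 2: classify each cropped face
            labels.append(classify_emotion(crop(frame, box)))
        t2 = time.perf_counter()
        detect_times.append(t1 - t0)
        emotion_times.append(t2 - t1)
        n_frames += 1
    elapsed = time.perf_counter() - start
    return {
        "mean_detect_s": sum(detect_times) / max(n_frames, 1),
        "mean_emotion_s": sum(emotion_times) / max(n_frames, 1),
        "fps": n_frames / elapsed if elapsed > 0 else float("inf"),
        "labels": labels,
    }

# Usage with stub stages (hypothetical): a blank 8x6 frame and one fixed "face".
frame = [[0] * 8 for _ in range(6)]
stats = run_pipeline(
    [frame] * 10,
    detect_faces=lambda f: [(2, 1, 3, 4)],
    classify_emotion=lambda face: "neutral",
)
print(f"detect {stats['mean_detect_s'] * 1e3:.3f} ms, "
      f"emotion {stats['mean_emotion_s'] * 1e3:.3f} ms, {stats['fps']:.0f} FPS")
```

Timing the stages separately shows which one dominates on a given board, while FPS over the whole loop also captures overhead outside the two stages.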
The test had two variables: the input source and the face detection algorithm. The input sources were a live camera stream and a pre-recorded video. Face detection was performed both with the Haar cascade object detection (HCOD) algorithm and with the most optimal method available for each device; for Google Coral, for example, this meant using the optimized face detection model from the Google Coral website. The results with optimal face detection were not directly comparable, because the implementation differed so much from device to device.
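In Viola-Jones style Haar cascade detection, each stage of the cascade thresholds Haar-like features, which are differences between the pixel sums of adjacent rectangles. An integral image turns every rectangle sum into a constant-time lookup, which is what makes evaluating many features per detection window affordable. The sketch below shows only this feature-evaluation core on a grayscale image stored as a list of lists; the trained cascade itself (for example a pretrained frontal-face model such as the ones shipped with OpenCV) is not reproduced, and the helper names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1], padded with a leading zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_vertical_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half (h should be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)

# Toy 4x4 image: bright upper half, dark lower half (an edge-like pattern).
img = [[10] * 4, [10] * 4, [0] * 4, [0] * 4]
ii = integral_image(img)
print(two_rect_vertical_feature(ii, 0, 0, 4, 4))
```

A strong response like this one indicates a horizontal edge; a cascade combines thresholds on many such features and rejects most non-face windows after only the first few stages.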
Overall, the NXP S32V234 had the best performance. However, development on that device was challenging, and as a result not all tests could be performed on it. With optimal face detection and a camera stream, performance ranged from 25.4 FPS on the S32V234 to 10.9 FPS on the Jetson Nano. In the test with HCOD face detection, the Google Coral, NVIDIA Jetson Nano and Raspberry Pi 4 performed roughly on par.
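To relate the reported frame rates to the per-stage timings, throughput can be converted into an average time per frame. A small sketch, assuming frames are processed one after another so that the time per frame is simply the inverse of FPS:

```python
def frame_time_ms(fps: float) -> float:
    """Average wall-clock time per frame (ms) implied by a throughput in FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

# Camera-stream results with optimal face detection, taken from the results above.
reported_fps = {"NXP S32V234": 25.4, "NVIDIA Jetson Nano": 10.9}

for device, fps in reported_fps.items():
    print(f"{device}: {fps} FPS -> {frame_time_ms(fps):.1f} ms per frame")

speedup = reported_fps["NXP S32V234"] / reported_fps["NVIDIA Jetson Nano"]
print(f"S32V234 vs Jetson Nano: {speedup:.2f}x")
```

At 25.4 FPS each frame takes about 39 ms on average, against roughly 92 ms at 10.9 FPS, so the S32V234 processed frames about 2.3 times faster than the Jetson Nano in this test.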
As these tests show, embedded devices have become powerful enough to perform computationally heavy deep learning tasks. This opens up new opportunities in many research areas to make human lives healthier, happier and safer.