Neural Networks Training Acceleration Through Weight Prediction
Nguyen, Quoc Minh (2024)
Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-01-30
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202401101351
Abstract
Researchers have successfully improved the inference speed of deep learning models through various algorithmic or hardware acceleration methods. However, the process of training these models remains computationally expensive and time-consuming, posing significant challenges in developing and testing new models.
This thesis proposes an approach to accelerating neural network training through weight prediction. The proposed method involves training a specialized neural network, which we refer to as the checkpoint predictor. In our approach, we first introduce a checkpoint dataset and use it to explore weight prediction for simple Convolutional Neural Networks (CNNs) on the CIFAR-10 dataset. The checkpoint predictor is trained on a time series of the evolution of all convolution kernels in a neural network. This training enables the checkpoint predictor to predict the convolution kernels that optimize the targeted network's performance. Additionally, we review the literature on topics relevant to neural network training acceleration, such as checkpoint datasets and parameter prediction in neural network training. These reviews provide a foundational understanding of the current state of research and of potential areas for further exploration.
To validate the effectiveness of the proposed method, we conduct extensive experiments involving various implementations of the checkpoint predictor. The experimental results demonstrate that our approach can accelerate neural network training compared to conventional training methods, represented by Stochastic Gradient Descent (SGD). Our approach can be seen as groundwork for improving deep learning model accessibility, as using a pretrained checkpoint predictor reduces the time required to train other neural networks.
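The abstract does not specify how the checkpoint predictor forecasts future weights; as an illustrative sketch only (the thesis trains a learned neural network as the predictor, whereas here a simple linear extrapolation stands in for it, and all function names are hypothetical), predicting a convolution kernel from a time series of its checkpoints could look like this:

```python
import numpy as np

def predict_next_kernel(checkpoints: list) -> np.ndarray:
    """Illustrative checkpoint predictor: linearly extrapolate each
    convolution-kernel weight from its trajectory across training
    checkpoints. The actual predictor in the thesis is a trained
    neural network; this linear fit only sketches the core idea of
    forecasting weights instead of computing them via SGD steps."""
    # Stack the kernel snapshots into a (T, *kernel_shape) time series.
    series = np.stack(checkpoints)
    num_steps = series.shape[0]
    t = np.arange(num_steps)
    # Flatten each snapshot so every kernel entry gets its own trajectory.
    flat = series.reshape(num_steps, -1)
    # Fit weight_i(t) ~ a_i * t + b_i for every kernel entry i at once.
    a, b = np.polyfit(t, flat, deg=1)
    # Extrapolate one step past the last observed checkpoint.
    pred = a * num_steps + b
    return pred.reshape(series.shape[1:])

# Toy usage: three snapshots of a 3x3 kernel drifting linearly in time.
kernels = [np.full((3, 3), 0.1 * step) for step in range(3)]
next_kernel = predict_next_kernel(kernels)  # each entry is close to 0.3
```

A learned predictor would replace the per-weight linear fit with a model trained on many such kernel trajectories drawn from a checkpoint dataset, letting it jump further ahead in training than naive extrapolation allows.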