Representation Learning for candlesticks time-series data: A contrastive learning approach
Pham Dinh, Khoa (2023)
Bachelor's Programme in Science and Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2023-05-09
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202304254457
Abstract
In time series analysis, an important problem is distinguishing between highly different samples. This problem is even more pressing in financial markets, where time series data typically exhibit stochastic characteristics and are not easily separated from a human perspective. Representation learning, a prominent area of machine learning research, may offer a solution to this challenge.
In this thesis, vector representations of time series samples are learned using contrastive learning, a subfield of representation learning. A Siamese network built on the BiN-DenseNet architecture is used for training. The data is divided into pairs of positive and negative samples and fed to the Siamese network. The goal is to map each time series sample into a new embedding space such that the cosine similarity between similar samples is high. The dataset used in this work is Bitcoin candlestick data. Since Bitcoin is a new asset class, there is limited research on vector representations of Bitcoin's price, so this is an opportunity to explore the properties of this asset's price.
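The following is a minimal sketch of this pairwise training setup, assuming a PyTorch-style implementation. The simple convolutional encoder, layer sizes, and margin-based cosine loss are illustrative placeholders standing in for the BiN-DenseNet and training code described in the thesis, not the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Placeholder encoder standing in for the BiN-DenseNet used in the thesis."""
    def __init__(self, n_features=4, embed_dim=64):
        super().__init__()
        # A simple 1D-convolutional stack over the candlestick channels (e.g. OHLC).
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        # x: (batch, n_features, seq_len) -> (batch, embed_dim), L2-normalised
        return F.normalize(self.net(x), dim=1)

def cosine_contrastive_loss(z1, z2, label, margin=0.5):
    """label = 1 for a positive pair, 0 for a negative pair.
    Positive pairs are pulled towards cosine similarity 1;
    negative pairs are pushed below the margin."""
    sim = F.cosine_similarity(z1, z2)
    pos = label * (1.0 - sim)
    neg = (1.0 - label) * F.relu(sim - margin)
    return (pos + neg).mean()

# One training step on a batch of (x1, x2, label) pairs (dummy data shown).
encoder = SiameseEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
x1 = torch.randn(8, 4, 32)                  # 8 candlestick windows, 4 channels, length 32
x2 = torch.randn(8, 4, 32)
label = torch.randint(0, 2, (8,)).float()   # 1 = positive pair, 0 = negative pair
loss = cosine_contrastive_loss(encoder(x1), encoder(x2), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because both inputs pass through the same (weight-shared) encoder, pulling positive pairs together and pushing negative pairs apart shapes the embedding space the same way for every sample.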
After obtaining the vector representations of the data, we can use them to improve downstream tasks such as regression or classification. To evaluate the usefulness of this approach, I designed proxy tasks to compare the performance of the pre-trained contrastive learning model against a baseline model without pre-training. The results show that pre-training slightly improves the model's performance and also eases the training process.
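Continuing the sketch above, one hypothetical way to set up such a proxy-task comparison is to attach a small classification head to a contrastively pre-trained encoder and to an identical, randomly initialised encoder. The class names, dimensions, and task are assumptions for illustration, not the thesis's actual evaluation code.

```python
import torch.nn as nn

# Builds on the SiameseEncoder sketch above: wrap an encoder (pre-trained or not)
# with a small classification head for a downstream proxy task.
class DownstreamClassifier(nn.Module):
    def __init__(self, encoder, embed_dim=64, n_classes=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

# Hypothetical comparison: same architecture, different initialisation.
pretrained_model = DownstreamClassifier(encoder)          # encoder trained contrastively above
baseline_model = DownstreamClassifier(SiameseEncoder())   # randomly initialised encoder
# Both are then trained on the labelled proxy task (e.g. price-direction
# classification) and their performance and training curves are compared.
```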
Collections
- Bachelor's theses [8918]