Mid-Price Movement Prediction in Limit Order Books Using Feature Engineering and Machine Learning
Ntakaris, Adamantios (2019)
Ntakaris, Adamantios
Tampere University
2019
Doctoral Programme in Engineering Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of defence
2019-10-25
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-1288-6
Abstract
The increasing complexity of financial trading in recent years has revealed the need for methods that can capture its underlying dynamics. An efficient way to organize this chaotic system is by constructing limit order book ordering mechanisms that operate under price and time filters. A limit order book can be analyzed using linear and nonlinear models.
The thesis develops novel methods for the identification of limit order book characteristics which provide traders and market makers an information edge in their trading. A good proxy for traders and market makers is the prediction of mid-price movement, which is the main target of this thesis. The contributions of this thesis are categorized chronologically into three parts. The first part is the introduction to the literature of the first publicly available limit order book dataset for high-frequency trading for the task of mid-price movement prediction. This dataset is accompanied by an experimental protocol that utilizes methods inspired by ridge regression and a single-layer feed-forward neural network as classifiers. These classifiers use state-of-the-art limit order book features as inputs for the target task.
The next contribution of this thesis is the use and development of a wide range of technical and quantitative indicators for the task of mid-price movement prediction via an extensive feature selection process. This feature selection process identifies which features improve predictive performance. The results suggest that the newly introduced quantitative feature, based on an adaptive logistic regression model for online learning, was selected first according to several criteria. These criteria are based on entropy, linear discriminant analysis, and least mean square error.
The third contribution is the introduction of econometric features as inputs to deep learning models for the task of mid-price movement prediction. An extensive comparison against other state-of-the-art hand-crafted features and fully automated feature extraction processes is provided. Furthermore, a new experimental protocol is developed for the task of mid-price prediction, to overcome the problem of time irregularities, which characterize high-frequency data. Results suggest that advanced hand-crafted features such as econometric indicators can predict movements of proxies, such as mid-price.
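For readers unfamiliar with the prediction target, the following is a minimal illustrative sketch, not the thesis's own protocol. It assumes the conventional definition of the mid-price as the average of the best bid and best ask, and labels each step as up (1), stationary (0), or down (-1) from the relative mid-price change. The function names, the `horizon` parameter, and the `threshold` parameter are hypothetical choices made for this example.

```python
from typing import List


def mid_price(best_bid: float, best_ask: float) -> float:
    """Conventional mid-price: average of best bid and best ask."""
    return (best_bid + best_ask) / 2.0


def label_movements(
    mids: List[float], horizon: int = 1, threshold: float = 0.0
) -> List[int]:
    """Label each time step by the mid-price change `horizon` steps ahead.

    Returns 1 (up), 0 (stationary), or -1 (down). `horizon` and
    `threshold` are illustrative assumptions, not values from the thesis.
    """
    labels = []
    for t in range(len(mids) - horizon):
        change = (mids[t + horizon] - mids[t]) / mids[t]
        if change > threshold:
            labels.append(1)
        elif change < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Hypothetical best bid/ask quotes at four consecutive order book events.
quotes = [(100.0, 101.0), (100.5, 101.5), (100.5, 101.5), (100.0, 101.0)]
mids = [mid_price(bid, ask) for bid, ask in quotes]
print(label_movements(mids))  # [1, 0, -1]
```

A classifier for this task would take limit order book features at each step as input and one of these three labels as the target.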
Collections
- Doctoral dissertations [4901]