Modeling Indirect Market Impact of Limit Order Book Messages with Machine Learning Methods
Turunen, Max (2024)
Master's Programme in Computer Science
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2024-09-02
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202408228260
Abstract
This study proposes and evaluates a new methodology for studying the indirect market impact of limit order book (LOB) messages using state-of-the-art machine learning models and early data fusion. The primary objective is to assess how well the proposed methodology measures this impact. The research utilises data from Nasdaq TotalView-ITCH 5.0, focusing on Apple, Facebook, Google, Intel, and Microsoft stocks.
The methodology involves integrating LOB message data into machine learning models to assess their effect on the accuracy of price predictions. The study examines the different types of LOB messages, such as order submissions, cancellations, and executions, to determine their individual and collective impact on market behavior.
A total of 2400 models were trained for the study. These models were trained in sets of ten, each set sharing identical hyperparameters and differing only in its ten distinct run seeds. Each set of ten models comprises a model group trained with messages and another group trained with the same hyperparameters but without LOB message data. To determine whether LOB messages resulted in improved accuracy, the metrics of the models in each group were averaged and the averages compared.
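The seed-averaged comparison described above can be sketched as follows. This is only an illustration, not the thesis code: the function name `compare_groups`, the dictionary layout, the metric values, and the rule that a group counts as improved only when both the averaged R^2 rises and the averaged MSE falls are all assumptions made for the example.

```python
from statistics import mean

def compare_groups(with_msgs, without_msgs):
    """Average per-seed metrics of two model groups and report whether the
    group trained with LOB messages scored better on both metrics.

    Assumption: "improved" means higher mean R^2 and lower mean MSE.
    """
    avg_with = {k: mean(run[k] for run in with_msgs) for k in ("r2", "mse")}
    avg_without = {k: mean(run[k] for run in without_msgs) for k in ("r2", "mse")}
    improved = (avg_with["r2"] > avg_without["r2"]
                and avg_with["mse"] < avg_without["mse"])
    return avg_with, avg_without, improved

# Hypothetical metric values, one dict per run seed (two seeds shown for brevity).
with_msgs = [{"r2": 0.12, "mse": 0.88}, {"r2": 0.10, "mse": 0.90}]
without_msgs = [{"r2": 0.09, "mse": 0.91}, {"r2": 0.11, "mse": 0.89}]

avg_with, avg_without, improved = compare_groups(with_msgs, without_msgs)
print(avg_with, avg_without, improved)
```

Averaging over seeds before comparing reduces the chance that a single lucky or unlucky initialisation decides the outcome for a given hyperparameter configuration.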
Among the averaged models, 47% exhibited improved R^2 and MSE scores with the inclusion of LOB messages, while the remaining 53% demonstrated poorer performance. Notably, where the inclusion of LOB messages improved accuracy, the magnitude of the improvement was greater than the magnitude of the deterioration in the cases where accuracy worsened.
The results demonstrate that incorporating LOB messages does not significantly improve the accuracy of price prediction. However, the models are capable of modelling the LOB messages in a manner that provides insights into the indirect market impact of various types of messages, although the modelled indirect market impact is neither robust nor consistent across all stocks and parameters. This may suggest either that the indirect market impact is specific to stocks and prediction horizons or that the proposed models are in fact modelling noise. The concavity of the majority of the indirect market impact functions validates that indirect market impact is being modelled, as concave market impact is a well-established concept in the field.
The findings illustrate how the selected machine learning models may capture the market impact of various message types as variations in message price and quantity are introduced, while other message variables remain constant. This research highlights the potential of machine learning in refining market impact analysis and proposes avenues for future studies in this domain.
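As a rough illustration of this kind of analysis, the sketch below varies one message field while holding the others fixed and then checks the resulting curve for concavity using second differences. Everything here is hypothetical: `toy_model` is a stand-in with a square-root response to quantity, not one of the trained models, and the message fields, values, and function names are assumptions for the example only.

```python
import math

def toy_model(message):
    # Hypothetical stand-in for a trained model: the predicted price change
    # grows with the square root of the message quantity.
    return 0.01 * math.sqrt(message["quantity"])

def impact_curve(predict, base_message, field, values):
    """Predicted response as one message field varies, all others held fixed."""
    curve = []
    for value in values:
        message = dict(base_message)  # copy so the base message stays unchanged
        message[field] = value
        curve.append(predict(message))
    return curve

def is_concave(curve, tol=1e-12):
    """True if every second difference on an evenly spaced grid is <= tol."""
    return all(
        curve[i - 1] - 2 * curve[i] + curve[i + 1] <= tol
        for i in range(1, len(curve) - 1)
    )

# Hypothetical base message; only the quantity is varied.
base = {"type": "submission", "price": 100.0, "quantity": 100}
quantities = [100, 200, 300, 400, 500]  # evenly spaced grid

curve = impact_curve(toy_model, base, "quantity", quantities)
print(curve, is_concave(curve))
```

The second-difference test assumes an evenly spaced grid; with a real model the curve would come from its predictions rather than from a closed-form function.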