Automatic Ice Hockey Shot Detection Using Neural Networks
Salonen, Herkko (2023)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-02-09
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2023121310815
Abstract
This thesis investigates the feasibility of detecting ice hockey shots with neural networks and indoor localization data. The networks assessed were convolutional neural networks, recurrent neural networks, and a composite model that used separate neural networks for event detection and event classification. All models were trained on the same dataset, collected from actual ice hockey matches. The models were evaluated with a custom test bench, and the results were analyzed using confusion matrices.
The input sequence for the networks comprised data from both before and after the actual shot. The optimal input sequence length for the event classification model was investigated by exploring 81 combinations of sequence lengths, varying the intervals before and after the shot. The results indicated that the data preceding the shot had less influence on model accuracy than the data following the shot.
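The sequence-length search can be pictured as a grid over the two window lengths. The sketch below is only an illustration: the abstract gives the total of 81 combinations but not how they were divided, so the 9 x 9 split, the candidate lengths, and the names `BEFORE_MS`, `AFTER_MS`, and `window_bounds` are all assumptions. The 20 ms sample period is the one stated for the test bench.

```python
from itertools import product

SAMPLE_PERIOD_MS = 20  # one sample per 20 ms timestep, as stated for the test bench

# Hypothetical candidate lengths: nine values per side gives 9 * 9 = 81 combinations.
# The actual lengths used in the thesis are not stated in the abstract.
BEFORE_MS = [100 * k for k in range(1, 10)]
AFTER_MS = [100 * k for k in range(1, 10)]


def window_bounds(shot_index, before_ms, after_ms):
    """Return the half-open slice [start, stop) of samples around a shot."""
    before = before_ms // SAMPLE_PERIOD_MS
    after = after_ms // SAMPLE_PERIOD_MS
    # The sample at shot_index itself is included in the window.
    return shot_index - before, shot_index + after + 1


# Every (before, after) pair is one candidate input sequence length.
combinations = list(product(BEFORE_MS, AFTER_MS))

if __name__ == "__main__":
    print(len(combinations))
    print(window_bounds(1000, 200, 400))
```

Each pair would then be used to cut training windows and train one event classification model, so the accuracies can be compared along the "before" and "after" axes separately.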
The feature evaluation covered three feature sets across four different networks. The feature sets were temporal positional data without derived features, features computed from a single sample, and features calculated from multiple samples. Temporal positional data yielded a recall below 0.5 and low precision for all model types. Multisample features consistently yielded higher recall than single-sample features for every model type. Among models using single-sample features, the CNN and GRU-based models achieved higher precision, although the best precision attained remained below 0.01.
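The difference between single-sample and multisample features can be illustrated with a small sketch. The abstract does not list the features used in the thesis, so the two below (distance to a fixed reference point, and speed between consecutive samples) are hypothetical examples chosen only to show the distinction, as are the function names.

```python
import math

SAMPLE_PERIOD_S = 0.02  # 20 ms between consecutive samples


def distance_to_point(position, reference):
    """Single-sample feature (hypothetical): needs only one (x, y) position."""
    return math.hypot(position[0] - reference[0], position[1] - reference[1])


def speeds(positions):
    """Multisample feature (hypothetical): speed from consecutive (x, y) samples."""
    result = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        # Finite difference: distance travelled divided by the sample period.
        result.append(math.hypot(x1 - x0, y1 - y0) / SAMPLE_PERIOD_S)
    return result


if __name__ == "__main__":
    track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
    print(distance_to_point(track[1], (0.0, 0.0)))
    print(speeds(track))
```

A single-sample feature can be computed from any one timestep in isolation, whereas a multisample feature carries information about motion between timesteps.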
Splitting shot detection into two distinct tasks was also explored. A model composed of two neural networks was trained to test this division: one network detected events, and the other classified them. The two-network model achieved a much lower false positive rate than the other models.
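A minimal sketch of how such a two-network model could be chained is given below. It assumes that each network maps an input window to a score between 0 and 1 and that a fixed threshold of 0.5 is applied at both stages; neither assumption comes from the abstract, and the stand-in scoring functions in the usage part are placeholders rather than trained networks.

```python
from typing import Callable, List, Sequence

Window = Sequence[float]


def two_stage_detect(
    windows: Sequence[Window],
    detect_event: Callable[[Window], float],
    classify_event: Callable[[Window], float],
    threshold: float = 0.5,  # assumed decision threshold for both stages
) -> List[int]:
    """Return indices of windows that pass both detection and classification."""
    shots = []
    for index, window in enumerate(windows):
        # Stage 1: discard windows where no event is detected at all.
        if detect_event(window) < threshold:
            continue
        # Stage 2: decide whether the detected event is a shot.
        if classify_event(window) >= threshold:
            shots.append(index)
    return shots


if __name__ == "__main__":
    # Placeholder scorers standing in for the two trained networks.
    detector = lambda w: max(w)
    classifier = lambda w: sum(w) / len(w)
    windows = [[0.1, 0.2], [0.9, 0.0], [0.9, 0.8]]
    print(two_stage_detect(windows, detector, classifier))
```

Because a window must pass both stages to be reported, the second network only ever sees candidates the first one let through, which is one plausible reason a cascade of this kind can lower the false positive rate.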
The models were evaluated using a test bench designed to simulate real-world usage. Every evaluated match was transformed into samples, one for every 20 ms timestep. To ensure consistency and mitigate implementation variations, every sample was evaluated by each of the models. Model outputs were compared to annotations, and the comparisons were tallied in a confusion matrix. Recall, precision, and false positive rate were then computed from these confusion matrices and served as the evaluation metrics of the study.
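The three metrics follow from the confusion matrix counts in the standard way: recall = TP / (TP + FN), precision = TP / (TP + FP), and false positive rate = FP / (FP + TN). The sketch below assumes one binary prediction and one binary annotation per 20 ms sample; how the thesis test bench actually matches outputs to annotations is not described in the abstract, and the names used here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int = 0  # shot predicted, shot annotated
    fp: int = 0  # shot predicted, no shot annotated
    tn: int = 0  # no shot predicted, no shot annotated
    fn: int = 0  # no shot predicted, shot annotated


def tally(predictions, annotations):
    """Compare per-sample predictions to annotations and count each outcome."""
    counts = ConfusionCounts()
    for predicted, actual in zip(predictions, annotations):
        if predicted and actual:
            counts.tp += 1
        elif predicted and not actual:
            counts.fp += 1
        elif not predicted and actual:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def _ratio(numerator, denominator):
    # Return 0.0 instead of dividing by zero when a class is absent.
    return numerator / denominator if denominator else 0.0


def metrics(counts):
    """Compute recall, precision, and false positive rate from the counts."""
    return {
        "recall": _ratio(counts.tp, counts.tp + counts.fn),
        "precision": _ratio(counts.tp, counts.tp + counts.fp),
        "false_positive_rate": _ratio(counts.fp, counts.fp + counts.tn),
    }


if __name__ == "__main__":
    counts = tally([1, 0, 1, 0, 0, 1], [1, 0, 0, 0, 1, 1])
    print(counts)
    print(metrics(counts))
```

With one sample every 20 ms, non-shot samples vastly outnumber shot samples, so even a small false positive rate can correspond to many false alarms and a low precision.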