Dimensionality Reduction in Multivariate Time Series Data
Sipilä, Anni (2024)
Master's Programme in Computing Sciences
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2024-06-13
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202406077043
Abstract
The abundance of sensor systems in various fields generates large volumes of time series data, characterized by high dimensionality, which not only escalates storage and computational costs but also magnifies the risk of overfitting in predictive modeling. Dimensionality reduction (DR) techniques serve as essential tools to alleviate these challenges by simplifying the data without sacrificing its intrinsic value. This thesis presents a comprehensive exploration of the application of various DR techniques on multivariate time series data sets. The research addresses two primary questions: First, how can dimensionality reduction be effectively employed to preserve the essential quality of data? Second, how do various DR methods, specifically those tailored for time series data, compare across different application areas?
This work evaluates eight DR techniques: Random Sampling, K-means clustering, PCA, PAA, DCT, Autoencoder, PLS, and LDA. The thesis examines these techniques across a range of dimensions to ascertain their impact on the preservation of critical data attributes and their influence on subsequent classification tasks. The evaluation involves a detailed empirical analysis, assessing the impact of each DR method on data quality and classification accuracy. This comparative study not only highlights the strengths and limitations of each technique but also identifies the best practices for applying DR in different contexts. By providing a nuanced understanding of how these methods perform across varied datasets, this research aims to guide the selection of appropriate DR strategies, optimizing both efficiency and effectiveness.
The thesis compares DR techniques applied to three time series datasets from different domains: MotionSense, EEG Brainwave, and MIT-BIH Arrhythmia. Each dataset presents unique challenges due to its high dimensionality, which makes all three well suited for assessing the effectiveness of DR methods in reducing data complexity while maintaining essential information quality.
Key findings demonstrate that while some methods, like DCT and PCA, offer robust performance across various dimensions, others, such as Autoencoders and Simple Random Sampling, show limitations, particularly when applied to data with intricate temporal dynamics. The study extends beyond theoretical application to include a practical examination of reduced subsets of data, further detailing the scalability and adaptability of each DR method under constrained scenarios. The insights derived from this research are intended to contribute significantly to the fields of data science and time series analysis, providing a solid foundation for future studies and advancements in this vital area of research.
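As a rough illustration of two of the time-series-specific techniques named in this abstract, the sketch below reduces a single channel with PAA (piecewise aggregate approximation) and with a truncated DCT (discrete cosine transform). It is not taken from the thesis: the function names `paa` and `dct_truncate`, the orthonormal DCT-II scaling, and the toy input are assumptions made for this example, and the thesis may configure both methods differently.

```python
import math


def paa(series, n_segments):
    """Piecewise Aggregate Approximation (illustrative sketch).

    Splits the series into n_segments contiguous frames of near-equal
    width and keeps the mean of each frame.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        frame = series[start:end]
        reduced.append(sum(frame) / len(frame))
    return reduced


def dct_truncate(series, n_coeffs):
    """Keep the first n_coeffs coefficients of an orthonormal DCT-II.

    Direct O(n * n_coeffs) evaluation, chosen here for readability
    rather than speed (assumption: a library FFT-based DCT would be
    used in practice).
    """
    n = len(series)
    if not 1 <= n_coeffs <= n:
        raise ValueError("n_coeffs must be between 1 and len(series)")
    coeffs = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (t + 0.5) * k / n)
            for t, x in enumerate(series)
        )
        coeffs.append(scale * total)
    return coeffs


# Hypothetical two-channel recording; each channel is reduced independently.
channels = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]
paa_reduced = [paa(ch, 4) for ch in channels]
dct_reduced = [dct_truncate(ch, 2) for ch in channels]
```

Both functions map a channel of length n to a shorter vector, so a multivariate series can be reduced channel by channel and the results concatenated before classification. Other schemes, such as reducing across channels jointly, are equally plausible readings of the abstract.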
