Outlier Detection in Weight Time Series of Connected Scales: A Comparative Study
Mehrang, Saeed (2016)
Master's Degree Programme in Information Technology
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Smart and connected health technologies, as part of digitally supported health and healthcare plans, can play an important role in improving preventive healthcare and patient outcomes, decreasing costs, and speeding up scientific discovery. Rigorous information processing approaches, such as outlier detection and data cleaning, are therefore needed to enhance the reliability of the acquired data. A "smart electronic weight scale" is a connected sensor that regularly measures and stores time series of body mass values. Long-term self-weighing time series, like any other time series data, may occasionally contain abnormal values called "outliers", which can distort or mislead data analysis. This thesis investigates the detection of outliers in time series of weight measurements from 10,000 anonymous Withings weight scale users. Four point-wise outlier detection approaches are studied and compared from different aspects: (1) a method based on Autoregressive Integrated Moving Average (ARIMA) time series modelling, (2) the moving Median Absolute Deviation (MAD) scale estimate, (3) the conventional Rosner statistic, and (4) the windowed Rosner statistic. The results suggest that the ARIMA approach, moving MAD, and the windowed Rosner statistic can properly find the outliers; however, when data were missing, the ARIMA approach was the only method able to identify the outliers reliably. In contrast, the conventional Rosner statistic did not show acceptable outlier detection power. The ARIMA approach was computationally expensive, whereas the other tested techniques were fast in terms of computation time.
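To illustrate the moving MAD idea named in the abstract, the following is a minimal sketch and not the thesis implementation. The function name `moving_mad_outliers`, the centred window of 7 samples, the threshold of 3 scaled MADs, and the example weights are all assumptions chosen for illustration; the abstract does not state the settings actually used. The factor 1.4826 is the usual constant that makes the MAD comparable to a standard deviation under normality.

```python
import numpy as np

def moving_mad_outliers(values, window=7, threshold=3.0):
    """Flag points that lie far from the median of a centred moving window.

    window and threshold are illustrative assumptions, not values from the thesis.
    """
    x = np.asarray(values, dtype=float)
    half = window // 2
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        # Centred window, truncated at the ends of the series.
        lo, hi = max(0, i - half), min(x.size, i + half + 1)
        neighbours = x[lo:hi]
        neighbours = neighbours[~np.isnan(neighbours)]  # ignore missing samples
        if np.isnan(x[i]) or neighbours.size < 3:
            continue  # too little information to judge this point
        med = np.median(neighbours)
        # Scaled MAD: robust estimate of the local spread.
        mad = 1.4826 * np.median(np.abs(neighbours - med))
        if mad == 0.0:
            continue  # degenerate window; leave the point unflagged in this sketch
        flags[i] = abs(x[i] - med) > threshold * mad
    return flags

# Hypothetical daily weights in kg with one implausible reading at index 4.
weights = [80.1, 80.3, 79.9, 80.2, 95.0, 80.0, 80.4, 79.8, 80.1]
flags = moving_mad_outliers(weights)
print(np.flatnonzero(flags))
```

Because both the median and the MAD are robust to a single extreme value, the implausible reading does not inflate the local spread estimate that is used to judge it, which is the main reason for preferring them over the mean and standard deviation in this setting.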