Data Mining on Deflectometric Data of Surface Defects
Kuosmanen, Pinja Marika (2017)
Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2017-09-06
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-201708241804
Abstract
The objective of this thesis is to apply machine learning and data mining methods, especially classification and clustering, to deflectometric data of surface defects found on car bodies. The data is acquired with reflectCONTROL, a robot-assisted automated surface inspection system manufactured by Micro-Epsilon GmbH. The measurement method of reflectCONTROL is based on Phase Measuring Deflectometry. The aim is to explore the possibility of automated defect classification with learning algorithms, and to gain new insight into the deflectometric data obtained from the surface inspection process through clustering.
This thesis is divided into a theoretical part and an empirical part. The theoretical part introduces the basic concepts of machine learning and the methods used in the empirical evaluation. The methods include two classification algorithms: Random Forest Classifier and Support Vector Machines. The three clustering algorithms used are k-means, Affinity Propagation and HDBSCAN. Furthermore, dimensionality reduction methods, such as Principal Component Analysis and t-SNE, are included. In addition, the possibility of using Point Feature Histograms in the context of deflectometric data and feature generation is explored. The biggest challenge in this research is that the data set is highly unbalanced: the largest class dominates the others in the learning tasks.
The empirical study indicates that Random Forest Classifier and Support Vector Machines perform very similarly in classification tasks. Furthermore, it is possible to distinguish between different classes using multiclass classification on balanced data. Silhouette analysis and dimensionality reduction also show that internal structure exists in the data, but that it does not correspond to the human-assigned class labels of the defects. Finally, the study indicates that Point Feature Histogram-based features do not significantly improve classification performance, but are helpful in clustering tasks because they improve the correspondence between the internal structure and the human-assigned labels. The results of this study are promising for further research: they suggest that it is possible to conduct research on unlabeled data via clustering, and to distinguish between different defect classes using appropriately selected data.
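The silhouette analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy points, both label vectors, and the helper names (`silhouette_values`, `mean_silhouette`) are hypothetical, and Euclidean distance is assumed. For each point, `a` is the mean distance to the other members of its own group and `b` is the mean distance to the nearest other group; the silhouette value is (b - a) / max(a, b). A grouping that follows the internal structure of the data scores close to 1, while a labeling that cuts across that structure scores much lower.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_values(points, labels):
    """Return the silhouette value (b - a) / max(a, b) for every point."""
    # Group point indices by label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette analysis needs at least two groups")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a point alone in its group scores 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same group.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Average silhouette value over all points."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Labels that follow the internal structure of the toy data.
structure_labels = [0, 0, 0, 1, 1, 1]
# Labels that cut across the structure, standing in for class labels
# that do not match the clusters.
mismatched_labels = [0, 1, 0, 1, 0, 1]

structure_score = mean_silhouette(points, structure_labels)
mismatched_score = mean_silhouette(points, mismatched_labels)
print(f"structure-following labels: {structure_score:.2f}")
print(f"mismatched labels:          {mismatched_score:.2f}")
```

Comparing the two scores on real data would follow the same pattern: one score for the cluster assignment produced by an algorithm such as k-means, and one for the human-assigned defect classes used as group labels.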