Churn Prediction in SaaS using Machine Learning
Rautio, Anton Juhani Oskari (2019)
Information and Knowledge Management
Tekniikan ja luonnontieteiden tiedekunta - Faculty of Engineering and Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2019-05-23
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tty-201905211686
Abstract
Customer churn occurs in the Software-as-a-Service (SaaS) business much as it does in other subscription-based industries such as telecommunications. However, companies lack knowledge of the factors that lead customers to churn and are therefore unable to react in time. It is thus necessary for companies to study customer churn prediction so that they can respond to churn before it happens.
The study examines customer churn prediction quantitatively by applying several machine learning algorithms in Python, namely a recurrent neural network, a convolutional neural network, a support vector machine, and a random forest. Data was collected from the case company's database and preprocessed to suit the algorithms. The dataset includes customer business data such as spend, customer platform usage data, customer service history data, and customer feedback data on service quality. A grid search was carried out to find the optimal hyperparameters for each algorithm. The models were then trained and evaluated on the prepared data using these hyperparameters. Finally, the test data was run through the trained models to obtain the results of the analysis.
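The grid search and train/test procedure described above can be sketched as follows for the support vector machine. This is a minimal illustration assuming scikit-learn; the abstract does not name the library, the case company's data is not available, and so `make_classification` generates a hypothetical, imbalanced stand-in dataset. The hyperparameter grid and the choice of precision as the search metric are illustrative assumptions, not the thesis's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the case company's data: each row is a customer,
# the 8 columns play the role of spend, usage, service-history and feedback
# features, and label 1 means the customer churned (about 20% of rows).
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Hold out a stratified test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled
# using only its own training portion.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative search space (assumed, not taken from the thesis).
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", 0.1],
}

# 5-fold cross-validated grid search; the best combination is refit on all training data.
search = GridSearchCV(pipeline, param_grid, scoring="precision", cv=5)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out test set.
y_pred = search.predict(X_test)
test_precision = precision_score(y_test, y_pred)
print(search.best_params_)
print(f"test precision: {test_precision:.3f}")
```

The same `GridSearchCV` pattern would apply to the random forest by swapping the estimator and grid; the neural networks would need a separate framework and are not shown here.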
The results show that, in this case, the support vector machine is the most precise of the algorithms. The deep learning algorithms, the recurrent neural network and the convolutional neural network, did not perform well. The random forest's performance was moderate, coming close to that of the support vector machine. The random forest algorithm also provided the importance of each feature in the prediction, which showed that platform usage metrics, service quality metrics, and business metrics are the largest drivers of churn in this case.
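The per-feature importance view mentioned above can be illustrated with a random forest's impurity-based importances. This is a minimal sketch assuming scikit-learn's `RandomForestClassifier`; the abstract does not say which importance measure the thesis used. The feature names are hypothetical labels grouped like the abstract's data categories, and the data is synthetic, so the resulting ranking carries no meaning about real churn drivers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature names, grouped like the abstract's data categories.
feature_names = [
    "monthly_spend", "contract_value",            # business metrics
    "logins_per_week", "active_users",            # platform usage
    "support_tickets", "ticket_resolution_days",  # service history
    "csat_score", "nps_score",                    # service-quality feedback
]

# Synthetic stand-in data: the names above are attached arbitrarily to random columns.
X, y = make_classification(n_samples=500, n_features=len(feature_names),
                           n_informative=5, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based importances: mean decrease in impurity across all trees,
# normalised by scikit-learn so that they sum to 1.
importances = sorted(zip(feature_names, forest.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
for name, score in importances:
    print(f"{name:24s} {score:.3f}")
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")
```

Impurity-based importances can favour features with many distinct values; `sklearn.inspection.permutation_importance` on the held-out test set is a common cross-check.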