Housing unit price prediction system
Eronen, Juuso (2018)
Information Technology
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2018-11-07
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201810242467
Abstract
The objective of this thesis was to compare different regression methods for predicting the price of a housing unit, to design and develop price estimation software, and to determine which features are most important in determining the price of a housing unit. The software was developed for Alma Mediapartners Oy to be integrated with their housing marketplace Etuovi.com. Amazon’s SageMaker cloud machine learning platform was used to develop and deploy the software.
The data consisted of Etuovi.com’s housing unit advertisements posted by real estate agencies and individual customers. One year of data was used both to compare the machine learning algorithms and to develop the software. The features used for training the models included, for example, location, size of the housing unit, age of the building and house type. The tested algorithms were linear regression, regression tree, random forest, gradient boosting and extreme gradient boosting, of which extreme gradient boosting performed best. With the final model, over half of the test samples had an error of less than one percent and 80% of the samples had an error of less than ten percent. Less than four percent of the test samples had an error of 25% or more.
The software was developed on Amazon SageMaker following SageMaker’s developer guide. The software fetches the housing unit dataset from Etuovi.com’s data warehouse and trains a model on it on a virtual machine using the extreme gradient boosting algorithm. The trained model is then hosted in the cloud and can be integrated with Etuovi.com as an independent component.
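The error shares quoted in the abstract (samples below one percent, below ten percent, and at 25% or more) can be illustrated with a minimal sketch. It assumes that "error" means the absolute relative error between predicted and actual price; the abstract does not state the exact definition. The function names and the sample prices are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute relative error |predicted - actual| / actual per sample.

    Assumption: this is the error measure meant in the abstract.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(p - a) / a for a, p in zip(actual, predicted)]


def share_below(errors: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose error is strictly below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


def share_at_least(errors: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose error is at or above the threshold."""
    return sum(e >= threshold for e in errors) / len(errors)


# Hypothetical prices in euros, made up purely for illustration.
actual = [100_000.0, 200_000.0, 150_000.0, 300_000.0]
predicted = [100_500.0, 190_000.0, 200_000.0, 301_000.0]

errors = relative_errors(actual, predicted)
summary = {
    "below_1_percent": share_below(errors, 0.01),
    "below_10_percent": share_below(errors, 0.10),
    "at_least_25_percent": share_at_least(errors, 0.25),
}
print(summary)
```

Reporting the shares at several thresholds, rather than a single average, shows both how many predictions are nearly exact and how heavy the tail of large misses is.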