Product should-cost estimations based on warehouse data
Tan, Zhihao (2024)
Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-04-09
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202404083352
Abstract
Product should-cost estimation is essential for manufacturing industries because it directly affects organizational profitability and strategic sourcing. In practice, the should-cost of a product is calculated with various cost analysis methods, and the result may differ from the price actually agreed between suppliers and purchasers. This difference can lead to overpaying for the product and losing potential profit during the sourcing process. This thesis therefore develops a costing model that accurately estimates the should-cost of a product and visually displays the difference between the estimated cost and the agreement price. As an outcome of the study, the model is intended to provide insight into the procurement process and to support better purchasing decisions.
The concept of should-cost arose as a methodological approach for comparing the quotation obtained from a supplier with the price a product should actually cost. Existing research on should-cost analysis is relatively limited, and few current studies offer clear suggestions on how should-cost estimation should be implemented. By comparison, commercial software solutions are more common in the market. The purpose of this study is therefore to implement a general machine learning pipeline for exploring the should-cost estimation problem. The experiments and analyses were applied to a selected product category to ascertain the current cost, and multiple regression models were used to find the linear relationship in the known, product-related historical data in order to estimate the cost of similar new products in the future.
This study uses the following machine learning models: multivariate linear regression, K-nearest neighbors regression, support vector regression, lasso regression, ridge regression, decision tree regression, and random forest regression. Since no single machine learning model can analyze cost across all product categories, using several models makes it possible to select the most suitable approach. The estimations were made based on the characteristics of the particular product category. The original historical data was obtained from several data warehouses of the case company. The data was fed into the created pipelines to make estimations, and the model results are discussed. The models were evaluated using mean squared error, mean absolute error, and the coefficient of determination, together with cross-validation.
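The three error metrics named above can be illustrated with a minimal sketch. It is not taken from the thesis: the function names and the toy price values are hypothetical, chosen only to show how each metric compares estimated costs against agreement prices.

```python
from statistics import mean


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical toy data (not from the case company): agreed prices and model estimates.
agreement_prices = [10.0, 12.0, 14.0, 16.0]
estimated_costs = [11.0, 12.0, 13.0, 18.0]

mse = mean_squared_error(agreement_prices, estimated_costs)   # 1.5
mae = mean_absolute_error(agreement_prices, estimated_costs)  # 1.0
r2 = r_squared(agreement_prices, estimated_costs)             # approximately 0.7

print(f"MSE={mse:.2f}  MAE={mae:.2f}  R^2={r2:.2f}")
```

Lower MSE and MAE and an R^2 closer to 1 indicate estimates closer to the observed prices. With cross-validation, the same metrics would be computed on each held-out fold and then averaged.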
The results of this study indicated that the lasso, ridge, and random forest regression models performed best after validation, while the remaining models performed relatively worse. The study aimed to contribute from three perspectives. First, the existing research and relevant literature were reviewed to establish a theoretical understanding of models and methods for product cost estimation. Second, a comprehensive pipeline for data collection, preprocessing, and model training was implemented to meet the needs of the study, and statistical interpretations of the data and results were presented to support the analyses. Third, practical implementations were carried out to contribute to the limited research on this topic, for example by extending the should-cost domain and by helping sourcing assess whether the prices quoted by suppliers are reasonable.