Applicant ranking in loan comparison : Conversion prediction and rejected applicant segmentation
Moloney, Kian (2026)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2026-05-20
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202605125442
Abstract
Online loan comparison and brokerage (OLCB) platforms are digital intermediaries that aggregate loan offers from multiple banks and financing institutions into a single application interface for borrowers. These platforms earn commission only when a loan is disbursed, making conversion the central operational metric. Conversion rates are low: roughly two-thirds of applicants receive no offers, and only around one in four of the remainder end up with a disbursement. This thesis develops a machine learning-based ranking system for a Finnish OLCB platform that classifies every applicant by conversion potential or financial profile. The research method is constructive: the ranking system is designed, implemented, and evaluated against real applicant data.
The system partitions the full applicant population into three branches, each with a distinct modeling approach. For applicants who receive at least one offer, a Random Forest (RF) model predicts conversion probability and assigns them to one of five tiers ranging from near-certain conversion to near-zero. For applicants who receive no offers, a K-Means clustering model segments them into three financially distinct profiles based on debt structure and loan demand. Applicants with payment default records are classified as F-class through a deterministic rule. Together, the three branches produce a nine-tier classification (five conversion tiers, three no-offer segments, and the F-class) that covers every applicant.
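The three-branch routing described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the tier labels, the probability cut-offs in `TIER_CUTOFFS`, the `Applicant` fields, and the assumption that the payment default rule is checked first are all hypothetical, since the abstract states none of them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs on predicted conversion probability, highest first.
# The abstract does not give the actual tier boundaries or tier names.
TIER_CUTOFFS = [(0.80, "A"), (0.60, "B"), (0.40, "C"), (0.20, "D"), (0.00, "E")]


@dataclass
class Applicant:
    has_payment_default: bool
    n_offers: int
    # Output of the conversion model; only meaningful when n_offers > 0.
    conversion_probability: Optional[float] = None
    # Output of the clustering model (0, 1, or 2); only meaningful when n_offers == 0.
    cluster_id: Optional[int] = None


def assign_tier(applicant: Applicant) -> str:
    """Route an applicant to one of nine labels: A-E, S1-S3, or F."""
    # Branch 1: deterministic rule (assumed here to take precedence).
    if applicant.has_payment_default:
        return "F"

    # Branch 2: applicants with at least one offer are tiered by probability.
    if applicant.n_offers > 0:
        p = applicant.conversion_probability
        if p is None or not 0.0 <= p <= 1.0:
            raise ValueError("a conversion probability in [0, 1] is required")
        for cutoff, label in TIER_CUTOFFS:
            if p >= cutoff:
                return label

    # Branch 3: applicants without offers are labelled by cluster.
    if applicant.cluster_id not in (0, 1, 2):
        raise ValueError("a cluster id of 0, 1, or 2 is required")
    return f"S{applicant.cluster_id + 1}"
```

In this sketch every applicant receives exactly one label, which mirrors the claim that the classification covers the whole population.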
The conversion model achieves a Receiver Operating Characteristic Area Under the Curve (ROC-AUC) of 0.807 and a Precision-Recall Area Under the Curve (PR-AUC) of 0.675 on the held-out test set, with a cross-validated average of 0.806 ± 0.004 across five folds. The top decile lift of 2.90x versus an Annual Percentage Rate (APR) gap baseline of 2.09x demonstrates that the model captures signal beyond the single most predictive feature. SHapley Additive eXplanations (SHAP) analysis identifies offer quality features as the dominant predictors of conversion; applicant demographics carry minimal predictive weight. The clustering model produces three interpretable segments with a silhouette score of 0.52, sufficient to support segment-specific re-engagement strategies.
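For readers unfamiliar with the lift metric, the following is a minimal sketch of how a top decile lift is commonly computed: the conversion rate among the highest-scored 10% of applicants divided by the overall conversion rate. The function name and the toy data are illustrative assumptions, and the thesis may handle ties or decile boundaries differently.

```python
from typing import Sequence


def top_decile_lift(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Conversion rate in the top 10% by score, divided by the overall rate.

    `outcomes` holds 1 for a converted applicant and 0 otherwise.
    """
    if len(scores) != len(outcomes) or len(scores) == 0:
        raise ValueError("scores and outcomes must be non-empty and equal in length")

    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("lift is undefined when no applicant converted")

    # Rank applicants from highest to lowest score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)

    # Size of the top decile (at least one applicant).
    k = max(1, len(ranked) // 10)
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k

    return top_rate / base_rate


# Toy data: 20 applicants, 5 converters, all among the highest scores.
toy_scores = [i / 20 for i in range(20)]
toy_outcomes = [0] * 15 + [1] * 5
print(top_decile_lift(toy_scores, toy_outcomes))  # 4.0
```

Under this definition, the same function applied to the model score and to a single-feature score (such as an APR gap) gives two lifts that can be compared directly, as the abstract does with 2.90x and 2.09x.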
The SHAP results indicate that conversion behaviour in the OLCB context is more supply-driven than demand-driven: offer quality is a stronger determinant of outcome than applicant demographics. A low-tier applicant is thus not necessarily a low-quality customer but one who received an uncompetitive offer. The ranking system uses this structure to enable targeted follow-up on high-probability converters and acquisition channel quality measurement through tier distribution rather than application volume.
