New hybrid AI tool classifies credit risk with 80% accuracy using real utility data
In a significant advance for financial inclusion and algorithmic transparency, researchers have introduced a hybrid machine learning model that evaluates credit risk using real-world bill payment habits instead of traditional credit history.
The study, titled “A Hybrid Approach to Credit Risk Assessment Using Bill Payment Habits Data and Explainable Artificial Intelligence” and published in Applied Sciences (Vol. 15, Issue 10, 2025), combines real, granular behavioral data with explainable artificial intelligence (XAI) to classify consumers into ten distinct credit risk tiers. This approach could be transformative for millions of credit-invisible individuals, especially in emerging economies, while also highlighting critical gaps in data quality and model accountability.
Can alternative data replace traditional credit history?
The study utilized a proprietary dataset from a Turkish payment institution, covering the period between January 2021 and January 2022. It analyzed 42,117 anonymized individual records that captured detailed bill payment behaviors, including payment frequency, late and early payment days, types of bills paid, and amounts transacted. These behavioral features were linked to real credit scores obtained from Turkey’s Credit Registry Office.
Unlike traditional two-class credit scoring models, this research adopted a more granular multi-class format with 10 distinct credit score brackets. Feature selection techniques (ANOVA F-test, Chi-square, and mutual information) were deployed to identify the most predictive behaviors. Key features that consistently influenced credit outcomes included the percentage of bills paid in cash, late payment rate, number of invoice types, average days paid early or late, and customer age.
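To make the feature selection step concrete, the following is a minimal from-scratch sketch of the one-way ANOVA F-statistic, which compares how much a feature varies between credit classes with how much it varies within them. The function name, the feature names, and the toy values are invented for illustration and are not taken from the study's dataset or code.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-statistic of one numeric feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class variation: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-class variation: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data (invented): three credit classes, one informative and one noisy feature.
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
late_rate = [0.9, 0.8, 0.85, 0.5, 0.45, 0.55, 0.1, 0.15, 0.05]
noise = [0.3, 0.7, 0.5, 0.6, 0.4, 0.5, 0.5, 0.3, 0.7]

scores = {"late_rate": anova_f(late_rate, labels), "noise": anova_f(noise, labels)}
ranked = sorted(scores, key=scores.get, reverse=True)  # most discriminative first
```

A feature whose class means differ sharply relative to its within-class spread gets a high F value and is kept; a feature that looks the same in every class scores near zero.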
The standout finding was the robust correlation between payment discipline and credit score. Individuals who paid earlier and in full were more likely to score higher, while a consistent pattern of late payments, especially with a low cash payment ratio, was strongly predictive of poor credit performance. Statistical analysis revealed that late payment rate alone explained up to 28% of the variance in credit scores.
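For readers who want to gauge what "28% of the variance" means, here is a small sketch under one assumption: that the figure is an R-squared from a single-predictor linear fit, in which case it equals the squared Pearson correlation. The article does not state how the figure was computed, and the sample values below are invented.

```python
from math import sqrt

def r_squared(x, y):
    """Share of variance in y explained by a one-variable linear fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Invented sample: late payment rate versus a credit score.
late_rate = [0.1, 0.2, 0.4, 0.6, 0.9]
score = [1500, 1200, 1400, 900, 1000]
explained = r_squared(late_rate, score)

# Under the single-predictor assumption, R^2 = 0.28 implies |r| = sqrt(0.28).
implied_r = sqrt(0.28)
```

Under that assumption, a 28% share corresponds to a correlation of roughly 0.53 in absolute value: substantial for a single behavioral feature, but far from a complete account of the score.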
What modeling techniques delivered the best results?
The study rigorously tested seven machine learning algorithms: Logistic Regression, Decision Trees, Support Vector Machines, Random Forest, Extra Trees Classifier, Naive Bayes, and Multi-Layer Perceptron. These models were trained and validated using 5-fold cross-validation. Initially, tree-based models outperformed others, but the highest overall performance was achieved by a hybrid model that combined the following elements:
- Feature selection via ANOVA F-test
- Class balancing using the SMOTE oversampling technique
- Classification using the Extra Trees (EXT) algorithm
This configuration produced an accuracy of 80.49%, a precision of 79.89%, and an area under the ROC curve (AUC) of 97.04%. Notably, the model’s results were consistent and stable across cross-validation folds, which suggests they generalize beyond a single split of the data. The Extra Trees Classifier, not widely adopted in previous credit risk studies, emerged as particularly effective due to its ability to reduce variance and its computational efficiency.
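The evaluation protocol behind these figures can be sketched in a few lines. The code below shows a plain 5-fold split and two of the reported metrics for a multi-class problem. It is an illustration only: the article does not say how precision was averaged across the ten classes, so macro averaging is an assumption here, and the interleaved split stands in for whatever shuffling or stratification the authors used.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision(y_true, y_pred):
    """Average per-class precision, weighting every class equally (an assumption)."""
    per_class = []
    for c in sorted(set(y_true)):
        # True labels of the samples that were predicted as class c.
        predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        per_class.append(predicted_c.count(c) / len(predicted_c) if predicted_c else 0.0)
    return sum(per_class) / len(per_class)

# Invented labels for three classes.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
acc = accuracy(y_true, y_pred)
prec = macro_precision(y_true, y_pred)
```

In a cross-validated run, the model is trained on each `train` index set and scored on the matching `test` set; similar scores across all five folds are what the authors describe as stability.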
Once optimized, the model was rendered explainable using two XAI tools: LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). These tools allowed researchers to trace and visualize how specific features influenced individual predictions, thereby making the so-called “black box” more transparent to regulators and users alike.
How transparent and inclusive is the new risk model?
While the technical results are promising, the study underscores several challenges that must be addressed before such models can be deployed at scale.
Data Quality and Preprocessing: The dataset underwent substantial cleaning to remove outliers, duplicates, and inconsistent relational entries. Features like total payment amount and number of invoices exhibited extreme skewness and kurtosis, necessitating normalization. Even after these steps, the model’s reliance on numerical regularity highlights a vulnerability to missing or erroneous inputs, particularly in resource-constrained environments where reliable data may be scarce.
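The effect of skewness on a payment feature can be illustrated with a short sketch. The article does not specify which normalization the authors applied, so the log transform followed by min-max scaling shown here is an assumption chosen for illustration, and the payment amounts are invented.

```python
from math import log1p

def skewness(values):
    """Population skewness: the third standardized moment."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return sum((v - m) ** 3 for v in values) / n / var ** 1.5

def min_max(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Invented total payment amounts with a long right tail.
amounts = [10, 12, 15, 20, 30, 50, 90, 200, 600, 3000]

logged = [log1p(a) for a in amounts]  # compress the tail (assumed transform)
scaled = min_max(logged)              # bring the feature onto a common scale

raw_skew = skewness(amounts)
log_skew = skewness(logged)
```

On data like this, a handful of very large payers dominates the raw feature; after the transform, the bulk of customers is spread across the range instead of being squeezed near zero.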
Imbalanced Classification Problem: The original dataset was heavily imbalanced, with higher credit scores overrepresented. The authors addressed this by applying SMOTE (Synthetic Minority Over-sampling Technique), a resampling method that synthetically augments underrepresented classes. While this improved accuracy, it also introduced dependencies on artificial data structures that may not capture real-world complexity.
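The core idea of SMOTE, and the reason for the caveat about artificial data, can be seen in a simplified sketch. The function below is a hypothetical illustration rather than the authors' implementation or any library's: it creates each synthetic record by picking a minority-class sample and interpolating toward one of its nearest minority-class neighbors. The sample points are invented.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Invented minority-class records: (late payment rate, cash payment ratio).
minority = [(0.8, 0.1), (0.9, 0.2), (0.85, 0.15), (0.7, 0.05)]
synthetic = smote(minority, n_new=6)
```

Every synthetic record lies on a straight line between two real ones, so the augmented class can never contain behavior that falls outside the region already observed. That is the sense in which oversampling may not capture real-world complexity.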
Explainability and Compliance: One of the model’s defining strengths lies in its compliance readiness. The use of SHAP and LIME enables financial institutions to provide reasoned justifications for credit decisions, an increasingly important regulatory requirement under the EU’s General Data Protection Regulation (GDPR) and the U.S. Equal Credit Opportunity Act (ECOA). The model provides localized explanations for each prediction, identifying which features, such as overdue rates or cash payment ratios, contributed most significantly to an applicant’s score.
For instance, SHAP outputs revealed that late payment behavior had the strongest negative influence on credit scores in classes 1 through 8, while early payment behavior and high cash payment ratios positively influenced scores in the top-tier brackets. LIME outputs further confirmed these associations by approximating the decision boundary around individual predictions for each credit class.
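The attributions SHAP reports are estimates of Shapley values, which can be computed exactly when there are only a few features. The sketch below does so by brute force for a hypothetical two-feature scoring function. The function, its coefficients, and the applicant values are all invented for illustration and have no connection to the study's model.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions: average marginal contribution over all feature orders."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)  # start from the reference applicant
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature at a time
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}

def toy_score(x):
    """Hypothetical scoring function, invented for illustration only."""
    score = 600 - 300 * x["late_rate"] + 50 * x["cash_ratio"]
    if x["late_rate"] > 0.5 and x["cash_ratio"] < 0.3:
        score -= 40  # interaction: frequent late payments with a low cash ratio
    return score

baseline = {"late_rate": 0.2, "cash_ratio": 0.5}
applicant = {"late_rate": 0.7, "cash_ratio": 0.1}
phi = shapley_values(toy_score, applicant, baseline)
```

The attributions always add up to the gap between the applicant's score and the baseline score, which is what lets a lender state how much each behavior raised or lowered a specific decision. Brute force is only feasible for a handful of features, since the number of orderings grows factorially.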
Limitations and Deployment Risks: Despite its strengths, the model’s success hinges on high-quality, up-to-date behavioral data, which is not always available. Additionally, computational requirements for feature extraction, SMOTE sampling, and model training are non-trivial, potentially limiting implementation in under-resourced environments.
Moreover, while the model is explainable in structure, its fairness across demographic groups remains an open question. Future studies should explore whether features like bill type, payment method, or even gender, which emerged as a relevant predictor, introduce unintended biases.
First published in: Devdiscourse