Credit Risk Prediction Using Decision Trees (3MTT Assignment Task 2)

This project focuses on developing a predictive model for classifying credit risk using a dataset of 100,000 customer records and 24 features. The goal is to predict whether a customer has a Poor or Standard credit score using Decision Tree classifiers.


Step 1: Import and Explore Dataset

The dataset includes features such as Age, Occupation, Annual Income, Monthly Balance, Credit Mix, and Payment Behaviour. Exploration confirmed there were no missing values.

The dataset is fairly balanced: 53% Standard and 47% Poor. A balanced dataset is crucial as it ensures that the model does not favor one class over the other and produces reliable predictions for both “Poor” and “Standard” credit scores.
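The exploration step can be sketched as follows. Since the original dataset is not bundled here, the snippet uses a tiny synthetic stand-in (the column names and the commented-out file name are assumptions):

```python
import pandas as pd

# Hypothetical load -- the actual file name/path is an assumption:
# df = pd.read_csv("credit_score_data.csv")

# Tiny synthetic stand-in so the sketch runs end to end.
df = pd.DataFrame({
    "Age": [23, 35, 41, 29],
    "Annual_Income": [19000.0, 54000.0, 32000.0, 61000.0],
    "Credit_Score": ["Poor", "Standard", "Poor", "Standard"],
})

print(df.shape)                 # (rows, columns)
print(df.isnull().sum().sum())  # total missing values across all columns
print(df["Credit_Score"].value_counts(normalize=True))  # class balance
```

On the real data, the same calls would report 100,000 rows, zero missing values, and the roughly 53/47 class split described above.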


Step 2: Transform Categorical Features

Categorical attributes such as Month, Occupation, Credit Mix, Payment of Minimum Amount, and Payment Behaviour were numerically encoded for the Decision Tree.

Feature      Original Value   Encoded Value
Credit_Mix   Good             1
Credit_Mix   Standard         2
Credit_Mix   Poor             3
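One way to apply the mapping in the table is with a plain dictionary and `Series.map` (the column name `Credit_Mix` follows the table; the exact encoding method used in the assignment is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"Credit_Mix": ["Good", "Standard", "Poor", "Good"]})

# Manual mapping matching the encoding table above.
credit_mix_map = {"Good": 1, "Standard": 2, "Poor": 3}
df["Credit_Mix"] = df["Credit_Mix"].map(credit_mix_map)

print(df["Credit_Mix"].tolist())  # [1, 2, 3, 1]
```

An explicit dictionary (rather than `LabelEncoder`) keeps the Good < Standard < Poor ordering under the author's control, which matters when the encoding carries ordinal meaning.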

Step 3: Original Decision Tree Model

The Decision Tree classifier was trained on 22 numeric features. Performance on the test set:

  • Accuracy: 0.77
  • F1-score: 0.75–0.78 (per class)
  • ROC AUC: 0.7686

Confusion Matrix:

True \ Predicted   Poor   Standard
Poor               7053   2349
Standard           2258   8340
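Training and evaluating the baseline tree might look like this sketch. Synthetic data stands in for the real 22 numeric features, so the printed metrics will not match the numbers above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Synthetic stand-in for the 22 encoded numeric features.
X, y = make_classification(n_samples=2000, n_features=22, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))
print("ROC AUC :", roc_auc_score(y_test, y_proba))
print(confusion_matrix(y_test, y_pred))
```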

Step 4: Hyperparameter Tuning

Parameters tuned:

  • max_depth: [3, 5, 7]
  • min_samples_leaf: [5, 10]

Best parameters found: max_depth = 7, min_samples_leaf = 10. Capping the depth and requiring a minimum number of samples per leaf lets the tree capture real patterns while limiting overfitting.
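The search over those parameters can be reproduced with `GridSearchCV` (the exact search procedure and scoring metric used in the assignment are assumptions; synthetic data stands in for the real features):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training set.
X, y = make_classification(n_samples=1000, n_features=22, random_state=42)

# The grid matches the values listed above.
param_grid = {"max_depth": [3, 5, 7], "min_samples_leaf": [5, 10]}

grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid, cv=5, scoring="roc_auc")
grid.fit(X, y)

print(grid.best_params_)
```

`GridSearchCV` refits the best estimator on the full training data, so `grid.best_estimator_` can be evaluated directly on the held-out test set.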


Step 5: Tuned Decision Tree Model

Performance on test set after tuning:

  • Accuracy: 0.74
  • F1-score: 0.74 (both classes)
  • ROC AUC: 0.8109

Confusion Matrix:

True \ Predicted   Poor   Standard
Poor               7370   2032
Standard           3111   7487

Step 6: Comparison of Original vs Tuned Models

Metric     Original    Tuned    Class-Tuned Example
Accuracy   0.77        0.74     0.75
F1-score   0.75–0.78   0.74     0.75
ROC AUC    0.7686      0.8109   0.8226

Observations:

  • Hyperparameter tuning improved ROC AUC, indicating better class discrimination.
  • Precision and recall trade-offs shifted: the tuned model catches more Poor cases (7370 vs. 7053) at the cost of more misclassified Standard cases.
  • Maintaining a balanced dataset helped the model perform reliably across both Poor and Standard classes.
  • Overall, the tuned model generalizes better while controlling overfitting.
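The "Class-Tuned Example" column is not detailed in the write-up; one common way to shift the class trade-off on top of the tuned parameters is `class_weight` (this is an assumed technique, not necessarily the one used, and the synthetic data means the printed AUC will differ from the table):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in with roughly the 47/53 split described earlier.
X, y = make_classification(n_samples=2000, n_features=22,
                           weights=[0.47, 0.53], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Best parameters from Step 4, plus balanced class weights.
clf = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10,
                             class_weight="balanced", random_state=42)
clf.fit(X_train, y_train)

auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print("ROC AUC:", auc)
```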

Step 7: Conclusion

Hyperparameter tuning improved the Decision Tree's ability to distinguish between Poor and Standard credit scores, despite a minor drop in overall accuracy. A balanced dataset, careful feature encoding, and confusion matrix analysis together help make the predictive model dependable for real-world credit risk assessment.
