Credit Risk Prediction Using Decision Trees (3MTT Assignment Task 2)
This project focuses on developing a predictive model for classifying credit risk using a dataset of 100,000 customer records and 24 features. The goal is to predict whether a customer has a Poor or Standard credit score using Decision Tree classifiers.
Step 1: Import and Explore Dataset
The dataset includes features such as Age, Occupation, Annual Income, Monthly Balance, Credit Mix, and Payment Behaviour. Initial exploration found no missing values.
The dataset is fairly balanced: 53% Standard and 47% Poor. This balance matters because it keeps the model from favouring one class over the other, so predictions are reliable for both “Poor” and “Standard” credit scores.
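A minimal sketch of this exploration step, using a tiny synthetic frame in place of the real 100,000-row dataset (the column names here are assumptions based on the feature list above):

```python
import pandas as pd

# Tiny synthetic stand-in for the real dataset; in the actual project
# the frame would come from pd.read_csv on the credit data file.
df = pd.DataFrame({
    "Age": [23, 35, 41, 29],
    "Annual_Income": [19000.0, 54000.0, 33000.0, 27000.0],
    "Credit_Mix": ["Good", "Standard", "Poor", "Standard"],
    "Credit_Score": ["Standard", "Standard", "Poor", "Poor"],
})

print(df.isnull().sum().sum())                           # total missing values
print(df["Credit_Score"].value_counts(normalize=True))   # class balance
```

On the real data, the same two checks confirm zero missing values and the 53/47 class split reported above.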
Step 2: Transform Categorical Features
Categorical attributes such as Month, Occupation, Credit Mix, Payment of Minimum Amount, and Payment Behaviour were numerically encoded for the Decision Tree.
| Feature | Original Value | Encoded Value |
|---|---|---|
| Credit_Mix | Good | 1 |
| Credit_Mix | Standard | 2 |
| Credit_Mix | Poor | 3 |
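The encoding in the table can be applied with a simple dictionary map; this sketch mirrors the Credit_Mix values shown above, and the other categorical columns would get analogous maps:

```python
import pandas as pd

# Mapping taken directly from the encoding table above.
credit_mix_map = {"Good": 1, "Standard": 2, "Poor": 3}

df = pd.DataFrame({"Credit_Mix": ["Good", "Poor", "Standard", "Good"]})
df["Credit_Mix"] = df["Credit_Mix"].map(credit_mix_map)
print(df["Credit_Mix"].tolist())  # [1, 3, 2, 1]
```

Explicit maps like this keep the encoding reproducible and easy to invert, which matters when interpreting tree splits later.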
Step 3: Original Decision Tree Model
The Decision Tree classifier was trained on 22 numeric features. Performance on the test set:
- Accuracy: 0.77
- F1-score: 0.75–0.78
- ROC AUC: 0.7686
Confusion Matrix:
| True \ Predicted | Poor | Standard |
|---|---|---|
| Poor | 7053 | 2349 |
| Standard | 2258 | 8340 |
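The training and evaluation pipeline might look like the following sketch. It uses synthetic stand-in data with 22 features rather than the actual records, so the printed scores will not match the figures above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Synthetic stand-in for the 22 encoded numeric features.
X, y = make_classification(n_samples=2000, n_features=22, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline tree with default settings, as in Step 3.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
print(confusion_matrix(y_test, y_pred))
```

Stratifying the split preserves the class balance discussed in Step 1 inside both the training and test sets.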
Step 4: Hyperparameter Tuning
Parameters tuned:
- max_depth: [3, 5, 7]
- min_samples_leaf: [5, 10]
Best parameters found: max_depth = 7, min_samples_leaf = 10. These settings let the tree capture meaningful patterns while limiting overfitting.
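The search over this grid can be sketched with scikit-learn's GridSearchCV (synthetic stand-in data again; the ROC AUC scoring choice is an assumption, picked because that metric drives the comparison later):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=1000, n_features=22, random_state=0)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# The grid listed in Step 4.
param_grid = {"max_depth": [3, 5, 7], "min_samples_leaf": [5, 10]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # assumption: tune toward class discrimination
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)
```

Cross-validation inside the search means the reported best parameters are chosen on held-out folds, not on the test set used in Step 5.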
Step 5: Tuned Decision Tree Model
Performance on test set after tuning:
- Accuracy: 0.74
- F1-score: 0.74 (both classes)
- ROC AUC: 0.8109
Confusion Matrix:
| True \ Predicted | Poor | Standard |
|---|---|---|
| Poor | 7370 | 2032 |
| Standard | 3111 | 7487 |
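As a sanity check, the reported accuracy and per-class F1 of 0.74 follow directly from this confusion matrix:

```python
# Counts copied from the tuned model's confusion matrix above.
tp_poor, fn_poor = 7370, 2032   # true Poor row
fp_poor, tn_poor = 3111, 7487   # true Standard row

total = tp_poor + fn_poor + fp_poor + tn_poor
accuracy = (tp_poor + tn_poor) / total

precision_poor = tp_poor / (tp_poor + fp_poor)
recall_poor = tp_poor / (tp_poor + fn_poor)
f1_poor = 2 * precision_poor * recall_poor / (precision_poor + recall_poor)

print(round(accuracy, 2))  # 0.74
print(round(f1_poor, 2))   # 0.74
```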
Step 6: Comparison of Original vs Tuned Models
| Metric | Original | Tuned | Class-Tuned Example |
|---|---|---|---|
| Accuracy | 0.77 | 0.74 | 0.75 |
| F1-score | 0.75–0.78 | 0.74 | 0.75 |
| ROC AUC | 0.7686 | 0.8109 | 0.8226 |
Observations:
- Hyperparameter tuning improved ROC AUC, indicating better class discrimination.
- Precision and recall trade-offs shifted: the tuned model recovers more Poor cases (7370 vs 7053) at the cost of Standard recall (7487 vs 8340).
- Maintaining a balanced dataset helped the model perform reliably across both Poor and Standard classes.
- Overall, the tuned model generalizes better while controlling overfitting.
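Computing per-class recall from the two confusion matrices above makes the recall trade-off concrete:

```python
# (correct predictions, class total) taken from the two confusion matrices.
orig = {"Poor": (7053, 9402), "Standard": (8340, 10598)}
tuned = {"Poor": (7370, 9402), "Standard": (7487, 10598)}

recall_orig = {cls: hit / n for cls, (hit, n) in orig.items()}
recall_tuned = {cls: hit / n for cls, (hit, n) in tuned.items()}

for cls in ("Poor", "Standard"):
    print(f"{cls}: {recall_orig[cls]:.3f} -> {recall_tuned[cls]:.3f}")
# Poor: 0.750 -> 0.784
# Standard: 0.787 -> 0.706
```

Poor-class recall rises while Standard-class recall falls, which is often the preferred trade-off in credit risk, where missing a risky customer is costlier than flagging a safe one.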
Step 7: Conclusion
Hyperparameter tuning improved the Decision Tree's ability to distinguish between Poor and Standard credit scores, despite a minor drop in overall accuracy. A balanced dataset, careful feature encoding, and confusion-matrix analysis together make the model more dependable for real-world credit risk assessment.