Credit Risk Prediction Using Decision Trees (3MTT Assignment Task 2)

This project focuses on developing a predictive model for classifying credit risk using a dataset of 100,000 customer records and 24 features. The goal is to predict whether a customer has a Poor or Standard credit score using Decision Tree classifiers.


Step 1: Import and Explore Dataset

The dataset includes features such as Age, Occupation, Annual Income, Monthly Balance, Credit Mix, and Payment Behaviour. Exploration confirmed there were no missing values.

The dataset is fairly balanced: 53% Standard and 47% Poor. A balanced dataset is crucial as it ensures that the model does not favor one class over the other and produces reliable predictions for both “Poor” and “Standard” credit scores.
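The exploration step can be sketched as follows. Since the original dataset is not bundled here, the snippet uses a tiny synthetic stand-in (the column names and the commented-out file name are assumptions):

```python
import pandas as pd

# Hypothetical load -- the actual file name/path is an assumption:
# df = pd.read_csv("credit_score_data.csv")

# Tiny synthetic stand-in so the sketch runs end to end.
df = pd.DataFrame({
    "Age": [23, 35, 41, 29],
    "Annual_Income": [19000.0, 54000.0, 32000.0, 61000.0],
    "Credit_Score": ["Poor", "Standard", "Poor", "Standard"],
})

print(df.shape)                 # (rows, columns)
print(df.isnull().sum().sum())  # total missing values across all columns
print(df["Credit_Score"].value_counts(normalize=True))  # class balance
```

On the real data, the same calls would report 100,000 rows, zero missing values, and the roughly 53/47 class split described above.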


Step 2: Transform Categorical Features

Categorical attributes such as Month, Occupation, Credit Mix, Payment of Minimum Amount, and Payment Behaviour were numerically encoded for the Decision Tree.

Feature      Original Value   Encoded Value
Credit_Mix   Good             1
Credit_Mix   Standard         2
Credit_Mix   Poor             3
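One way to apply the mapping in the table is with a plain dictionary and `Series.map` (the column name `Credit_Mix` follows the table; the exact encoding method used in the assignment is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"Credit_Mix": ["Good", "Standard", "Poor", "Good"]})

# Manual mapping matching the encoding table above.
credit_mix_map = {"Good": 1, "Standard": 2, "Poor": 3}
df["Credit_Mix"] = df["Credit_Mix"].map(credit_mix_map)

print(df["Credit_Mix"].tolist())  # [1, 2, 3, 1]
```

An explicit dictionary (rather than `LabelEncoder`) keeps the Good < Standard < Poor ordering under the author's control, which matters when the encoding carries ordinal meaning.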

Step 3: Original Decision Tree Model

The Decision Tree classifier was trained on 22 numeric features. Performance on the test set:

  • Accuracy: 0.77
  • F1-score: 0.75–0.78 (per class)
  • ROC AUC: 0.7686

Confusion Matrix:

True \ Predicted   Poor   Standard
Poor               7053   2349
Standard           2258   8340
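Training and evaluating the baseline tree might look like this sketch. Synthetic data stands in for the real 22 numeric features, so the printed metrics will not match the numbers above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Synthetic stand-in for the 22 encoded numeric features.
X, y = make_classification(n_samples=2000, n_features=22, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))
print("ROC AUC :", roc_auc_score(y_test, y_proba))
print(confusion_matrix(y_test, y_pred))
```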

Step 4: Hyperparameter Tuning

Parameters tuned:

  • max_depth: [3, 5, 7]
  • min_samples_leaf: [5, 10]

Best parameters found: max_depth = 7, min_samples_leaf = 10. Capping the depth and requiring a minimum number of samples per leaf lets the tree capture real patterns while limiting overfitting.
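The search over those parameters can be reproduced with `GridSearchCV` (the exact search procedure and scoring metric used in the assignment are assumptions; synthetic data stands in for the real features):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training set.
X, y = make_classification(n_samples=1000, n_features=22, random_state=42)

# The grid matches the values listed above.
param_grid = {"max_depth": [3, 5, 7], "min_samples_leaf": [5, 10]}

grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid, cv=5, scoring="roc_auc")
grid.fit(X, y)

print(grid.best_params_)
```

`GridSearchCV` refits the best estimator on the full training data, so `grid.best_estimator_` can be evaluated directly on the held-out test set.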


Step 5: Tuned Decision Tree Model

Performance on test set after tuning:

  • Accuracy: 0.74
  • F1-score: 0.74 (both classes)
  • ROC AUC: 0.8109

Confusion Matrix:

True \ Predicted   Poor   Standard
Poor               7370   2032
Standard           3111   7487

Step 6: Comparison of Original vs Tuned Models

Metric     Original    Tuned    Class-Tuned Example
Accuracy   0.77        0.74     0.75
F1-score   0.75–0.78   0.74     0.75
ROC AUC    0.7686      0.8109   0.8226

Observations:

  • Hyperparameter tuning improved ROC AUC, indicating better class discrimination.
  • Precision and recall trade-offs shifted: the tuned model catches more Poor cases (7370 vs. 7053) at the cost of more misclassified Standard cases.
  • Maintaining a balanced dataset helped the model perform reliably across both Poor and Standard classes.
  • Overall, the tuned model generalizes better while controlling overfitting.
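The "Class-Tuned Example" column is not detailed in the write-up; one common way to shift the class trade-off on top of the tuned parameters is `class_weight` (this is an assumed technique, not necessarily the one used, and the synthetic data means the printed AUC will differ from the table):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in with roughly the 47/53 split described earlier.
X, y = make_classification(n_samples=2000, n_features=22,
                           weights=[0.47, 0.53], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Best parameters from Step 4, plus balanced class weights.
clf = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10,
                             class_weight="balanced", random_state=42)
clf.fit(X_train, y_train)

auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print("ROC AUC:", auc)
```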

Step 7: Conclusion

Hyperparameter tuning improved the Decision Tree's ability to distinguish between Poor and Standard credit scores, despite a minor drop in overall accuracy. A balanced dataset, careful feature encoding, and confusion matrix analysis together help make the predictive model dependable for real-world credit risk assessment.
