Home Credit – Credit Risk Model Stability

Category:

Finance & Risk Modelling

Skills:

LightGBM, scikit-learn, XGBoost

Problem Context

The challenge was to predict loan defaults for over 1M applicants while ensuring the model remained accurate and stable across shifting demographics and time periods. Financial institutions require not just performance, but also reliability under regulatory scrutiny.

Data Collection

I gathered the Kaggle Home Credit dataset, a large-scale benchmark for credit risk modeling.

  • ~1M applicants, 300+ features

  • Data combined from demographic, financial, and credit bureau sources

Data Preparation

Engineered features to extract signal and ensured the dataset was clean enough for modeling.

  • Imputed ~40% missing values with median/mode + missingness flags

  • Created ratio features (Credit/Income, Annuity/Income), which added +0.03 AUC

  • Applied k-means clustering (k=5) for behavioral segments

  • Reduced 300+ features to ~150 with recursive feature elimination
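
The imputation and ratio steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes pandas, and the column names (AMT_INCOME_TOTAL, AMT_CREDIT, AMT_ANNUITY, NAME_CONTRACT_TYPE) and the helper name prepare_features are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, add missingness flags, and build ratio features."""
    out = df.copy()
    for col in df.columns:
        if not df[col].isna().any():
            continue
        # Record where the value was missing before filling it in.
        out[f"{col}_missing"] = df[col].isna().astype(int)
        if pd.api.types.is_numeric_dtype(df[col]):
            out[col] = df[col].fillna(df[col].median())
        else:
            out[col] = df[col].fillna(df[col].mode().iloc[0])
    # Ratio features; a zero income becomes NaN instead of dividing by zero.
    income = out["AMT_INCOME_TOTAL"].replace(0, np.nan)
    out["CREDIT_INCOME_RATIO"] = out["AMT_CREDIT"] / income
    out["ANNUITY_INCOME_RATIO"] = out["AMT_ANNUITY"] / income
    return out


# Toy data with hypothetical column names and values.
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100000.0, 50000.0, np.nan, 80000.0],
    "AMT_CREDIT": [200000.0, 150000.0, 90000.0, np.nan],
    "AMT_ANNUITY": [10000.0, np.nan, 4500.0, 8000.0],
    "NAME_CONTRACT_TYPE": ["Cash loans", None, "Cash loans", "Revolving loans"],
})
prepared = prepare_features(toy)
```

In a real workflow the medians and modes would be computed on the training split only and reused on validation data, so that no information leaks across splits.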

Baseline

A logistic regression served as a benchmark for interpretability.

  • AUC ~0.64 → too low for production, but useful for comparison
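
A baseline of this kind might look like the sketch below. It assumes scikit-learn and uses synthetic, imbalanced data in place of the real applicant table, so the AUC it produces is unrelated to the ~0.64 reported above; the hyperparameters are illustrative choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the applicant data: about 8% positives (defaults).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    n_clusters_per_class=1, weights=[0.92, 0.08], random_state=0,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling helps the solver converge; class_weight offsets the class imbalance.
baseline = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
baseline.fit(X_train, y_train)

# AUC is computed on predicted default probabilities, not hard labels.
baseline_auc = roc_auc_score(y_valid, baseline.predict_proba(X_valid)[:, 1])
```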

Modeling

Trained and tuned gradient boosting models to improve predictive power.

  • XGBoost: AUC 0.729 after hyperparameter tuning

  • LightGBM: AUC 0.742, faster training, better with sparse data

  • Explored transfer learning embeddings & reinforcement learning → minimal lift (<0.01 AUC)

Evaluation

Assessed not just accuracy but also long-term reliability of the models.

  • Metrics: AUC, Kolmogorov–Smirnov (KS), Population Stability Index (PSI)

  • Early models showed drift, especially across income groups
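
For reference, KS and PSI can be computed along the lines of the sketch below. It assumes NumPy only; the function names, the 10-quantile binning, and the small epsilon used to avoid log(0) are choices made for this example rather than details taken from the project.

```python
import numpy as np


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a comparison sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges come from quantiles of the reference sample.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1)))
    # Use interior edges only, so out-of-range values land in the end bins.
    inner = edges[1:-1]
    n = len(inner) + 1
    e_counts = np.bincount(np.searchsorted(inner, expected, side="right"), minlength=n)
    a_counts = np.bincount(np.searchsorted(inner, actual, side="right"), minlength=n)
    # Clip empty bins to eps so the logarithm stays finite.
    e_frac = np.clip(e_counts / e_counts.sum(), eps, None)
    a_frac = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


def ks_statistic(y_true, y_score):
    """Maximum gap between the score CDFs of defaulters and non-defaulters."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true == 1])
    neg = np.sort(y_score[y_true == 0])
    thresholds = np.unique(y_score)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


# Synthetic scores: one sample from the same distribution, one shifted.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(1.0, 1.0, 5000)
psi_same = psi(reference, same)
psi_shifted = psi(reference, shifted)
```

Comparing score distributions between time periods or income groups with psi is one way to surface the kind of drift described above; a sample drawn from the same distribution should score near zero, while the shifted sample scores much higher.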

Refinement

Stabilized the final model to meet financial compliance standards.

  • Modified LightGBM loss function to penalize unstable splits

  • Reduced PSI from 0.32 → 0.14 (below the commonly cited 0.25 threshold for a significant population shift)

Conclusion

The final LightGBM model balanced accuracy and stability, achieving AUC 0.742, KS 0.42, and PSI 0.14. This ensured strong predictive performance while meeting the stability demands of real-world financial applications.
