Home Credit – Credit Risk Model Stability
Category:
Finance & Risk Modelling
Skills:
LightGBM, scikit-learn, XGBoost
Problem Context
The challenge was to predict loan defaults for over 1M applicants while ensuring the model remained accurate and stable across shifting demographics and time periods. Financial institutions require not just performance, but also reliability under regulatory scrutiny.
Data Collection
I used the Kaggle Home Credit dataset, a large-scale benchmark for credit risk modeling.
~1M applicants, 300+ features
Data combined from demographic, financial, and credit bureau sources
Data Preparation
Engineered features to extract signal and ensured the dataset was clean enough for modeling; illustrative sketches follow the list.
Imputed ~40% missing values with median/mode + missingness flags
Created ratio features (Credit/Income, Annuity/Income → +0.03 AUC)
Applied k-means clustering (k=5) for behavioral segments
Reduced 300+ features to ~150 with recursive feature elimination (RFE)
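A minimal sketch of the imputation and ratio-feature steps, assuming a pandas DataFrame `df`; the column names `AMT_CREDIT`, `AMT_INCOME_TOTAL`, and `AMT_ANNUITY` are illustrative stand-ins for the actual Home Credit fields:

```python
import numpy as np
import pandas as pd

def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Median/mode imputation with missingness flags, plus ratio features."""
    df = df.copy()

    for col in df.columns:
        if df[col].isna().any():
            # Preserve the missingness signal before filling.
            df[f"{col}_missing"] = df[col].isna().astype(int)
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Ratio features (illustrative column names).
    income = df["AMT_INCOME_TOTAL"].replace(0, np.nan)  # avoid divide-by-zero
    df["credit_income_ratio"] = df["AMT_CREDIT"] / income
    df["annuity_income_ratio"] = df["AMT_ANNUITY"] / income
    return df
```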
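And a sketch of the segmentation and feature-selection steps, where `X` is the prepared (numeric) feature matrix and `y` the default labels; the RFE ranking estimator is an assumption, since any model exposing `coef_` or `feature_importances_` would work:

```python
from sklearn.cluster import KMeans
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Behavioral segments: k-means (k=5) on standardized features.
X_scaled = StandardScaler().fit_transform(X)
X["segment"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_scaled)

# Recursive feature elimination from 300+ features down to ~150,
# dropping 10 features per iteration.
rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=150,
    step=10,
)
rfe.fit(X, y)
X_reduced = X.loc[:, rfe.support_]
```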
Baseline
A logistic regression served as a benchmark for interpretability.
AUC ~0.64 → too low for production, but a useful comparison point (sketch below)
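A sketch of the baseline, assuming `X_train`, `y_train`, `X_test`, `y_test` come from an earlier train/test split:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale + fit an interpretable linear benchmark.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
print(f"Baseline AUC: {auc:.3f}")  # ~0.64 in this project
```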
Modeling
Trained and tuned gradient-boosting models to improve predictive power; a training sketch follows the list.
XGBoost: AUC 0.729 after hyperparameter tuning
LightGBM: AUC 0.742, faster training, better with sparse data
Explored transfer learning embeddings & reinforcement learning → minimal lift (<0.01 AUC)
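A training sketch for the two boosted models; the hyperparameters shown are placeholders, not the tuned values:

```python
import lightgbm as lgb
import xgboost as xgb
from sklearn.metrics import roc_auc_score

# Placeholder hyperparameters -- the tuned configuration is not reproduced here.
lgbm = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.02,
                          num_leaves=64, subsample=0.8, colsample_bytree=0.8)
lgbm.fit(X_train, y_train,
         eval_set=[(X_test, y_test)], eval_metric="auc",
         callbacks=[lgb.early_stopping(100)])

xgbm = xgb.XGBClassifier(n_estimators=2000, learning_rate=0.02, max_depth=6,
                         subsample=0.8, colsample_bytree=0.8,
                         eval_metric="auc", early_stopping_rounds=100)
xgbm.fit(X_train, y_train, eval_set=[(X_test, y_test)])

for name, model in [("LightGBM", lgbm), ("XGBoost", xgbm)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name} AUC: {auc:.3f}")
```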
Evaluation
Assessed not just accuracy but also the long-term reliability of the models; a metrics sketch follows the list.
Metrics: AUC, Kolmogorov–Smirnov (KS), Population Stability Index (PSI)
Early models showed drift, especially across income groups
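Both stability metrics are straightforward to compute from model scores; a sketch, using 10 quantile bins for PSI (one common convention):

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_statistic(y_true: np.ndarray, scores: np.ndarray) -> float:
    """KS: maximum separation between the score distributions of the two classes."""
    return ks_2samp(scores[y_true == 1], scores[y_true == 0]).statistic

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference and a new score sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

By the usual convention, PSI below 0.10 reads as stable, 0.10–0.25 as moderate shift, and above 0.25 as significant drift, which is why 0.25 appears as the threshold in the refinement step.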
Refinement
Stabilized the final model to meet financial compliance standards.
Modified the LightGBM loss function to penalize unstable splits (custom-objective sketch below)
Reduced PSI from 0.32 → 0.14 (below the industry threshold of 0.25)
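The write-up above doesn't pin down the exact penalty, so the following is only a sketch of how a custom objective plugs into LightGBM: binary log-loss gradients plus an L2 pull toward a reference model's raw scores (the hypothetical `ref_scores` array), which discourages the new model from drifting away from a score distribution already known to be stable.

```python
import numpy as np
import lightgbm as lgb

def make_stability_objective(ref_scores: np.ndarray, lam: float = 0.1):
    """Binary log-loss plus an L2 pull toward reference raw scores.

    ref_scores : raw scores of a reference model, aligned with the training rows.
    lam        : weight of the stability term (accuracy/stability trade-off).
    """
    def objective(y_true, y_pred):
        p = 1.0 / (1.0 + np.exp(-y_pred))                  # sigmoid of raw scores
        grad = (p - y_true) + lam * (y_pred - ref_scores)  # log-loss + stability pull
        hess = p * (1.0 - p) + lam
        return grad, hess
    return objective

model = lgb.LGBMClassifier(objective=make_stability_objective(ref_scores, lam=0.1))
model.fit(X_train, y_train)
```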
Conclusion
The final LightGBM model balanced accuracy and stability, achieving AUC 0.742, KS 0.42, and PSI 0.14. This ensured strong predictive performance while meeting the stability demands of real-world financial applications.