Purpose: This research aims to develop a practical, interpretable modeling framework that balances predictive performance and transparency for SME loan default segmentation. The authors examine whether a calibrated LightGBM champion model, paired with an EBM challenger, can maximize predictive accuracy while meeting lenders' transparency and compliance needs. The goal is to classify loans accurately into risk tiers (e.g., low, medium, or high risk of default) while keeping the basis for each decision clear.
Methods: For the multi-tier classification, the authors report confusion matrices and tier-wise performance (e.g., the fraction of actual defaults that fell into the High versus the Medium tier). Because the Medium tier is a derived category rather than a ground-truth class, they focus primarily on the calibrated probabilities and how well these align with observed outcomes. Model selection and hyperparameter tuning were performed via cross-validation on the training set, with the LightGBM and EBM models optimized primarily for ROC AUC. The authors also monitored calibration (via calibration plots) to check that the LightGBM + isotonic calibration pipeline yielded well-calibrated probabilities. All reported results are on the unseen test set, simulating how the models would perform on new loan applications.
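To make the isotonic calibration step concrete, the following is a minimal sketch of the pool-adjacent-violators idea behind isotonic regression, written in plain Python. It is an illustration only, not the authors' pipeline: the function names `fit_isotonic` and `calibrate` and the toy scores are hypothetical, and a library implementation would typically interpolate between points rather than return a step function as this sketch does.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function mapping raw scores to default rates."""
    # Aggregate loans that share the same raw score.
    agg = {}
    for s, y in zip(scores, labels):
        total, count = agg.get(s, (0.0, 0))
        agg[s] = (total + y, count + 1)

    # Each block holds [sum of labels, number of loans, largest score covered].
    blocks = []
    for s in sorted(agg):
        total, count = agg[s]
        blocks.append([total, count, s])
        # Pool adjacent blocks while the earlier block has the higher mean,
        # which restores monotonicity of the fitted default rates.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(model, score):
    """Return the calibrated probability for one raw score (step function)."""
    uppers, values = model
    # Find the first block whose upper bound is at least the score;
    # scores above every block fall into the last one.
    idx = bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]


# Toy data (hypothetical): raw model scores and observed default flags.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
defaulted = [0, 1, 0, 0, 1, 1]

model = fit_isotonic(raw_scores, defaulted)
calibrated = [calibrate(model, s) for s in raw_scores]
```

In this toy case the out-of-order outcomes at scores 0.2 to 0.4 are pooled into a single block, so the calibrated probabilities never decrease as the raw score increases.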
Findings: A calibrated Light Gradient Boosting Machine (LightGBM) achieves the highest performance (ROC AUC 0.969), while an Explainable Boosting Machine (EBM) offers nearly equal accuracy (ROC AUC 0.963) with full transparency. Calibrated LightGBM probabilities are used to assign loans to Low, Medium, and High risk tiers, which show observed default rates of 2.5%, 48.8%, and 89.7%, respectively. The results show that modern ensemble methods significantly outperform traditional models and that, when paired with inherently interpretable alternatives such as EBM, they provide both superior predictive power and regulatory-compliant explainability.
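The tiering logic can be sketched as follows. This is a hedged illustration in plain Python: the abstract does not state the probability cutoffs, so `LOW_CUTOFF` and `HIGH_CUTOFF` are hypothetical placeholders, as are the function names and the toy data. The observed default rate per tier is simply the share of loans in that tier that actually defaulted.

```python
# Hypothetical cutoffs; the source does not report the thresholds used.
LOW_CUTOFF = 0.20
HIGH_CUTOFF = 0.80


def assign_tier(prob, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Map a calibrated default probability to a risk tier."""
    if prob < low:
        return "Low"
    if prob < high:
        return "Medium"
    return "High"


def tier_default_rates(probs, outcomes):
    """Return the observed default rate per tier (None for an empty tier)."""
    stats = {"Low": [0, 0], "Medium": [0, 0], "High": [0, 0]}
    for prob, defaulted in zip(probs, outcomes):
        tier = assign_tier(prob)
        stats[tier][0] += defaulted  # number of defaults in the tier
        stats[tier][1] += 1          # number of loans in the tier
    return {
        tier: (defaults / loans if loans else None)
        for tier, (defaults, loans) in stats.items()
    }


# Toy data (hypothetical): calibrated probabilities and observed outcomes.
probs = [0.02, 0.05, 0.10, 0.35, 0.55, 0.70, 0.85, 0.95]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1]

rates = tier_default_rates(probs, outcomes)
```

If the probabilities are well calibrated, each tier's observed default rate should fall inside the probability range that defines the tier, which is the alignment the reported tier rates are meant to demonstrate.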
Implications: It would be valuable to test the framework on other datasets (e.g., LendingClub data, mortgage datasets, or non-U.S. SME loans) to assess its robustness. Broad validation would either strengthen confidence that the LightGBM–EBM approach generalizes across credit contexts or highlight what adjustments are needed (for example, different hyperparameter tuning or calibration).
Originality: A practical blueprint for SME credit risk management on commodity hardware, this LightGBM–EBM champion–challenger stack provides state-of-the-art accuracy, interpretable insights, and capital-efficient risk segmentation.
Minh Nguyen Hoang, Thota Sai Karthikeya, and Thota Sree Mallikharjuna Rao. Bridging Predictive Performance and Transparency: A Multi-Model Framework for Small-Business Loan Default Segmentation. 2025, 16, 1-17.