Navigating Data Imbalances in Credit Risk Management: A One-Sided Selection Approach
Date
2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Credit scoring plays a vital role in mitigating the information asymmetry that is pervasive on peer-to-peer (P2P) lending platforms. A considerable challenge stems from the disparity in loan repayment outcomes: a significant minority of loan applicants default on their loans, while the majority fulfill their repayment obligations. This imbalance in the dataset can introduce bias into a predictive model and lower its performance. To address this issue, data balancing techniques are often employed to improve credit scoring models by generating more balanced datasets. This work constructs a robust credit scoring model capable of precisely assessing the creditworthiness of individuals seeking P2P loans, and in doing so mitigates the distortions that can result from unbalanced data distributions. Four distinct classifiers are employed: Logistic Regression, Random Forest, LightGBM, and Support Vector Machine (SVM). Data balance is achieved with the One-Sided Selection methodology, while information gain and Pearson correlation are the main criteria for determining which features to include. The proposed model thus works on both balanced and unbalanced datasets. Experimental results show that the standard metrics of accuracy, precision, recall, and F1-score reach up to 90.41%, 89.51%, 90.40%, and 89.96%, respectively. © 2024 IEEE.
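As an illustration of the balancing step named in the abstract, the following is a minimal sketch of One-Sided Selection as it is commonly described: keep every minority sample, retain only the majority samples that a 1-nearest-neighbour rule misclassifies, then drop majority samples that form Tomek links. It is not the authors' implementation. The function names, the Euclidean distance, the toy data, and the label convention (1 = default, the minority class) are all assumptions made for the example.

```python
import math
import random


def nearest(pool, X, i):
    """Return the index in `pool` (other than i) whose point is closest to X[i]."""
    return min((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))


def one_sided_selection(X, y, minority=1, seed=0):
    """Return sorted indices of the samples kept by a basic One-Sided Selection."""
    rng = random.Random(seed)
    indices = range(len(X))
    minority_idx = [i for i in indices if y[i] == minority]
    majority_idx = [i for i in indices if y[i] != minority]

    # Step 1: start from every minority sample plus one random majority sample.
    reference = set(minority_idx) | {rng.choice(majority_idx)}

    # Step 2: classify the remaining samples with 1-NN against the reference
    # set and keep the ones it gets wrong (they carry boundary information).
    kept = set(reference)
    for i in indices:
        if i not in reference and y[nearest(reference, X, i)] != y[i]:
            kept.add(i)

    # Step 3: drop majority samples that form a Tomek link, i.e. whose nearest
    # kept neighbour is a minority sample that has them as nearest neighbour.
    kept = sorted(kept)
    tomek = {
        i for i in kept
        if y[i] != minority
        and y[nearest(kept, X, i)] == minority
        and nearest(kept, X, nearest(kept, X, i)) == i
    }
    return [i for i in kept if i not in tomek]


# Hypothetical toy data: three defaulters (label 1), one repaid loan that sits
# among them, and a cluster of repaid loans far away.
X = [(0, 0), (0, 1), (1, 0), (0.2, 0.2),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
kept = one_sided_selection(X, y)
print(kept)
```

In this toy example all three minority points are retained, and the majority point at (0.2, 0.2) is removed because it forms a Tomek link with the minority point at (0, 0). How many of the distant majority points survive depends on which majority sample the random seed picks in the first step.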
Keywords
Credit Scoring, Data Imbalance, Information Gain, One-Sided Selection, Pearson Correlation
Citation
2024 Control Instrumentation System Conference: Guiding Tomorrow: Emerging Trends in Control, Instrumentation, and Systems Engineering, CISCON 2024, 2024
