Information gain directed genetic algorithm wrapper feature selection for credit rating

Jadhav, Swati; He, Hongmei; Jenkins, Karl

Authors

Swati Jadhav

Prof Mary He (H.He5@salford.ac.uk), Professor in A.I. for Robotics

Karl Jenkins



Abstract

Financial credit scoring is one of the most crucial processes in the finance industry for assessing the credit-worthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task, but the "Curse of Dimensionality" remains a significant challenge. Some research has been carried out on Feature Selection (FS) using a genetic algorithm as a wrapper to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method for credit scoring problems and in speeding up the time-consuming feature selection process. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, the Information Gain Directed Feature Selection (IGDFS) algorithm, which ranks features by information gain and propagates the top m features through a genetic algorithm wrapper (GAW) that uses three classical machine learning algorithms, KNN, Naïve Bayes and Support Vector Machine (SVM), as classifiers for credit scoring. The first, information-gain-guided stage of feature selection reduces the computational complexity of the GA wrapper, and the information gain of the features selected by IGDFS indicates their importance to decision making.
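
The two-stage idea described above can be illustrated with a short Python sketch. This is a minimal, hedged example only, assuming a scikit-learn style dataset (a feature matrix X and label vector y); the function name igdfs_select, the cut-off m and the GA settings are placeholders rather than the authors' implementation, and mutual information is used here as the estimate of information gain.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def igdfs_select(X, y, m=15, pop_size=20, generations=30, seed=None):
    """Two-stage feature selection: information-gain ranking, then a GA wrapper."""
    rng = np.random.default_rng(seed)

    # Stage 1: rank features by information gain (estimated here via mutual
    # information with the class label) and keep only the top m features,
    # which shrinks the GA's search space.
    gain = mutual_info_classif(X, y, random_state=seed)
    top = np.argsort(gain)[::-1][:m]
    X_top = X[:, top]

    # Stage 2: GA wrapper over the reduced feature set. Each chromosome is a
    # binary mask; fitness is the cross-validated accuracy of the wrapped
    # classifier (SVM here; KNN or Naive Bayes can be wrapped the same way).
    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(), X_top[:, mask], y, cv=3).mean()

    pop = rng.integers(0, 2, size=(pop_size, m)).astype(bool)
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        new_pop = [pop[scores.argmax()].copy()]          # elitism
        while len(new_pop) < pop_size:
            # Tournament selection of two parents.
            i, j = rng.integers(0, pop_size, 2)
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]
            i, j = rng.integers(0, pop_size, 2)
            p2 = pop[i] if scores[i] >= scores[j] else pop[j]
            cut = rng.integers(1, m)                     # single-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            child ^= rng.random(m) < 0.05                # bit-flip mutation
            new_pop.append(child)
        pop = np.array(new_pop)

    scores = np.array([fitness(ind) for ind in pop])
    return top[pop[scores.argmax()]]   # indices of the selected original features
```

The returned indices refer to columns of the original feature matrix, so the GA's mask over the reduced set is mapped back to the full feature set.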

Regarding classification accuracy, SVM is always better than KNN and NB for the baseline techniques, GAW and IGDFS. We can also conclude that IGDFS achieved better performance than the generic GAW, and that GAW outperformed the corresponding single (baseline) classifiers in almost all cases. The exception is the German Credit dataset, where IGDFS + KNN performed worse than the generic GAW and the single classifier KNN: removing features with low information gain may conflict with the original data structure exploited by KNN and thus degrade the performance of IGDFS + KNN.

Regarding ROC performance, on the German Credit dataset the three classical machine learning algorithms, SVM, KNN and Naïve Bayes, obtained almost the same performance within the IGDFS GA wrapper. On the Australian Credit and Taiwan Credit datasets, IGDFS + Naïve Bayes achieved the largest area under the ROC curve.
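
For reference, area under the ROC curve for the three wrapped classifiers can be compared with a few lines of scikit-learn code. This is an illustrative sketch, not the paper's evaluation pipeline: it assumes binary labels y in {0, 1} and the feature indices `selected` produced by the sketch above.

```python
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# `selected` would come from igdfs_select above; y is assumed binary (0/1).
for name, clf in [("SVM", SVC(probability=True)),
                  ("KNN", KNeighborsClassifier()),
                  ("Naive Bayes", GaussianNB())]:
    proba = cross_val_predict(clf, X[:, selected], y, cv=5, method="predict_proba")
    print(f"{name}: AUC = {roc_auc_score(y, proba[:, 1]):.3f}")
```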

Journal Article Type: Article
Acceptance Date: Apr 13, 2018
Online Publication Date: Apr 22, 2018
Publication Date: May 29, 2018
Deposit Date: Jun 10, 2025
Journal: Applied Soft Computing
Print ISSN: 1568-4946
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 69
Pages: 541-553
DOI: https://doi.org/10.1016/j.asoc.2018.04.033