From: A framework to improve churn prediction performance in retail banking
Authors | Context | Data preprocessing stages | Predictive models | Dataset size |
---|---|---|---|---|
Lemmens and Croux (2006) | Telecom | MVI, IDT-Over, IDT-Under | Logistic regression, bagging, and stochastic gradient boosting | Datasets 1 and 2: 51,306 customers; Dataset 3: 100,462 customers |
Xie et al. (2009) | Banking | OT | Support Vector Machines | 2,382 customers |
Zhao and Dang (2008) | Banking | – | Random forests, neural networks, decision trees, and Support Vector Machines | 1,524 customers |
Benoit and Van den Poel (2012) | Banking | – | Random forest | 244,787 customers |
Huang et al. (2012) | Telecom | FE | Logistic regression, Naive Bayes, linear classification, C4.5, neural networks, Support Vector Machines, and data mining by evolutionary learning (DMEL) | 827,124 customers |
Farquad et al. (2014) | Banking | IDT-Over, IDT-Under, FS | Support Vector Machines and Naive Bayes trees | 14,814 customers |
He et al. (2014) | Banking | IDT-Over, IDT-Under | Logistic regression and Support Vector Machines | 46,406 customers |
Datta et al. (2015) | TV subscription | MVI, FE | Binomial probit model | 16,512 customers |
Keramati et al. (2016) | Banking | OT, MVI, IDT-Over, FS | Decision trees | 4,383 customers |
Geiler et al. (2022) | Banking and others | IDT-Over, IDT-Under | KNN, logistic regression, Naive Bayes, Support Vector Machines, decision trees, neural networks, random forest, and XGBoost | 16 datasets (average of 108,473 customers) |
Tékouabou et al. (2022) | Banking | MVI, FS | Class membership-based (CMB) approach | 45,000 customers |
Our study | Banking | MVI, FE, IDT-Over, IDT-Under | XGBoost and Elastic Net | 3,283,332 customers |