Table 1 Previous research assessing churn prediction models (specifications of our study are presented in the bottom line for comparison purposes)

From: A framework to improve churn prediction performance in retail banking

| Authors | Context | Data preprocessing stages | Predictive models | Dataset size |
| --- | --- | --- | --- | --- |
| Lemmens and Croux (2006) | Telecom | MVI, IDT-Over, IDT-Under | Logistic regression, bagging, and stochastic gradient boosting | Datasets 1 and 2: 51,306 customers; Dataset 3: 100,462 customers |
| Xie et al. (2009) | Banking | OT | Support Vector Machines | 2,382 customers |
| Zhao and Dang (2008) | Banking | | Random forests, neural networks, decision trees, and Support Vector Machines | 1,524 customers |
| Benoit and Poel (2012) | Banking | | Random forest | 244,787 customers |
| Huang et al. (2012) | Telecom | FE | Logistic regression, Naive Bayes, linear classification, C4.5, neural networks, Support Vector Machines, and data mining by evolutionary learning (DMEL) | 827,124 customers |
| Farquad et al. (2014) | Banking | IDT-Over, IDT-Under, FS | Support Vector Machines, Naive Bayes trees | 14,814 customers |
| He et al. (2014) | Banking | IDT-Over, IDT-Under | Logistic regression and Support Vector Machines | 46,406 customers |
| Datta et al. (2015) | TV subscription | MVI, FE | Binomial probit model | 16,512 customers |
| Keramati et al. (2016) | Banking | OT, MVI, IDT-Over, FS | Decision trees | 4,383 customers |
| Geiler et al. (2022) | Banking and others | IDT-Over, IDT-Under | KNN, logistic regression, Naive Bayes, Support Vector Machines, decision trees, neural networks, random forest, XGBoost | 16 datasets (average of 108,473 customers) |
| Tékouabou et al. (2022) | Banking | MVI, FS | Chandy-Misra-Bryant | 45,000 customers |
| Our study | Banking | MVI, FE, IDT-Over, IDT-Under | XGBoost and Elastic Net | 3,283,332 customers |