Since the 1990s, credit risk evaluation methods have significantly improved their prediction accuracy by using artificial intelligence models such as neural networks and support vector machines (SVM). Neural networks are widely used to give early warning of enterprise financial failure; they are suitable for nonlinear and non-normal conditions and impose no strict requirements on the data distribution. In general, their performance is superior to that of traditional statistical methods (West 2000); however, neural networks face issues regarding training efficiency and convergence. Moreover, because samples in supply chain financing are small, both traditional statistical methods and neural network models are inefficient for SME credit risk assessment. The support vector machine technique has therefore been applied to credit risk assessment to process small-scale, high-dimensional data. Liu and Lin (2005) established an SVM-based model for credit risk assessment in commercial banks with an assessment index including eight financial indicators. Tang and Tan (2010) built an SVM model for the credit risk assessment of listed companies and obtained a high classification accuracy. Previous research has repeatedly shown that classification models based on the BP neural network and the SVM have a remarkable ability to identify credit risk. Therefore, this paper introduces a support vector machine model to assess SME credit risk in SCF. The assessment index is built with both quantitative and qualitative indicators, and we also conduct a comparison study between the SVM model and the BP neural network model.
Support vector machine model
The support vector machine (SVM) is a new pattern recognition technique developed by Vapnik and his research group (Cortes and Vapnik 1995). Within a few years of its introduction, the SVM had already been applied in various fields. As a kernel-based machine learning method, the SVM has significant advantages in solving nonlinear, separable classification problems (Vapnik 1995). The SVM derives from the generalized concept of the optimal hyperplane with maximum margin between the two classes. Although intuitively simple, this idea actually implements the structural risk minimization (SRM) principle of statistical learning theory. Moreover, the learning strategy of the SVM can go beyond the two-dimensional plane by constructing a multi-dimensional decision surface to achieve optimal separation of two kinds of data with small empirical risk.
Although multi-dimensional classification is more complex than two-dimensional classification, the principles of the two are very similar. Hence, we can illustrate the learning strategy of the SVM through the linearly separable example shown in Fig. 1. First, we define the meaning of the two sample types: sample A is denoted by solid black dots and sample B by hollow black circles. H represents the separating line that divides sample A from sample B (to the greatest extent possible). H1 is parallel to H and passes through the sample points of class A nearest to H. Similarly, H2 is parallel to H and passes through the sample points of class B nearest to H. The distance between H1 and H2 is called the classification interval (margin). For the simplest binary classification task, the SVM uses a linear separating hyperplane to produce the classifier with maximal margin. For a two-class linear classification problem, the task is phrased as finding the “optimal” separating hyperplane according to the training sample set
$$ \left({x}_i,{y}_i\right),\; i=1,2,\ldots,n,\quad {x}_i\in {R}^d,\; {y}_i\in \left\{+1,-1\right\}, $$
(2)
satisfying the following equation
$$ {y}_i\left(w{x}_i+b\right)\ge 1,\; i=1,2,\ldots,n $$
(3)
The classification interval (margin) equals 2/‖w‖, so the optimal hyperplane must satisfy equation (3) while minimizing ‖w‖². The SVM classifier depends only on a small part of the training samples, the support vectors (SVs), for which equality holds in equation (3). Using the Lagrangian function, the above problem can be transformed into its dual problem. Then, the optimal classification function is
$$ f(x)=\operatorname{sgn}\left\{\left({w}^{*}x\right)+{b}^{*}\right\}=\operatorname{sgn}\left\{\sum_{i=1}^n{\alpha}_i^{*}{y}_i\left({x}_ix\right)+{b}^{*}\right\} $$
(4)
In equation (4), the Lagrange multiplier corresponding to each sample is expressed as α_i*, and b* is the classification threshold, which can be calculated from the SVs. If the sample data are not linearly separable, we can add a slack variable ξ_i ≥ 0 to equation (3), where ξ_i ∈ R is the soft margin error of the training sample
$$ {y}_i\left(w{x}_i+b\right)\ge 1-{\xi}_i,\; i=1,2,\ldots,n $$
(5)
Then, the objective function becomes
$$ \underset{w,b,\xi}{\min}\;\varphi\left(w,\xi\right)=\frac{1}{2}{w}^Tw+C\sum_{i=1}^n{\xi}_i $$
(6)
This formula defines the optimal classification hyperplane, which simultaneously minimizes the classification error rate and maximizes the classification interval. The regularization parameter C > 0 plays a key role in balancing the maximization of the margin width against the minimization of the training error. In general, a nonlinear classification problem can be converted into a linear classification problem by an inner product kernel function K(x_i, x_j) (Liu and Lin 2005), and the classification function takes the form
$$ f(x)=\operatorname{sgn}\left(\sum_{i=1}^n{\alpha}_i^{*}{y}_iK\left({x}_i,x\right)+{b}^{*}\right) $$
(7)
Thus, the classification function can be used to identify bank credit risk by distinguishing high default risk enterprises from low default risk ones.
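As a minimal sketch of how the kernel classification function in equation (7) is evaluated, the following Python code computes the sign of the weighted kernel sum for a new sample. The Gaussian (RBF) kernel, the parameter gamma, and all numerical values are assumptions chosen for illustration; they are not taken from this paper. In practice the multipliers and the threshold would be obtained by solving the dual problem.

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    """Gaussian (RBF) kernel K(xi, x) = exp(-gamma * ||xi - x||^2).
    The kernel type and gamma are illustrative assumptions."""
    diff = np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """Evaluate f(x) = sgn(sum_i alpha_i* y_i K(x_i, x) + b*), as in equation (7).
    A score of exactly zero is assigned to class +1 by convention here."""
    score = sum(a * y * kernel(sv, x)
                for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if score + b >= 0 else -1

# Hypothetical, hand-picked values (not fitted to any data set).
support_vectors = np.array([[0.0, 0.0], [2.0, 2.0]])
alphas = np.array([1.0, 1.0])   # Lagrange multipliers alpha_i*
labels = np.array([-1, +1])     # class labels y_i
b = 0.0                         # classification threshold b*

high_risk_like = svm_decision([1.9, 2.1], support_vectors, alphas, labels, b)
low_risk_like = svm_decision([0.1, -0.1], support_vectors, alphas, labels, b)
print(high_risk_like, low_risk_like)
```

Only the support vectors enter the sum, which is why the classifier depends on a small part of the training samples.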
BP neural network model
Neural network theory, which is based on modern neuroscience, yields an abstract mathematical model by simplifying and simulating the cranial nerve system. This model is designed to learn from training samples and to judge complex problems with uncertain information in complex environments through the variable-structure regulating process of the network. The theory has been applied in several fields, such as information processing and intelligent control. The BP neural network is a multi-layer feedforward network with one-way signal transmission, trained by error back-propagation; it has self-learning, self-organizing, and self-adapting capabilities and is widely used for prediction and evaluation in multifactorial, nonlinear, and uncertain problems.
The BP neural network is composed of an input layer, a hidden layer, and an output layer. Once the network structure is given and the network is trained with a certain number of samples, the input values are transmitted forward from the input layer to the output layer. If the difference between the output values and the expected values does not meet the required error precision, the procedure switches to back-propagation of the error, and the weights and thresholds of the network are adjusted according to the error of each layer until the error precision requirement is met. The weights are adjusted with the back-propagation learning algorithm (Wang et al. 2000), and the transfer function of each neuron is the sigmoid (S-shaped) function
$$ f(x)=\frac{1}{1+{e}^{-x}} $$
(8)
With this function, the network can realize an arbitrary nonlinear mapping from input to output. The calculation process is shown in Fig. 2.
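The forward pass and the error back-propagation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network used in this paper: the single hidden layer of four nodes, the learning rate, the squared-error criterion, the stopping threshold, and the toy data are all hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    """S-shaped transfer function of equation (8)."""
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, y, n_hidden=4, lr=0.5, epochs=5000, target_error=1e-3, seed=0):
    """Train a one-hidden-layer BP network by batch gradient descent.
    All hyperparameter defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))  # input -> hidden weights
    b1 = np.zeros(n_hidden)                                   # hidden thresholds
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))            # hidden -> output weights
    b2 = np.zeros(1)                                          # output threshold
    history = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer -> output layer.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse <= target_error:  # required error precision reached
            break
        # Backward pass: propagate the error and adjust weights and thresholds.
        delta_out = err * out * (1.0 - out)            # sigmoid derivative at output
        delta_hid = (delta_out @ W2.T) * h * (1.0 - h)  # error assigned to hidden layer
        W2 -= lr * (h.T @ delta_out)
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * (X.T @ delta_hid)
        b1 -= lr * delta_hid.sum(axis=0)
    return (W1, b1, W2, b2), history

# Toy data (hypothetical): two inputs in [0, 1] and a binary target.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [1.0]])

params, history = train_bp(X, y)
print(f"initial MSE {history[0]:.4f}, final MSE {history[-1]:.4f}")
```

The loop stops either when the mean squared error reaches the target precision or when the epoch limit is hit, mirroring the stopping rule described in the text.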
In the SME credit risk assessment model, the credit risk evaluation indexes form the input vector of the neural network, and the credit rating data of the SMEs form the output vector. Generally, every index in the input vector, whether qualitative or quantitative, must be standardized to lie within [0, 1]. The target error and the number of hidden layers of the model can be determined by cross-validation.
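One common way to bring every index into [0, 1] is min-max scaling, sketched below. The paper does not state which standardization formula is used, so this choice is an assumption, and the indicator values in the example are hypothetical.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column (index) of X to the interval [0, 1].
    Min-max scaling is an assumed choice of standardization."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # A constant column would cause division by zero; map it to 0 instead.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

# Hypothetical data: column 0 is a quantitative ratio,
# column 1 is a qualitative score on a 1-5 scale.
raw = np.array([[1.2, 3.0],
                [0.8, 5.0],
                [2.0, 1.0]])
scaled = min_max_scale(raw)
print(scaled)
```

When a separate test set is used, the column minima and ranges would normally be computed on the training samples only and then reused for the test samples.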