
Clues from networks: quantifying relational risk for credit risk evaluation of SMEs

Abstract

Owing to information asymmetry, evaluating the credit risk of small- and medium-sized enterprises (SMEs) is difficult. While previous studies evaluating the credit risk of SMEs have mostly focused on the intrinsic risk generated by SMEs themselves, our study considers both intrinsic risk and the relational risk generated by neighbor firms' publicly available risk events. We propose a framework for quantifying relational risk based on publicly available risk events for SMEs' credit risk evaluation. Our proposed framework quantifies relational risk by weighting the impact of the publicly available risk events of each firm in an interfirm network (considering the impact of interfirm network type, risk event type, and time dependence of risk events) and combines the relational risk score with financial and demographic features to evaluate SMEs' credit risk. Our results reveal that the relational risk score significantly improves both the discrimination and granting performance of credit risk evaluation of SMEs, providing valuable managerial and practical implications for financial institutions.

Introduction

Small- and medium-sized enterprises (SMEs) critically support national economies, being main contributors to employment and economic development in many countries (OECD 2020). However, SMEs have difficulty obtaining the credit loans needed for their growth and development. Most SMEs are private firms, and their financial information is less transparent and reliable than that of large firms, which are, in general, publicly traded. Hence, financial institutions have difficulty conducting effective evaluations of SMEs’ credit risk. Consequently, financial institutions are prone to credit rationing (i.e., limiting the supply of additional credit to SMEs requiring funds) even if the latter are willing to pay higher interest rates (Stiglitz and Weiss 1981; Murro and Peruzzi 2019). To solve SMEs’ financing difficulties, helping financial institutions effectively evaluate the credit risk of SMEs is crucial.

To this end, many studies focus on extracting features from various factors to improve credit risk evaluation performance (e.g., Altman 2010; Yin et al. 2020). Commonly used features include demographic, financial, and credit history features (e.g., Abdou et al. 2016; Djeundje et al. 2021), which predominantly depict the intrinsic risk profile of SMEs. However, some studies demonstrate that a firm's risk can be affected by that of its neighbor firms (e.g., Giesecke and Weber 2004; Beaver et al. 2019). At the same time, SMEs have weak risk resistance and are vulnerable to external factors, which makes them more susceptible to the risks of their neighbor firms (OECD 2016). Accordingly, to evaluate SMEs' credit risk effectively, both the intrinsic risk arising from SMEs themselves and the relational risk arising from their neighbor firms must be considered. We refer to the impact of a firm's neighbor firms' risk on the risk of that firm as relational risk.

Considering that relational risk is essential in the credit risk evaluation of SMEs, and that the network approach greatly contributes to the understanding of the interdependency between firms (Zha et al. 2020), several studies use network analysis to explore relational risk for credit risk evaluation of SMEs. Some studies extract statistical features related to relational risk (e.g., the number of neighbor firms and of defaulted neighbor firms) from interfirm networks and incorporate these features in credit risk evaluation (e.g., Vinciotti et al. 2019; Letizia and Lillo 2019). These features consider only the distribution of neighbor firms and rarely capture the impact of a firm's neighbor firms' risk on the risk of that firm. Other studies use the historical defaults of each firm in an interfirm network to quantify relational risk for credit risk evaluation of SMEs (i.e., they quantify the impact of the credit history information of a firm's neighbor firms on the credit risk of that firm) (e.g., Calabrese et al. 2019; Óskarsdóttir and Bravo 2021). In our study, we refer to quantifying relational risk from the credit history information of each firm in an interfirm network as quantifying relational risk based on homogeneous information. This approach presupposes that the credit history information of every firm in the interfirm network is known. However, obtaining this information raises cost and access issues, since credit history information is either available only for purchase (e.g., in the United States or Germany) or not publicly available owing to the lack of credit information sharing systems (e.g., in some developing countries) (Fosu et al. 2020; Saruni and Koori 2020). As existing methods of quantifying relational risk based on homogeneous information suffer from both access and cost issues, a novel methodology for quantifying relational risk for credit risk evaluation of SMEs is required.

In this study, we present a method for quantifying relational risk based on heterogeneous information for credit risk evaluation of SMEs. Specifically, we quantify relational risk based on the publicly available risk events (instead of credit history information) of each firm in an interfirm network. With the development of government information disclosure systems, some risk events of firms (e.g., legal judgments and administrative penalties) are disclosed by government agencies and are freely accessible to everyone (Henninger 2013; Medvedeva et al. 2020). We refer to such risk events disclosed by government agencies as publicly available risk events. Because they are disclosed by government agencies, these events can be considered reliable, and they are also freely and easily accessible. When quantifying relational risk based on heterogeneous information for credit risk evaluation of SMEs, we aim to answer the following three research questions. (1) Does relational risk quantified based on heterogeneous information improve the performance of credit risk evaluation of SMEs? (2) Does network type affect the effectiveness of relational risk in credit risk evaluation of SMEs? (3) Do network type and risk event type jointly affect the effectiveness of relational risk in credit risk evaluation of SMEs?

To quantify relational risk in a way that improves the performance of credit risk evaluation of SMEs, and to address the three questions above, we propose a framework for quantifying relational risk based on heterogeneous information for credit risk evaluation of SMEs. Our framework considers the key elements of relational risk quantified based on heterogeneous information (i.e., network type, risk event type, and time dependence of risk events). In the proposed framework, we assign a time-dependent risk score to each firm in an interfirm network based on that firm's involvement in publicly available risk events. We then use a smoothed version of the weighted-vote relational neighbor (wvRN) algorithm to generate, for each firm, a score that quantifies relational risk by weighting the time-dependent risk scores of its neighbor firms in the interfirm network. Finally, we combine the relational risk score with financial and demographic features to evaluate the credit risk of SMEs.

We evaluate our proposed framework using two datasets, namely, the manufacturing SMEs dataset and the National Equities Exchange and Quotations (NEEQ) SMEs dataset (Wang et al. 2020a, b). We evaluate the utility of relational risk quantified based on heterogeneous information in credit risk evaluation of SMEs in terms of discrimination performance (i.e., the ability to distinguish bad loans from good loans) and granting performance (i.e., the ability to grant loans at a low default rate). Our results indicate that incorporating relational risk based on heterogeneous information can significantly improve discrimination performance. Moreover, we find that it also improves the granting performance of credit risk evaluation of SMEs, and that network type, risk event type, and time dependence of risk events affect the effectiveness of relational risk in credit risk evaluation of SMEs.
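The two evaluation criteria can be made concrete with a short sketch. Below, discrimination is measured by the area under the ROC curve (AUC), computed as the probability that a defaulted loan receives a higher risk score than a non-defaulted one, and granting performance is approximated as the default rate among the applicants with the lowest risk scores at a chosen acceptance rate. Both operationalizations and the function names `auc` and `default_rate_at_acceptance` are illustrative assumptions, not necessarily the exact metrics used in the study.

```python
def auc(scores, labels):
    """Discrimination: probability that a randomly chosen default (label 1)
    has a higher risk score than a randomly chosen non-default (label 0).
    Ties count as one half (Mann-Whitney formulation of the ROC AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def default_rate_at_acceptance(scores, labels, acceptance_rate):
    """Granting (illustrative assumption): approve the lowest-risk fraction of
    applicants and return the observed default rate among approved loans."""
    n_accept = max(1, round(acceptance_rate * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    approved = order[:n_accept]
    return sum(labels[i] for i in approved) / n_accept


scores = [0.10, 0.40, 0.35, 0.80]  # higher = riskier (toy values)
labels = [0, 0, 1, 1]             # 1 = default
print(auc(scores, labels))                              # 0.75
print(default_rate_at_acceptance(scores, labels, 0.5))  # 0.5
```

A higher AUC means better discrimination; a lower default rate at a fixed acceptance rate means better granting.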

Our study provides the following contributions to research and practice. First, our study presents a novel methodology for quantifying relational risk. While existing studies quantify relational risk using the credit history information of each firm in the network (e.g., Calabrese et al. 2019; Óskarsdóttir and Bravo 2021), our study quantifies relational risk using the publicly available risk events of each firm in the network. Second, we propose a framework for quantifying relational risk based on heterogeneous information for credit risk evaluation of SMEs. Our framework introduces the impact of network type, risk event type, and time dependence of risk events, which are not simultaneously introduced in the framework of quantifying relational risk based on homogeneous information for credit risk evaluation (e.g., Van Vlasselaer et al. 2016; Tobback et al. 2017). Third, we provide a comprehensive analysis of whether relational risk based on heterogeneous information can improve the performance of credit risk evaluation of SMEs. Hence, we provide guidance for financial institutions.

Literature review

There are two main research streams on credit risk evaluation (i.e., loan default prediction). The first focuses on credit risk evaluation methods, which classify loan applications as default or nondefault (e.g., Angilella and Mazzù 2015; Figini et al. 2017; Maldonado et al. 2020), and the second focuses on exploring effective features related to credit risk. These features can in turn be divided into two types: basic features depicting the intrinsic risk of borrowers (e.g., Yin et al. 2020; Stevenson et al. 2021) and relational risk features depicting the relational risk arising from borrowers' neighbors (e.g., Vinciotti et al. 2019; Kou et al. 2021). Since we aim to quantify relational risk using the network approach and incorporate the quantified relational risk to improve the performance of credit risk evaluation of SMEs, we review the related literature as follows: (1) credit risk evaluation methods, (2) basic features in credit risk evaluation, and (3) interfirm networks and relational risk.

Credit risk evaluation methods

Many studies in both industry and academia have developed credit risk evaluation methods to evaluate the credit risk of firms effectively. In industry, for example, credit rating agencies (e.g., S&P, Moody's, and Fitch) play a key role in the credit market, as they develop credit rating models to assess firms' creditworthiness and provide these credit ratings to market participants for decision-making (e.g., Wojewodzki et al. 2018; Wu et al. 2022). In academia, Altman (1968) developed the Z-score model based on multivariate discriminant analysis for bankruptcy prediction, which was then applied to credit risk evaluation. Building on Altman's work, researchers designed several representative credit risk evaluation models based on statistical methods, including linear discriminant analysis (LDA) and logistic regression (LR) (Abdou et al. 2016; do Prado et al. 2016). Currently, the focus has shifted toward machine learning models, including support vector machines (SVM), neural networks, random forest (RF), and eXtreme gradient boosting (XGB) (Abellán and Castellano 2017).

Recently, large neural networks with many layers of neurons (i.e., deep learning) have become popular in credit risk evaluation owing to their competitive computing and discrimination performance (Gunnarsson et al. 2021). However, deep learning models have low interpretability owing to their black-box nature. In this study, we use LR, RF, and XGB for credit risk evaluation because they are commonly used (Lessmann et al. 2015).

Basic features in credit risk evaluation

To evaluate credit risk, researchers typically draw on firms' own information to extract basic features, which are mainly divided into three groups: demographic, financial, and credit history features. Demographic features mainly describe a firm's essential attributes, including age, size, industry type, geographical location, character of the management team, and percentage of insider ownership (Altman et al. 2010; Cassar et al. 2015). Financial features mainly describe a firm's profitability, leverage, liquidity, efficiency, and growth opportunity (e.g., current ratio, return on assets ratio, quick ratio, and assets turnover ratio) (Angilella and Mazzù 2015; Liang et al. 2016). Credit history features mainly describe a firm's debt and debt repayment over a period, such as the number of banks that lent money to the firm, the number of short-term loans, and historical defaults; a history of trustworthiness and expectations of continued performance demonstrate a borrower's ability to pay (Bai et al. 2019; Djeundje et al. 2021).

The studies mentioned demonstrate the effectiveness of the basic features in SMEs’ credit risk evaluation. However, some studies highlight that SMEs’ information, especially financial information, is not always reliable (Altman et al. 2010; Cassar et al. 2015). Additionally, some studies find that firm risk not only depends on its own information but is also affected by information of its neighbor firms (Orton et al. 2015; Beaver et al. 2019). Therefore, considering only the basic features of SMEs when evaluating their credit risk is insufficient.

Interfirm networks and relational risk

The network approach has become an important tool not only for modeling and describing complex systems of interacting entities but also for facilitating understanding and analyzing complex system structure, trust propagation, and contagion risk (Buldyrev et al. 2010; Zha et al. 2020). Accordingly, using a network approach to study relational risk in an interfirm network is common (i.e., analyzing how a firm’s neighbor firms’ risk affects the risk of that firm). Currently, the network approach primarily demonstrates the existence of relational risk in a network or identifies mechanisms in different networks. For example, Elliott et al. (2014) construct a network of organizations based on cross-holding relationships and then identify how the network propagates discontinuous changes in asset values triggered by failures. Wang et al. (2019) construct a dynamic risk contagion model of interfirm credit guarantee network and study the dynamic risk contagion mechanism in the network. Grant and Yung (2021) infer the firm networks by estimating a vector autoregression of daily equity returns and find that network connections can function as conduits for contagion and elevated systemic risk.

Several studies use the network approach to incorporate relational risk features into the credit risk evaluation of SMEs. Vinciotti et al. (2019) construct an interfirm network based on transaction data and incorporate network-related features into credit risk evaluation of SMEs (e.g., the number of companies from which transactions are received and the percentage of failed companies in the first-order neighborhood). Similarly, Letizia and Lillo (2019) construct an interfirm network based on payment data and use only network properties (e.g., degree and community) for credit risk evaluation of SMEs. However, these features depict only the distribution of neighbor firms and rarely capture how a firm's neighbor firms' risk affects the risk of that firm. Moreover, these features are based on an explicit transaction network and do not incorporate other network types. Only a few recent studies have directly incorporated the risk of neighbor firms into credit risk evaluation of SMEs. Calabrese et al. (2017) incorporate the interdependence among London small business defaults into a credit risk analysis framework and demonstrate that incorporating this interdependence could improve the performance of credit risk analysis. Óskarsdóttir and Bravo (2021) develop a personalized multilayer PageRank centrality measure that ranks firms in a multilayer network according to a set of defaulted firms to capture relational risk, and demonstrate that incorporating the measure could improve the performance of credit risk evaluation in agricultural lending. Notably, these studies assume that the credit history information of each firm in the interfirm network is known. As accessing the credit history information of each firm in the network raises access and cost issues, the applicability of these studies is limited.

Overall, previous studies have mainly used basic features for credit risk evaluation of SMEs or have only demonstrated the existence of relational risk in different interfirm networks. Few studies have incorporated relational risk features into credit risk evaluation of SMEs, and owing to access and cost issues, those that have are of limited applicability. Hence, our study presents a novel methodology for quantifying relational risk (i.e., quantifying relational risk based on heterogeneous information) and proposes a framework to incorporate the quantified relational risk into credit risk evaluation of SMEs.

Research design

In this section, we briefly introduce our conceptual framework and then describe each part of the framework in detail.

Conceptual framework

We propose a framework for quantifying relational risk based on heterogeneous information for credit risk evaluation of SMEs. Figure 1 illustrates the difference between existing methods of quantifying relational risk based on homogeneous information and our method of quantifying relational risk based on heterogeneous information. Figure 1b shows how existing methods quantify relational risk (i.e., the impact of the credit history information of firms B–F on the credit risk of firm A), whereas Fig. 1c shows how we quantify it (i.e., the impact of the risk event information of firms B–F on the credit risk of firm A). This difference motivates our three research questions. Accordingly, our framework addresses the key factors of relational risk quantified based on heterogeneous information (i.e., network type and risk event type). Furthermore, as risk events are time-dependent, our framework also considers the time dependence of risk events.

Fig. 1

Interfirm network and relational risk. a shows an example of an inter-firm network, in which node A represents a target firm, and nodes B–F represent neighbor firms of A. b and c show how to quantify A’s relational risk using credit histories of neighbor firms and risk events of neighbor firms, respectively

Figure 2 shows that the framework includes four phases: network construction, risk identification, relational risk quantification, and relational risk application. First, we construct two types of interfirm networks (i.e., shareholding and governance networks). Second, each firm in the network is assigned a risk score based on its involvement in publicly available risk events (e.g., administrative penalties or loan disputes). Third, as interfirm network data are unstructured and cannot be added directly to credit risk evaluation models, we use a smoothed version of the wvRN algorithm to generate a score that quantifies relational risk by weighting the time-dependent risk scores of firms according to the network structure (Tobback et al. 2017). The wvRN algorithm can be combined with several types of weight functions. Finally, we combine the relational risk score with basic features to evaluate the credit risk of SMEs. The four phases of the framework are summarized as follows:

  1. Construct interfirm networks.

  2. Identify the risk of each firm in the interfirm network depending on its involvement in administrative penalties or loan disputes.

  3. Produce a score quantifying relational risk using the smoothed version of the wvRN algorithm, with the selected weight function and observation time of risk events.

  4. Add the relational risk score to the credit risk evaluation methods of SMEs.
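As a minimal sketch of phase 4, assuming scikit-learn is available, the code below appends a relational risk score to a matrix of basic features and compares the hold-out AUC of LR and RF with and without it. The data are synthetic, and the default-generating process (which deliberately depends on the relational score) is a hypothetical assumption used only to show the mechanics; XGB is omitted to keep dependencies minimal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Hypothetical stand-ins: 3 basic (demographic/financial) features and one
# relational risk score in [0, 1], e.g. produced by the smoothed wvRN.
basic = rng.normal(size=(n, 3))
relational = rng.uniform(size=(n, 1))
# Synthetic default process (assumption): risk rises with the relational score.
logit = 0.8 * basic[:, 0] - 0.5 * basic[:, 1] + 4.0 * (relational[:, 0] - 0.5) - 2.5
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)


def holdout_auc(model, X, y):
    """Fit on 70% of the loans and return the AUC on the remaining 30%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])


models = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "RF": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
}
with_relational = np.hstack([basic, relational])  # basic features + score
results = {}
for name, make_model in models.items():
    # Same split for both feature sets (fixed random_state and stratify).
    results[name] = (holdout_auc(make_model(), basic, y),
                     holdout_auc(make_model(), with_relational, y))
    print(f"{name}: AUC basic={results[name][0]:.3f}, "
          f"basic+relational={results[name][1]:.3f}")
```

Because the relational score enters the model as just one more column, any classifier that accepts a feature matrix can use it unchanged.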

Fig. 2

Framework for quantifying relational risk based on heterogeneous information for credit risk evaluation of SMEs

Network construction

We use a bipartite graph to construct interfirm networks. A bipartite graph is a graph whose nodes can be divided into two disjoint and independent sets such that every link connects a node in one set to a node in the other set. Resources and firms can form a bipartite graph, in which a link connects a firm and a resource if the firm has that resource. As Fig. 3a shows, two firms are connected through a resource node if they share at least one resource, such as a manager, an employee, a buyer, or an address.

Fig. 3

A bipartite graph and a unipartite graph. The circular nodes A–E represent firms and the square nodes 1–5 represent resources

To construct an interfirm network, the bipartite graph is compressed into a projection graph, in which two firms are linked if and only if they have at least one common resource, as shown in Fig. 3b. To incorporate information about shared resources into the edge between firms in the projection graph, we determine the weight of the edge between two firms by combining the weights of their shared resources. We adopt several weight functions commonly used to weight a resource (Tobback et al. 2017) (Table 1). Most weight functions are based on the concept of micro-affinity, that is, down-weighting nodes (i.e., resources) with a high degree. For example, a manager shared by a small number of firms has a greater impact on the linked firms than a manager shared by a larger number of firms, because a manager shared among many firms has limited time and energy for any one firm. Consequently, given the weight function of resources, the weight wij between firms i and j is determined by aggregating the weights sk of the shared resources, i.e., \(w_{ij} = \sum\nolimits_{k \in N\left( i \right) \cap N\left( j \right)} {s_{k} }\), where N(i) is the set of neighbor nodes of node i in the bipartite graph and Table 1 lists the candidate functions for sk. The weight wij is used in the method of quantifying relational risk (see the “Relational risk quantification” section for more details).
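The projection and edge-weighting step can be sketched in Python as follows. Table 1 is not reproduced here, so the two weight functions below (a constant weight, and an inverse-degree weight that down-weights widely shared resources in the micro-affinity spirit described above) are illustrative assumptions rather than confirmed entries of Table 1; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def constant_weight(degree):
    """Every shared resource counts equally (hypothetical stand-in)."""
    return 1.0


def inverse_degree_weight(degree):
    """Micro-affinity: a resource held by many firms contributes less."""
    return 1.0 / degree


def project(firm_resources, weight_fn):
    """Project a firm-resource bipartite graph onto firms.

    firm_resources: dict firm -> iterable of resource ids.
    Returns dict firm -> dict(neighbor firm -> w_ij), where
    w_ij = sum of s_k over resources k shared by firms i and j.
    Firms sharing no resource do not appear in the result.
    """
    holders = defaultdict(set)  # resource -> firms holding it
    for firm, resources in firm_resources.items():
        for k in resources:
            holders[k].add(firm)
    weights = defaultdict(dict)
    for k, firms in holders.items():
        s_k = weight_fn(len(firms))  # resource degree = number of holders
        for i, j in combinations(sorted(firms), 2):
            weights[i][j] = weights[i].get(j, 0.0) + s_k
            weights[j][i] = weights[j].get(i, 0.0) + s_k
    return dict(weights)


firm_resources = {"A": ["m1", "m2"], "B": ["m1"], "C": ["m1", "m2"], "D": ["m3"]}
w = project(firm_resources, inverse_degree_weight)
print(round(w["A"]["C"], 4))  # 1/3 + 1/2 = 0.8333
```

Firms A and C share two managers, so their edge is heavier than A-B; firm D shares nothing and gets no edge, which is exactly the no-neighbor case the smoothed wvRN later has to handle.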

Table 1 Weight functions

As firms are linked through shared resources, different resources can form different network structures. For example, if two firms share an address, a geographical relationship exists between them, and a geographical network is formed when more firms are linked by such relationships. In our framework, we use shareholding and governance relationships to construct two types of interfirm networks: shareholding and governance networks. We now describe each network in greater detail.

The shareholding network has edges representing common shareholders between two firms—an edge between two firms exists if they have at least one common shareholder. The shareholding network can be used to predict cascading financial distress between firms (Giesecke and Weber 2006). Essentially, the financial distress of a firm can lead to neighbor firms incurring financial liquidity problems or credit risk through the ownership relationship.

The governance network has edges representing the governance relationship between firms, which is characterized by shared managers (or directors) among firms, including an interlocking directorate. The governance network can reflect a manager’s ability and a firm’s performance. Specifically, the manager impacts firm performance, and an incompetent or fraudulent manager may lead a firm into default or bankruptcy. Two firms are more likely to have similar performance if they share a manager, and if they share more managers, the performance will become even more similar.
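A minimal sketch of how the two network types could be built from one table of firm-resource relations, assuming each record is a hypothetical `(firm, relation_type, resource_id)` triple with relation types `"shareholder"` and `"manager"`. Edge values count the shared shareholders or managers, so firms that share more managers get a stronger governance link, in line with the argument above.

```python
from collections import defaultdict
from itertools import combinations


def build_network(records, relation_type):
    """Build one interfirm network from (firm, relation_type, resource_id) records.

    Two firms are linked if they share at least one resource of the given
    relation type; the edge value counts how many such resources they share.
    """
    holders = defaultdict(set)  # resource -> firms linked to it
    for firm, rel, resource in records:
        if rel == relation_type:
            holders[resource].add(firm)
    edges = defaultdict(int)
    for firms in holders.values():
        for a, b in combinations(sorted(firms), 2):
            edges[(a, b)] += 1
    return dict(edges)


records = [
    ("A", "shareholder", "s1"), ("B", "shareholder", "s1"),
    ("A", "manager", "p1"), ("C", "manager", "p1"),
    ("A", "manager", "p2"), ("C", "manager", "p2"),
]
print(build_network(records, "shareholder"))  # {('A', 'B'): 1}
print(build_network(records, "manager"))      # {('A', 'C'): 2}
```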

Risk identification

To study the effect of relational risk based on heterogeneous information on credit risk evaluation (i.e., whether the credit risk of a firm can be influenced by the risk events of its neighbor firms), we must define risk events. We select two types of publicly available risk events (i.e., administrative penalties and loan disputes) to identify the risk of each firm in the network, and then use the propensity of risk events among a firm's neighbor firms, in either the shareholding or the governance network, to determine the firm's relational risk. We now discuss the two risk events in greater detail.

Loan disputes constitute a valuable repository of firm risk information. When a firm is designated as a debtor in a loan dispute and must repay all associated debts, the situation reflects the firm's financial liquidity problems and its ability to repay. Additionally, a firm can be found guilty of not honoring contractual financial obligations, which reflects its willingness or ability to repay. Hence, a firm showing signs of financial distress is more likely to default if it is related, as defined by the interfirm network structures we consider, to several firms that have loan disputes.

Administrative penalties include disciplinary warnings, fines, confiscation of illegal gains, orders to suspend production or use, rescission of permits or other certificates of a similar character, and other types of administrative penalties. A firm can be subject to an administrative penalty for environmental pollution, misleading advertising, faulty products, and so on. Administrative penalties are therefore usually strongly negative signals. For instance, environmental penalties may result in fines and factory closures pending compliance with environmental regulations, which affect the firm's ability to repay its debts. Some penalties reflect or cause a bad reputation (e.g., those for misleading advertising or faulty products). As a bad reputation affects consumers' and partners' perception of the firm, and thus harms its economic benefits, a firm that shows signs of financial distress is more likely to default on its financial obligations if it is related to several firms with administrative penalties and thus bad reputations.
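To make the risk-identification step concrete, the sketch below turns a list of disclosed events into 0/1 risk indicators per firm, either for one event type or for both, which is the kind of per-firm input the relational risk score later aggregates. The event-type codes and the function name are hypothetical, and how events are matched to firms in practice is not shown.

```python
LOAN_DISPUTE = "loan_dispute"                     # hypothetical type codes
ADMINISTRATIVE_PENALTY = "administrative_penalty"


def risk_event_labels(firms, events, event_type=None):
    """Return firm -> 1 if the firm is involved in a risk event, else 0.

    events: iterable of (firm, event_type) pairs from public disclosures.
    event_type: restrict to one type, or None to count either type.
    """
    involved = {f for f, t in events if event_type is None or t == event_type}
    return {f: int(f in involved) for f in firms}


firms = ["A", "B", "C"]
events = [("A", LOAN_DISPUTE), ("B", ADMINISTRATIVE_PENALTY)]
print(risk_event_labels(firms, events, LOAN_DISPUTE))  # {'A': 1, 'B': 0, 'C': 0}
print(risk_event_labels(firms, events))                # {'A': 1, 'B': 1, 'C': 0}
```

Keeping the event type as a parameter allows the effect of each risk event type to be compared separately.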

Relational risk quantification

To add relational risk into credit risk evaluation methods, unstructured interfirm network data must be transformed into structured data that quantify relational risk with a single scalar value. Additionally, the interpretability of relational risk is instrumental in the credit risk evaluation of SMEs; therefore, the process of quantifying relational risk must be comprehensible and easy to implement. The wvRN algorithm is a simple collective inference method that uses network structure to estimate how a node is affected by its neighbor nodes in a network (Stankova et al. 2020). The smoothed version of the wvRN algorithm is better suited to credit risk evaluation methods (Tobback et al. 2017). Hence, we adopt the smoothed version of the wvRN algorithm to calculate a score quantifying the relational risk of each firm in the network. We first briefly describe the original wvRN algorithm and then the smoothed version.

Let wij denote the weight associated with the link between firms i and j. Let Z denote the normalization factor, equal to \(\sum\nolimits_{j \in N\left( i \right)} {w_{ij} }\). Let \(p(L_{i} = c|N\left( i \right))\) denote the probability score that the label Li of firm i equals class c given its neighbors N(i) in the network. In our work, the class depends on whether a risk event occurs, that is, \(L_{i} \in \left\{ {Risk\,Event\,Occurs,No\,Risk\,Event\,Occurs} \right\}\), and

$$ p(L_{i} = c|N\left( i \right)) = \frac{{\mathop \sum \nolimits_{j \in N\left( i \right)} w_{ij} \cdot p(L_{j} = c|N\left( j \right))}}{Z} $$
(1)

According to Eq. (1), the resulting probability score \(p(L_{i} = c|N\left( i \right))\) (i.e., the probability score that the firm i belongs to class c) is equal to a weighted average of the probability scores of its neighbor firms belonging to class c.
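Eq. (1) can be sketched in a few lines of Python. Following the description in this section, neighbor scores are fixed to 0/1 risk-event indicators and a single weighted-average pass is computed, rather than iterating collective inference to convergence; the dictionary layout of `weights` and `labels` and the function name are assumptions.

```python
def wvrn_score(i, weights, labels):
    """Eq. (1): weighted average of neighbors' risk-event indicators.

    weights: dict firm -> dict(neighbor -> w_ij), e.g. from the projection step.
    labels: dict firm -> p(L_j = c | N(j)), here fixed to 0 or 1 depending on
            whether neighbor j has a historical risk event.
    Returns None when Z = 0 (no neighbors), where Eq. (1) is undefined.
    """
    neighbors = weights.get(i, {})
    z = sum(neighbors.values())  # normalization factor Z
    if z == 0:
        return None
    return sum(w_ij * labels[j] for j, w_ij in neighbors.items()) / z


weights = {"A": {"B": 2.0, "C": 1.0, "D": 1.0}}
labels = {"A": 0, "B": 1, "C": 0, "D": 0}
print(wvrn_score("A", weights, labels))  # (2*1 + 1*0 + 1*0) / 4 = 0.5
print(wvrn_score("E", weights, labels))  # None: firm E has no neighbors
```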

We define the weight of the link between firms i and j as the aggregation of the weights sk of all resources k shared by firms i and j; hence, \(w_{ij} = \sum\nolimits_{k \in N\left( i \right) \cap N\left( j \right)} {s_{k} }\). Table 1 reports the weight functions of resource k. The neighbor firm's probability score, \(p(L_{j} = c|N\left( j \right))\), is set to 0 or 1 depending on whether neighbor firm j has a historical risk event. Each firm in the network then receives an updated probability score based on its neighbor firms' probability scores. This score captures our supposition that the probability of a risk event at a firm is determined by the weighted average of the equivalent probabilities of that firm's neighbors. Our framework uses the resulting probability score to quantify relational risk. Additionally, we consider the following three cases:

  1. If a firm has no neighbor firms, there are no links to weight, so the normalization factor Z is zero and the fraction in Eq. (1) is undefined.

  2. If a firm has only one neighbor firm and that neighbor firm has a risk event, this single neighbor will potentially skew the firm's probability score too high, as the population propensity of risk events is low.

  3. Similarly, if a firm has only one type of neighbor firm, for example, if all of its neighbor firms have been involved in risk events, the neighbor firms' risk events will penalize the target firm too heavily under the wvRN (Eq. (1)). Conversely, according to Eq. (1), a firm's relational risk goes undetected if none of its neighbor firms have been involved in risk events.

The smoothed version of the wvRN introduces μc, which equals the incidence rate of class c in the entire dataset. Equation (2) provides the smoothed version of the wvRN, in which μc acts as a prior: a firm with no neighbor firms receives exactly μc, and the scores of firms with only one neighbor firm or only one type of neighbor firm are pulled toward μc rather than to an extreme value. In the following empirical evaluation, we use the smoothed version of the wvRN to calculate a probability score for each firm in the network, which is used to quantify the relational risk of each firm:

$$ p(L_{i} = c|N\left( i \right)) = \frac{{\mathop \sum \nolimits_{j \in N\left( i \right)} w_{ij} \times p(L_{j} = c|N\left( j \right)) + 2\mu_{c} }}{Z + 2} $$
(2)
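The smoothed score in Eq. (2) differs from Eq. (1) only by the 2μc and 2 terms. The sketch below (function names and toy data are hypothetical) computes μc as the share of firms with a risk event and reproduces the three problem cases listed above: no neighbors, a single risky neighbor, and neighbors that are all risk-free.

```python
def incidence_rate(labels):
    """mu_c: share of firms in the whole dataset with a risk event."""
    return sum(labels.values()) / len(labels)


def smoothed_wvrn_score(i, weights, labels, mu_c):
    """Eq. (2): (sum_j w_ij * p_j + 2 * mu_c) / (Z + 2).

    Equivalent to a weighted average of the neighbors' mean (weight Z) and the
    population rate mu_c (weight 2), so it is defined for every firm.
    """
    neighbors = weights.get(i, {})
    z = sum(neighbors.values())
    weighted = sum(w_ij * labels[j] for j, w_ij in neighbors.items())
    return (weighted + 2.0 * mu_c) / (z + 2.0)


labels = {f"F{n}": 0 for n in range(9)}
labels["F9"] = 1                      # 1 firm in 10 has a risk event
mu = incidence_rate(labels)           # 0.1
weights = {"F0": {"F9": 1.0}, "F1": {"F2": 2.0, "F3": 2.0}}
print(round(smoothed_wvrn_score("F5", weights, labels, mu), 3))  # no neighbors: 0.1
print(round(smoothed_wvrn_score("F0", weights, labels, mu), 3))  # (1 + 0.2) / 3 = 0.4, not 1.0
print(round(smoothed_wvrn_score("F1", weights, labels, mu), 3))  # 0.2 / 6 = 0.033, not 0.0
```

The constant 2 controls how strongly sparse neighborhoods are pulled toward μc; as Z grows, the neighbors dominate the score.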

To quantify relational risk, a second hyperparameter besides the weight function, namely the observation time of risk events, must be considered. Because a firm may have multiple risk events that influence it over a period, and, to our knowledge, the ideal observation time is unknown, a pre-test over different observation times is necessary. Relational risk is finally determined by the selected weight function and observation time of risk events.
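One way to organize this pre-test, assuming each event carries a date and the observation time is expressed as a look-back window in days before the evaluation date, is a small grid search over weight functions and windows. The `evaluate` callback, the window lengths, and the function names are hypothetical; a natural choice for the selection criterion, assumed here, is the validation performance of the credit risk evaluation model.

```python
from datetime import date, timedelta
from itertools import product


def windowed_labels(firms, events, as_of, window_days):
    """Label a firm 1 if it had a risk event in [as_of - window_days, as_of).

    events: iterable of (firm, event_date) pairs.
    """
    start = as_of - timedelta(days=window_days)
    involved = {f for f, d in events if start <= d < as_of}
    return {f: int(f in involved) for f in firms}


def pre_test(weight_fns, windows, evaluate):
    """Grid pre-test: return the (weight function, window) pair with the best
    validation score. `evaluate` is a hypothetical callback, e.g. the hold-out
    AUC of a model that includes the resulting relational risk score."""
    return max(product(weight_fns, windows), key=lambda pair: evaluate(*pair))


firms = ["A", "B", "C"]
events = [("A", date(2016, 6, 1)), ("B", date(2014, 1, 1))]
as_of = date(2017, 1, 1)
print(windowed_labels(firms, events, as_of, 365))   # {'A': 1, 'B': 0, 'C': 0}
print(windowed_labels(firms, events, as_of, 1460))  # {'A': 1, 'B': 1, 'C': 0}
```

A longer window marks more neighbor firms as risky, which is why the window has to be chosen jointly with the weight function rather than fixed in advance.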

Empirical evaluation

Data for the empirical evaluation were collected from a Chinese regional commercial bank and an enterprise information inquiry website called Qichacha. We extracted and selected features of borrowing firms that have historically been used to evaluate the credit risk of SMEs, including demographic and financial features. We collected the information required for constructing the interfirm networks and quantifying relational risk from Qichacha.

Data

We evaluated our framework using two datasets from China (i.e., the manufacturing SMEs dataset and the NEEQ SMEs dataset). The large number of SMEs in China, especially defaulted SMEs, provides substantial samples that reflect differences between default and nondefault SMEs, allowing reliable knowledge of relational risk to be learned. According to the Law of the People's Republic of China on the Promotion of Small and Medium-Sized Enterprises (2017), SMEs are defined based on industry, operating income, and number of employees. A manufacturing firm in China is defined as an SME when its operating income is between 2 and 400 million RMB and its number of employees is between 20 and 1000. In addition, following the default definition in the Basel II Capital Accords, a credit loan that has been in arrears for more than 90 days is identified as a default credit loan. Ultimately, the first dataset, of Chinese manufacturing SMEs, comprises a sample of credit loans from 2136 manufacturing SMEs during 2016 and 2017, of which 136 are default loans, with a default rate of 6.7%. Table 7 in Appendix A presents the summary statistics of SMEs in terms of both operating income and number of employees.
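The sample-selection and labeling rules above can be written as two small predicates. This is a sketch only: it reads "between 2 and 400 million RMB" as 2 million to 400 million RMB and treats all bounds as inclusive, both of which are assumptions, and the function names are hypothetical.

```python
def is_manufacturing_sme(operating_income_rmb, employees):
    """SME screen as summarized in the text.

    Assumption: income bounds read as 2 million to 400 million RMB,
    and all bounds treated as inclusive.
    """
    return 2e6 <= operating_income_rmb <= 400e6 and 20 <= employees <= 1000


def is_default(days_in_arrears):
    """Basel II-style default flag: arrears of more than 90 days."""
    return days_in_arrears > 90


print(is_manufacturing_sme(50e6, 300))  # True
print(is_default(90), is_default(91))   # False True
```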

The NEEQ SMEs dataset comes from the National Equities Exchange and Quotations (NEEQ) market, which mostly serves innovative and growth-oriented SMEs. NEEQ firms eligible for listing can transfer directly to the main board market (Wang et al. 2020a, b). “Default” in the NEEQ SMEs dataset follows the definition of special treatment firms in the NEEQ market (i.e., a firm is designated a special treatment firm when its net assets at the end of a fiscal year are negative) (Jiang et al. 2021). Accordingly, this dataset covers 7943 SMEs between 2018 and 2020, of which 406 are identified as default, a default rate of 5.1%.

In our study, the basic features for credit risk evaluation of SMEs include demographic features (i.e., firm age, district, and registered capital), the firms' own risk events (i.e., loan disputes and administrative penalties), and financial features (i.e., financial ratios). Age and district are usually considered when determining SME credit risk (Fernandes and Artes 2016). For risk events, Altman and Wilson (2010) and Yin et al. (2020) have demonstrated that legal judgments can effectively evaluate the credit risk of SMEs. We considered several financial ratios commonly used in the credit risk evaluation of SMEs, including the current ratio, debt-to-assets ratio, receivable turnover, inventory turnover, asset turnover, quick ratio, and operating profit ratio; return on equity (ROE); and return on assets (ROA) (Abdou et al. 2016). As discussed earlier, SMEs’ financial information is of lower quality than that of large firms. Hence, we winsorized all financial ratios: specifically, we replaced the values in the top and bottom 1% of each financial ratio with the means of their k-nearest neighbors, identified by the k-nearest neighbor algorithm. The target firm's own risk event features take the value 1 or 0 depending on whether the firm has been involved in a loan dispute or an administrative penalty. Table 8 in Appendix 1 and Table 13 in Appendix 2 summarize the descriptive statistics of the basic features for the two datasets.
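The extreme-value treatment described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: the helper names are hypothetical, the 1% cut-offs use linear-interpolation percentiles, and nearest neighbors are found by Euclidean distance over the remaining feature columns, which the text does not specify.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def replace_extremes_with_knn_mean(rows, col, k=5, tail=1.0):
    """Return a copy of `rows` in which values of column `col` below the
    `tail`-th percentile or above the (100 - tail)-th percentile are replaced
    by the mean of that column over the k nearest non-extreme rows.
    Assumption: distance is Euclidean over all other columns."""
    vals = [r[col] for r in rows]
    low, high = percentile(vals, tail), percentile(vals, 100 - tail)
    extreme = [i for i, v in enumerate(vals) if v < low or v > high]
    extreme_set = set(extreme)
    normal = [i for i in range(len(rows)) if i not in extreme_set]

    def distance(i, j):
        return math.sqrt(sum((rows[i][c] - rows[j][c]) ** 2
                             for c in range(len(rows[i])) if c != col))

    cleaned = [list(r) for r in rows]  # never modify the caller's data
    for i in extreme:
        nearest = sorted(normal, key=lambda j: distance(i, j))[:k]
        cleaned[i][col] = sum(rows[j][col] for j in nearest) / len(nearest)
    return cleaned
```

In practice the function would be applied once per financial ratio column; `k` is a free choice here because the text does not report it.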

For relational risk, we used each firm’s shareholders, chief executive officer (CEO), and directors, supervisors, and senior management (DSS) to form a bipartite graph of firms and persons, from which we constructed the interfirm network. We then followed three steps to collect the information needed to quantify relational risk. For the manufacturing SMEs dataset:

  1. At Qichacha, we performed a reverse lookup on the 2136 SMEs to find their CEOs, shareholders, and DSS.

  2. At Qichacha, we then crawled and parsed the web page of each person (i.e., each CEO, shareholder, and DSS member) to retrieve information about that person’s positions at various firms. In total, we included 9672 neighbor firms in addition to the 2136 SMEs (see Note 3).

  3. We examined the risk events of the 9672 neighbor firms on Qichacha.

Ultimately, we obtained a network of 2136 target SMEs and 9672 other neighbor firms, that is, 11,808 firms in total. Similarly, for the NEEQ SMEs dataset, we obtained a network of 7943 target SMEs and 119,131 other neighbor firms (127,074 firms in total). Table 9 in Appendix 1 and Table 14 in Appendix 2 summarize the descriptive statistics of the interfirm networks for the two datasets. Notably, for practicality (i.e., default prediction), our empirical evaluation used cross-sectional data (i.e., data at the time of loan application) to quantify relational risk. Hence, subsequent changes in interfirm relationships or macroeconomic factors are unlikely to affect the stability of the relational risk effect.
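The step from the firm–person bipartite graph to the interfirm network can be illustrated with a one-mode projection. This is a minimal sketch under our own assumptions: affiliation records are hypothetical `(person, firm, role)` tuples, the role labels are illustrative, and the edge weight is taken to be the number of shared persons. Restricting `roles` to a single role yields a relationship-specific network in which two firms are linked only when the same person holds that role in both.

```python
from collections import defaultdict
from itertools import combinations


def project_to_firm_network(affiliations, roles=None):
    """One-mode projection of a firm-person bipartite graph.

    affiliations: iterable of (person_id, firm_id, role) tuples, where role is
        e.g. "CEO", "DSS" or "shareholder" (labels are illustrative).
    roles: if given, keep only affiliations with these roles, so two firms are
        linked only when the same person holds one of these roles in each firm.
        None keeps every role, giving a mixed network.
    Returns {(firm_a, firm_b): number_of_shared_persons} with firm_a < firm_b.
    """
    firms_by_person = defaultdict(set)
    for person, firm, role in affiliations:
        if roles is None or role in roles:
            firms_by_person[person].add(firm)

    edges = defaultdict(int)
    for firms in firms_by_person.values():
        # every pair of firms sharing this person gets one more unit of weight
        for a, b in combinations(sorted(firms), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

Counting shared persons is only one way to weight edges; the weight functions actually compared in the study are those listed in Table 1.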

Relational risk

Our empirical evaluation used the shareholder network to represent the shareholding network and the CEO and DSS networks to represent the governance network. We then used loan disputes and severe administrative penalties to identify the risk events of all firms, both the 2136 target SMEs and the 9672 neighbor firms found via Qichacha. For administrative penalties, we considered only severe penalties that may noticeably affect a firm’s reputation and production; following the Law of the People’s Republic of China on Administrative Punishments, we defined these as fines over 10,000 RMB, confiscation of illegal gains or illegal property, suspension of production or business operations, or temporary withholding or suspension of a license. For loan disputes, we focused on court judgments because of data availability. Because the impact of a loan dispute depends on the lawsuit status and the judgment, we considered only loan disputes in which the firm lost the lawsuit and had to pay compensation, or was designated as a debtor in a losing case requiring it to pay all associated debts. We did not consider loan disputes that were later withdrawn. Finally, we applied the smoothed version of the wvRN algorithm to the network structure and risk events to produce a score that quantifies relational risk, reflecting how a firm is affected by its neighbor firms’ risk events. Quantifying relational risk involves two hyperparameters (i.e., the weight function and the time of the risk event); the “Selection of hyperparameters” section details how they were selected.
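For intuition, the following sketch computes a weighted-vote relational neighbor (wvRN) score for one target firm: the weighted share of its neighbors that had at least one risk event within a chosen window before the loan application. It is a hedged illustration, not the paper's implementation. The additive smoothing toward a prior (`alpha`, `prior`) is one possible form of smoothing that we assume for illustration, and `weight_fn` is a placeholder for the weight functions of Table 1, which are not reproduced here.

```python
import math


def wvrn_relational_risk(target, edges, event_months, max_months=math.inf,
                         weight_fn=lambda shared: float(shared),
                         alpha=0.0, prior=0.0):
    """Weighted-vote relational neighbor score for one target firm.

    edges: {(firm_a, firm_b): shared_persons} for an undirected interfirm network.
    event_months: {firm: [months between each risk event and the target's
        loan application date]}.
    max_months: only events at most this many months before the application count.
    weight_fn: maps an edge attribute to a weight (placeholder for Table 1).
    alpha, prior: assumed additive smoothing toward `prior`; alpha=0 is plain wvRN.
    """
    num = den = 0.0
    for (a, b), shared in edges.items():
        if target not in (a, b):
            continue
        neighbor = b if a == target else a
        w = weight_fn(shared)
        risky = any(0 <= t <= max_months for t in event_months.get(neighbor, []))
        num += w * (1.0 if risky else 0.0)  # neighbor label: 1 if risky in window
        den += w
    if den + alpha == 0:
        return prior  # isolated firm without smoothing: fall back to the prior
    return (num + alpha * prior) / (den + alpha)
```

For example, a target linked to neighbor B with weight 2 (an event 5 months earlier) and to neighbor C with weight 1 (an event 30 months earlier) scores 2/3 with a 12-month window and 1 with a 36-month window.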

Model building and evaluation

To examine the three research questions, we implemented three sets of tests, described below; Fig. 4 illustrates them with an example.

Fig. 4 Test design for examining the research questions

The first test examines whether relational risk quantified based on heterogeneous information can improve the performance of credit risk evaluation of SMEs. Specifically, we quantified relational risk in a mixed network, in which any relationship type (CEO, DSS, or shareholder) can link two firms and either type of risk event can mark a neighbor firm as risky. The target firm is linked to all neighbor firms through shared persons, and each neighbor firm receives a probability score of 1 or 0 depending on whether it was involved in risk events, that is, an administrative penalty (AP), a loan dispute (LD), or both (AP + LD).

The second test examines whether network type affects the effectiveness of relational risk in credit risk evaluation of SMEs. Specifically, the CEO relationship forms the CEO network (CN), in which the target firm is linked to a neighbor firm if they share a CEO. Similarly, the DSS relationship forms the DSS network (DN), and the shareholder relationship forms the shareholder network (SN). We treated both risk event types equally, and each firm receives a probability score of 1 or 0 depending on whether it has been involved in risk events (AP, LD, or AP + LD).

The third test examines whether network type and risk event type simultaneously affect the effectiveness of relational risk in credit risk evaluation of SMEs. Specifically, we constructed six networks from the combinations of the three relationship types and the two risk event types. In the network combining the CEO relationship and administrative penalties (CNAP), the target firm is linked to a neighbor firm if they share a CEO, and each firm receives a probability score of 1 or 0 depending on whether it has been involved in an administrative penalty (AP). The remaining five networks, namely the CEO network with loan disputes (CNLD), the DSS network with administrative penalties (DNAP), the DSS network with loan disputes (DNLD), the shareholder network with administrative penalties (SNAP), and the shareholder network with loan disputes (SNLD), are defined analogously.
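The full-factorial design of the third test can be sketched by filtering both the relationship used to link firms and the event type used to flag neighbors. This is a minimal illustration with hypothetical function names and role/event labels; for simplicity it gives every neighbor equal weight and assumes that `events` already contains only events within the chosen time window.

```python
from itertools import product

ROLE_CODES = {"CEO": "CN", "DSS": "DN", "shareholder": "SN"}
EVENT_CODES = {"administrative_penalty": "AP", "loan_dispute": "LD"}


def role_neighbors(target, affiliations, role):
    """Firms linked to `target` through a person who holds `role` in both firms."""
    persons = {p for p, f, r in affiliations if f == target and r == role}
    return {f for p, f, r in affiliations
            if p in persons and r == role and f != target}


def six_network_scores(target, affiliations, events):
    """Share of risky neighbors in each relationship x risk-event network.

    affiliations: iterable of (person, firm, role) tuples.
    events: set of (firm, event_type) pairs already restricted to the time window.
    """
    affiliations = list(affiliations)
    scores = {}
    for role, event_type in product(ROLE_CODES, EVENT_CODES):
        nbrs = role_neighbors(target, affiliations, role)
        risky = sum((f, event_type) in events for f in nbrs)
        name = ROLE_CODES[role] + EVENT_CODES[event_type]  # e.g. "DNLD"
        scores[name] = risky / len(nbrs) if nbrs else 0.0
    return scores
```

A firm with no neighbors in a given network receives 0.0 here; how the paper handles such firms is not stated in this section.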

For each test, we compared two feature sets (basic features alone, and basic features combined with relational risk) to determine whether and to what extent relational risk affects the performance of credit risk evaluation methods. We selected three models for credit risk evaluation: LR, RF, and XGB. LR is a typical linear model commonly used for credit risk evaluation because of its strong interpretability (Figini et al. 2017). RF and XGB, as representative ensemble learning models, are good at handling nonlinear relationships in data. RF is an ensemble of decision trees built by bagging and is often used to balance errors on unbalanced datasets (Veganzones and Séverin 2018). XGB is a powerful and efficient implementation of the gradient boosting ensemble algorithm and is widely used because of its high precision and efficiency (Sigrist and Hirnschall 2019). Table 18 in Appendix 3 presents the meta-parameters of the three classifiers.

Credit risk evaluation methods incorporating different features were compared on discrimination and granting performance. Discrimination performance refers to the ability to distinguish bad loans from good loans. The area under the receiver operating characteristic curve (AUC), the Kolmogorov–Smirnov (KS) statistic, and the H measure are typical measures of discrimination performance (Kotsiantis et al. 2006). Moreover, all three measures can assess discrimination performance when learning from imbalanced data, which matters because the two classes in loan default prediction are highly unbalanced (Hand 2009). The higher the AUC, KS, and H measure, the better the model’s discrimination performance. Granting performance refers to the proportion of defaulters under different loan granting ratios (Wang et al. 2020a, b). At a given granting ratio, a lower default rate means fewer losses and greater benefits for financial institutions. Notably, the economic benefit of additional features in credit risk evaluation could also be evaluated with other measures (Maldonado et al. 2017; Garrido et al. 2018).
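Two of the three discrimination measures can be computed directly from predicted default scores. The sketch below assumes that higher scores mean higher default risk and that labels are 1 for defaulters and 0 otherwise; it computes AUC as the probability that a randomly chosen defaulter is scored above a randomly chosen nondefaulter, and KS as the largest gap between the two classes' empirical score distributions. The H measure, which additionally integrates over a Beta distribution of misclassification costs (Hand 2009), is omitted.

```python
def split_by_label(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]  # defaulters
    neg = [s for s, y in zip(scores, labels) if y == 0]  # nondefaulters
    return pos, neg


def auc(scores, labels):
    """Probability that a random defaulter outranks a random nondefaulter (ties count 1/2)."""
    pos, neg = split_by_label(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Largest vertical gap between the empirical score CDFs of the two classes."""
    pos, neg = split_by_label(scores, labels)
    gap = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap
```

The pairwise AUC loop is quadratic in sample size; for datasets of the size used here, a rank-based formula is the more practical choice.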

Selection of hyperparameters

As discussed earlier, we need to optimize two hyperparameters (i.e., the weight function and the time of the risk event). Table 1 presents five types of weight functions. Because loan disputes and administrative penalties cannot be predicted and should be disclosed in a timely manner, we collected information on these risk events over the short term to capture their temporal influence. Dividing time into yearly intervals is too coarse to examine whether risk events have a temporal effect, whereas dividing it into monthly intervals leaves too few risk events in each interval. Therefore, we divided the time of risk events by quarters and used 17 intervals (i.e., within T months, with T = {3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48}, and more than 48 months). Here, T is the duration from the occurrence of the risk event to the date of the loan application. Because the loan application date is usually 1 year before the performance point (i.e., the date of loan default), our data do not include information on risk events of a firm’s neighbor firms occurring near that firm’s default date.
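The resulting search space can be enumerated as below. The weight-function names are placeholders for the five functions in Table 1, and we read the last interval (“more than 48 months”) as a window with no upper limit, so that every event before the application counts; both are assumptions for illustration.

```python
from itertools import product

T_VALUES = list(range(3, 49, 3))   # "within T months": 3, 6, ..., 48
N_INTERVALS = len(T_VALUES) + 1    # plus the "more than 48 months" setting
WEIGHT_FUNCTIONS = ["w1", "w2", "w3", "w4", "w5"]  # placeholders for Table 1


def in_interval(months_before_application, interval_index):
    """Does a risk event this many months before the loan application count?

    Indices 0..15 mean "within T months"; index 16 is read here (an assumption)
    as "no upper limit", i.e., every event before the application counts.
    """
    if months_before_application < 0:
        return False  # events after the application date are never used
    if interval_index < len(T_VALUES):
        return months_before_application <= T_VALUES[interval_index]
    return True


# 5 weight functions x 17 intervals = 85 hyperparameter subsets
HYPERPARAMETER_GRID = list(product(WEIGHT_FUNCTIONS, range(N_INTERVALS)))
```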

To estimate the discrimination performance of each model while optimizing the two hyperparameters, we employed nested cross-validation (NCV) (Varma and Simon 2006). In NCV, an inner cross-validation splitter is embedded within an outer cross-validation splitter. The inner splitter selects the hyperparameters, and the outer splitter averages the test error over multiple training–testing splits using the chosen hyperparameters. Taking the manufacturing SMEs loan dataset as an example, Fig. 5 shows how we applied NCV in the following five steps:

  1. We transformed the interfirm network information into a relational risk score for each target SME using the wvRN. The basic features, the relational risk score, and the credit loan label of the 2136 target SMEs constitute the original dataset.

  2. We selected tenfold cross-validation as the outer cross-validation splitter for the original dataset. In credit risk evaluation, tenfold cross-validation is commonly used to split data and test model effectiveness (e.g., Van Vlasselaer et al. 2016; Yin et al. 2020). During the outer tenfold cross-validation, the original dataset is randomly split into ten equally sized subsets, called folds. One fold is held out as the testing data, and the remaining nine folds are used as training data.

  3. We selected tenfold cross-validation as the inner cross-validation splitter on the training data. During the inner tenfold cross-validation, the training data are likewise randomly split into ten equal folds; one fold serves as the validation data, and the remaining nine folds serve as the inner training data. We fitted a model on the inner training data with each hyperparameter subset (i.e., each combination of weight function and risk event time interval) and evaluated it on the validation data. In our empirical study, there are 5 weight functions and 17 risk event time intervals, giving 85 (17 × 5) hyperparameter subsets. To obtain a robust estimate of discrimination performance, we repeated the inner tenfold cross-validation ten times with a different split of folds each time, yielding 100 performance estimates. The discrimination performance of each model under each hyperparameter subset is the average of these 100 estimates.

  4. We ranked the discrimination performance of each model across hyperparameter subsets and selected the best-performing subset as the optimal hyperparameter subset.

  5. We retrained the model on the outer training data with the optimal hyperparameter subset and evaluated it on the testing data. Similarly, we repeated the outer tenfold cross-validation ten times, yielding 100 performance estimates. The discrimination performance reported in the “Results” section is the mean of these 100 outer estimates together with its 95% confidence interval.
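The five steps can be condensed into the following repeated nested cross-validation skeleton. It is a sketch under stated assumptions: `fit_and_score` is a hypothetical callback that trains a model with a given hyperparameter subset on the training indices and returns a performance measure (e.g., AUC) on the evaluation indices, folds are formed by random shuffling without stratification, and the same inner splits are reused for every hyperparameter subset so that subsets are compared on identical data.

```python
import random
from statistics import mean


def kfold(indices, k, rng):
    """Randomly split `indices` into k folds of (nearly) equal size."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv(n, grid, fit_and_score, k=10, repeats=10, seed=0):
    """Repeated nested k-fold CV; returns the k * repeats outer-test scores.

    fit_and_score(params, train_idx, eval_idx) -> performance (higher is better).
    """
    rng = random.Random(seed)
    outer_scores = []
    for _ in range(repeats):                            # step 5: repeat outer CV
        for test_fold in kfold(range(n), k, rng):       # step 2: outer split
            test_set = set(test_fold)
            train = [i for i in range(n) if i not in test_set]
            # step 3: inner splits come from the outer training data only
            inner = [kfold(train, k, rng) for _ in range(repeats)]

            def inner_performance(params):
                scores = []
                for folds in inner:
                    for val_fold in folds:
                        val_set = set(val_fold)
                        fit_idx = [i for i in train if i not in val_set]
                        scores.append(fit_and_score(params, fit_idx, val_fold))
                return mean(scores)

            best = max(grid, key=inner_performance)     # step 4: best subset
            outer_scores.append(fit_and_score(best, train, test_fold))
    return outer_scores
```

With `k=10`, `repeats=10`, and an 85-element grid this reproduces the counts in the text: 100 inner estimates per subset and 100 outer estimates in total, at the cost of many model fits.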

Fig. 5 Flowchart of the nested cross-validation. An inner cross-validation splitter chooses hyperparameters first, followed by an outer splitter that averages the test error over multiple training–testing splits

Tables 10, 11, and 12 in Appendix 1 and Tables 15, 16, and 17 in Appendix 2 present summary statistics of risk events during different time intervals in different interfirm networks on two datasets. Tables 19 and 20 in Appendix 4 present the result of the optimal hyperparameter subset in different tests.

Results

We first evaluated the utility of relational risk in credit risk evaluation of SMEs by implementing the three sets of tests outlined in the “Model building and evaluation” section on the manufacturing SMEs dataset. We then compared the granting performance of the baseline method with that of our method. Subsequently, to validate the utility of relational risk based on heterogeneous information in credit risk evaluation of SMEs, we used an alternative method (the personalized PageRank algorithm) to quantify relational risk and tested its predictive power. Finally, we used the NEEQ SMEs dataset to further evaluate the generalizability of our proposed framework.

Relational risk in a mixed network

In the first evaluation, we examined the following:

Does relational risk quantified based on heterogeneous information improve the performance of credit risk evaluation of SMEs?

We assessed the discrimination performance of relational risk in a mixed network, in which any relationship type (CEO, DSS, or shareholder) can link two firms and either risk event type (loan disputes or administrative penalties) can mark a neighbor firm as risky.

Table 2 summarizes the discrimination performance (AUC, KS, and H measure) of the three classification models (LR, RF, and XGB) using two feature sets: basic features only (BF) and basic features combined with the relational risk score (BF + RS). The baseline model uses only BF. For every model and performance measure, BF + RS outperformed the baseline model, which suggests that relational risk is not a noisy feature. Furthermore, we tested the statistical significance of the effect of the feature set (BF vs. BF + RS) on discrimination performance using a nonparametric test (Table 21, Appendix 5). Overall, the difference between the two feature sets was statistically significant (χ2 = 97.018, p < 0.001, after Bonferroni correction).

Table 2 Discrimination performance of different feature sets (BF and BF + RS)

We further report the coefficients of the basic features and the mixed-network relational risk score in the LR model and the feature importance (i.e., mean decrease in Gini) in the RF model (Table 32 and Fig. 7 in Appendix 6). The coefficient of the relational risk score in the LR model indicates a significant positive association between credit risk and the relational risk score (p < 0.05). The mean decrease in Gini of the relational risk score in the RF model shows that relational risk is important in the credit risk evaluation of SMEs.

In summary, the results show that relational risk based on heterogeneous information has discrimination ability in the credit risk evaluation of SMEs. Specifically, combining the relational risk score with the basic features significantly improves the discrimination performance of the default prediction model relative to the basic features alone.

Relational risk in relationship-specific networks

In the second evaluation, we examined the following:

Does network type affect the effectiveness of relational risk in credit risk evaluation of SMEs?

Compared with the mixed network, a fine-grained network constructed from a specific relationship type has higher specificity (firms with a specific relationship are linked together, which may reflect relational risk more accurately). However, such networks inevitably have lower sensitivity (i.e., relatively fewer related firms are identified in each relationship-specific network). To investigate the impact of relationship type on the performance of the relational risk score, we constructed three relationship-specific networks (CEO-specific, DSS-specific, and shareholder-specific) and extracted three relational risk scores from them: CN, DN, and SN, respectively.

Table 3 summarizes the discrimination performance of the relational risk scores extracted from relationship-specific networks. Combining a relational risk score (CN, DN, or SN) with the basic features significantly improves the discrimination performance of default prediction methods. However, the size of the improvement varies with the relationship-specific network from which the score is extracted. As noted, these results may echo those in the “Relational risk in a mixed network” section if the propensity of risk events in our data is higher. Among the three relational risk scores (CN, DN, and SN), the score based on the DSS-specific network (DN) led to the greatest improvement.

Table 3 Discrimination performance of relational risk in relationship-specific networks

We also tested the statistical significance of the effect of the four types of feature set (i.e., BF, BF + CN, BF + DN, and BF + SN) on discrimination performance using a nonparametric test (Table 22, Appendix 5). Overall, differences among the four types of feature sets were statistically significant (χ2 = 174.014, p < 0.001, after Bonferroni correction). Further pairwise comparisons demonstrate that the performance of BF + DN is significantly better than that of BF and BF + CN.

Overall, these results demonstrate that the DSS-based governance network outperformed all alternatives, including the CEO-based governance network and the shareholder network. Firms with the same DSS tend to have similar levels of management and hence similar profitability, so their risk is also more similar. This result provides strong evidence that the DSS relationship is the most effective type of relationship for constructing a network that captures relational risk.

Relational risk in relationship-specific and risk event-specific networks

In the third evaluation, we examined the following:

Do network type and risk event type simultaneously affect the effectiveness of relational risk in credit risk evaluation of SMEs?

To further investigate the impacts of relationship type and risk event type on the performance of relational risk, we used a full-factorial design. We constructed six relationship-specific and risk event-specific networks and correspondingly calculated six relational risk scores (as shown in Table 4).

Table 4 Relationship- and risk event-specific networks and corresponding relational risk scores

Table 5 summarizes the discrimination performance of the six relational risk scores. Loan disputes as the risk event type perform robustly across performance measures and models. Specifically, when the CEO relationship links firms, CNLD outperformed CNAP in all performance measures (i.e., BF + CNLD vs. BF + CNAP). When DSS link firms, DNLD outperformed DNAP in all performance measures (i.e., BF + DNLD vs. BF + DNAP). When shareholders link firms, SNLD outperformed SNAP in all performance measures (i.e., BF + SNLD vs. BF + SNAP). Additionally, linking firms through DSS and identifying risky related firms through loan disputes (DNLD) outperformed all alternatives in all performance measures.

Table 5 Discrimination performance of relational risk in relationship-specific and risk event–specific networks

We tested the statistical significance of the effect of the seven types of feature set (i.e., BF, BF + CNAP, BF + CNLD, BF + DNAP, BF + DNLD, BF + SNAP, and BF + SNLD) on discrimination performance (Table 23, Appendix 5). Overall, differences among the seven types of feature set were statistically significant (χ2 = 433.640, p < 0.001, after Bonferroni correction). Further pairwise comparisons demonstrate that the performance of BF + DNLD is significantly better than that of the others. Results suggest that the DSS relationship and loan disputes as risk event are most informative in determining a firm’s credit risk.

We further tested the statistical significance of the effect of the three types of feature set (i.e., BF + RS, BF + DN, and BF + DNLD) on discrimination performance using a nonparametric test (Table 24, Appendix 5). Overall, differences among the three types of feature set were statistically significant (χ2 = 94.957, p < 0.001, after Bonferroni correction). Further pairwise comparisons demonstrate that the performance of BF + DNLD is significantly better than that of BF + DN and BF + RS (p < 0.001, after Bonferroni correction).

Overall, these results demonstrate that network and risk event types can simultaneously affect the effectiveness of relational risk in credit risk evaluation of SMEs. Specifically, relational risk score in the combination of DSS network and loan dispute achieved the best incremental effect on the credit risk evaluation of SMEs. The results provide a noteworthy rationale for financial institutions to construct a fine-grained network with a specific relationship (i.e., DSS) and specific risk event (i.e., loan dispute) to maximize the discrimination performance of relational risk in the credit risk evaluation of SMEs.

Granting performance

To evaluate whether adding relational risk could bring profits to financial institutions, we compared the granting performance of two models: (1) a baseline model using only basic features and (2) a model using both basic features and the relational risk score DNLD. First, we used a prediction model to estimate the default probability of each loan through tenfold cross-validation and then ranked the default probabilities from smallest to largest. Second, following this ranking, we selected different cut-off values of the granting proportion (i.e., the percentage of loan applications approved) and calculated the default rate (i.e., the proportion of defaulters) under each granting proportion.
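The granting-performance calculation reduces to sorting applications by predicted default probability and measuring the default rate among those approved. This is a minimal sketch with hypothetical function names; rounding the number of approved loans to the nearest integer is our choice.

```python
def default_rate_at_ratio(probs, defaults, ratio):
    """Approve the `ratio` share of applications with the lowest predicted default
    probability and return the observed default rate among the approved loans."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])  # safest first
    n_granted = max(1, round(ratio * len(probs)))
    approved = order[:n_granted]
    return sum(defaults[i] for i in approved) / n_granted


def granting_curve(probs, defaults, ratios):
    """Default rate for each granting ratio, e.g. ratios from 0.30 to 0.97."""
    return [(r, default_rate_at_ratio(probs, defaults, r)) for r in ratios]
```

At a granting ratio of 100%, every application is approved, so the default rate equals the overall default rate of the sample.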

Figure 6 visualizes the granting performance of the two models. For all three classification methods, screening loan applications with a model that uses both the basic features and the relational risk score DNLD produced a lower default rate than using only the basic features. The result shows that relational risk could bring economic profits to banks.

Fig. 6 Granting performance of two models. The horizontal axis shows various loan granting ratios ranging from 30 to 97%. The vertical axis shows the default rate, which ranges from 0 to 6.4%, with the maximum default rate equaling the overall default rate

Results of the personalized PageRank

To further validate that publicly available risk events of each firm in an interfirm network can be used to quantify relational risk, we also quantified relational risk with an alternative method, the personalized PageRank algorithm (Van Vlasselaer et al. 2016), in addition to the smoothed version of wvRN. First, we calculated the relational risk score using the personalized PageRank algorithm under the same setting used for wvRN in the mixed network (i.e., considering both loan disputes and administrative penalties within 18 months). We then compared the discrimination performance of three feature sets: (1) basic features (BF); (2) basic features combined with the relational risk score from wvRN (BF + RS); and (3) basic features combined with the relational risk score from the personalized PageRank algorithm (BF + prRS). Both relational risk scores, RS and prRS, are useful for credit risk evaluation (Table 6). We further tested the statistical significance of the effect of the three feature sets on discrimination performance using a nonparametric test (Table 25, Appendix 5). Overall, the differences among the three feature sets were statistically significant (χ2 = 104.004, p < 0.001, after Bonferroni correction). Further pairwise comparisons show that both RS and prRS significantly improve the discrimination performance of the baseline models, whereas the incremental effect of RS is not significantly greater than that of prRS (p = 0.231, after Bonferroni correction). These results show that relational risk quantified from publicly available risk events of each firm in an interfirm network can indeed improve the discrimination performance of credit risk evaluation, and that this benefit is not tied to a single quantification method.
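A personalized PageRank score of this kind can be sketched with power iteration, restarting the random walk at firms that had risk events so that firms close to many risky firms accumulate higher scores. This is an illustrative sketch, not the paper's setup: the damping factor of 0.85, the uniform restart distribution over risky firms, the treatment of edges as undirected, and returning the mass of dangling nodes to the restart distribution are our assumptions.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iters=200, tol=1e-12):
    """Personalized PageRank by power iteration.

    edges: {(firm_a, firm_b): weight}, treated as undirected.
    seeds: non-empty collection of firms with risk events; the restart
        (teleport) distribution is uniform over them.
    Returns {firm: score}; scores sum to 1.
    """
    seeds = set(seeds)
    nodes = sorted({n for pair in edges for n in pair} | seeds)
    nbrs = {n: {} for n in nodes}
    for (a, b), w in edges.items():
        nbrs[a][b] = nbrs[a].get(b, 0.0) + w
        nbrs[b][a] = nbrs[b].get(a, 0.0) + w
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    score = dict(restart)
    for _ in range(iters):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(nbrs[n].values())
            if total == 0:  # dangling node: send its mass back to the seeds
                for m in nodes:
                    new[m] += damping * score[n] * restart[m]
            else:           # spread mass to neighbors in proportion to edge weight
                for m, w in nbrs[n].items():
                    new[m] += damping * score[n] * w / total
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score
```

On a path A–B–C with C as the only risky firm, the middle firm B ends up with the highest score because it receives mass from both sides, while the distant firm A scores lowest.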

Table 6 Discrimination performance of different feature sets

Robustness tests

We conducted three robustness tests to further ensure that our results are not affected by (1) dataset selection, (2) data coverage, or (3) exclusion biases. In this subsection, we summarize the main tests and findings; detailed results are reported in the appendixes.

The first test examines whether our results are robust to potential dataset selection bias. So far, we have evaluated the utility of our framework only on the manufacturing SMEs dataset, and results from other datasets may differ. We therefore tested the robustness of our results on a new dataset (i.e., the NEEQ SMEs dataset) using the three sets of tests introduced in the “Model building and evaluation” section. Tables 33, 34, and 35 and Fig. 8 in Appendix 8 present the results. Overall, the main results on the NEEQ SMEs dataset are similar to those on the manufacturing SMEs dataset, although the performance of LR is lower. In our view, this difference is mainly attributable to the difference in sample size: as the sample grows, nonlinear relationships among features become more pronounced, and LR cannot effectively capture them.

The second test examines whether our results are robust to potential data coverage bias. The first robustness analysis considered only non-listed SMEs in the NEEQ SMEs dataset, whereas results may differ significantly between listed and unlisted SMEs. Hence, to alleviate data coverage bias, we added a dummy variable indicating whether an SME is listed or non-listed to the basic features of the NEEQ SMEs dataset. Tables 36 and 37 in Appendix 8 show that our results are consistent with those of the first robustness analysis.

The third test examines whether our results are robust to potential exclusion bias. As discussed in the “Relational risk” section, loan disputes that were later withdrawn were not considered. To address exclusion bias, we included these withdrawn loan disputes when quantifying relational risk. Tables 38 and 39 in Appendix 8 present the results. Overall, our results are consistent with those in the “Relational risk in a mixed network” through “Relational risk in relationship-specific and risk event-specific networks” sections, except that in the LR model the relational risk score yields a relatively small improvement.

Conclusion

Previous studies on the credit risk evaluation of SMEs have mostly considered the intrinsic risk arising from SMEs themselves, while only a few studies have considered both intrinsic risk and the relational risk arising from SMEs’ neighbor firms (e.g., Vinciotti et al. 2019; Kou et al. 2021). Furthermore, existing studies typically quantify relational risk based on the credit history information of each firm in an interfirm network (e.g., Calabrese et al. 2017; Vinciotti et al. 2019). However, because of access and cost issues, these approaches have limited applicability. Hence, our study presents a novel methodology that quantifies relational risk based on publicly available risk events of each firm in an interfirm network and proposes a framework to incorporate relational risk into the credit risk evaluation of SMEs. We evaluate our framework on two Chinese SME datasets. Our empirical results indicate that incorporating the proposed relational risk significantly improves both the discrimination and granting performance of credit risk evaluation of SMEs. Moreover, network type, risk event type, and the time dependence of risk events influence the effectiveness of relational risk in credit risk evaluation of SMEs. These results provide important managerial implications for financial institutions regarding how to quantify relational risk and incorporate it into credit risk analysis frameworks.

Our study makes the following contributions to research and practice. From a research perspective, we propose a framework for quantifying relational risk based on heterogeneous information for the credit risk evaluation of SMEs. This framework can be applied in other financial scenarios (e.g., bankruptcy prediction) to exploit the relational risk hidden in interfirm networks. Moreover, our empirical evaluation suggests that relational risk quantified from heterogeneous information can be used to evaluate the credit risk of SMEs, broadening the existing literature, which quantifies relational risk based on homogeneous information (e.g., Calabrese et al. 2019; Óskarsdóttir and Bravo 2021).

From the practical perspective, our study may help financial institutions better evaluate SMEs’ credit risk. Specifically, our study provides a noteworthy rationale for financial institutions to incorporate relational risk based on heterogeneous information when granting credit to SMEs—especially in markets where the financial information of SMEs is insufficient and opaque. Additionally, we provide a potential new tool to enhance the current credit risk assessment process by incorporating relational risk score and, hence, direct more capital at a lower cost to the least-risky SMEs. This promotes efficient allocation of resources, economic growth, and employment. Moreover, considering information collection and storage costs, we provide valuable guidance for financial institutions on how to collect information for studying relational risk, including quantifying relational risk using publicly available risk events, constructing a fine-grained network with a specific relationship (i.e., DSS) and specific risk event (i.e., loan dispute), and considering time dependence of risk events.

Our study has the following limitations, which suggest directions for future research. First, the sample size for our empirical analysis is relatively small, which prevents us from using deep learning models, such as graph convolutional networks and other graph neural networks, to study relational risk (Lee et al. 2021). Second, owing to data privacy protection, we cannot obtain the credit history information of each firm in an interfirm network; accordingly, we cannot compare the effectiveness of relational risk based on homogeneous information with that based on heterogeneous information. Third, because samples from different periods are not available, we cannot use out-of-time validation to test model robustness. Fourth, we evaluated our proposed framework only on Chinese samples. Future work could conduct a more comprehensive evaluation involving samples from other countries, as well as a cross-country study, to test the external validity of the framework. Addressing these shortcomings requires additional data and warrants further study.

Availability of data and materials

The data that support the findings of this study are available from a local commercial bank, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of the local commercial bank.

Notes

  1. In the U.S., anyone can purchase a firm's credit report from three business credit bureaus: Equifax, Experian, and Dun & Bradstreet (Berger and Frame 2007). In Germany, financial institutions can purchase a firm's credit report from the credit bureau Creditreform (Dierkes et al. 2013).

  2. Qichacha (http://www.qichacha.com) displays publicly available information about firms (e.g., board structure, managerial information, administrative penalties, and loan disputes). Qichacha obtains all of its information from the State Administration for Industry and Commerce of China via application programming interfaces (APIs), and thus can be considered authoritative and trustworthy.

  3. The directors (or shareholders or the CEO) of each SME could also be affiliated with non-SMEs (e.g., by serving on their boards). In our empirical study, we include both SMEs and non-SMEs when constructing an interfirm network and treat them equally. However, we use only the 2,136 target SMEs for credit risk evaluation.
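Note 3 implies that firms are linked through the persons they share, with SMEs and non-SMEs entering the network on equal terms. One common way to build such a network is a one-mode projection of person-firm affiliations, sketched below in plain Python. The input format, the function name `project_firm_network`, and weighting each edge by the number of shared persons are illustrative assumptions rather than the paper's documented procedure.

```python
from collections import defaultdict
from itertools import combinations

def project_firm_network(affiliations):
    """Project (person, firm) affiliation pairs onto a firm-firm network.

    Two firms are linked if at least one person (e.g., a director, supervisor,
    or senior manager) is affiliated with both; the edge weight counts the
    shared persons. SMEs and non-SMEs are treated identically as nodes.
    """
    firms_by_person = defaultdict(set)
    for person, firm in affiliations:
        firms_by_person[person].add(firm)
    edges = defaultdict(lambda: defaultdict(int))
    for firms in firms_by_person.values():
        for a, b in combinations(sorted(firms), 2):  # every pair of co-affiliated firms
            edges[a][b] += 1
            edges[b][a] += 1
    return {firm: dict(nbrs) for firm, nbrs in edges.items()}

# Hypothetical affiliation records: (person_id, firm_id).
affiliations = [
    ("p1", "SME_1"), ("p1", "LargeCo"),  # p1 also sits on a non-SME board
    ("p2", "SME_1"), ("p2", "LargeCo"),
    ("p3", "LargeCo"), ("p3", "SME_2"),
]
network = project_firm_network(affiliations)
print(network)
```

Only the target SMEs would later be scored, but keeping the non-SME `LargeCo` as a node still connects `SME_1` and `SME_2` through a two-step path.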

Abbreviations

SMEs: Small and medium-sized enterprises

wvRN: Weighted-vote relational neighbor

LDA: Linear discriminant analysis

LR: Logistic regression

MDA: Multivariate discriminant analysis

RF: Random forest

XGB: eXtreme gradient boosting

ROE: Return on equity

ROA: Return on assets

CEO: Chief Executive Officer

DSS: Directors, Supervisors, and Senior Management

CN: CEO network

DN: DSS network

SN: Shareholder network

AP: Administrative penalty

LD: Loan disputes

CNAP: CEO network and administrative penalties

CNLD: CEO network and loan disputes

DNAP: DSS network and administrative penalties

DNLD: DSS network and loan disputes

SNAP: Shareholder network and administrative penalties

SNLD: Shareholder network and loan disputes

NCV: Nested cross-validation

References

Acknowledgements

We would like to thank the National Natural Science Foundation of China, which financially supported this research (Grant Nos. 71731005 and 72101073).

Funding

This work was funded by the National Natural Science Foundation of China (Grant Nos. 71731005 and 72101073).

Author information

Contributions

JL contributed to the acquisition and analysis of data and drafted the work. CJ contributed to the conception of the work and the interpretation of data. SD contributed to the design of the work and drafted the work. ZW contributed to the design of the work and the creation of new software used in the work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Cuiqing Jiang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Summary statistics of manufacturing SMEs dataset

The definition of SMEs in China is based on industry, operating income, and number of employees. A manufacturing SME in China is defined as a firm with an operating income of less than 400 million RMB and fewer than 1000 employees. In our empirical study, the average operating income of the 2136 target manufacturing SMEs is 22,943,305 RMB, the maximum is 399,280,700 RMB, and the minimum is 2,006,597 RMB. The distribution of the number of employees of the 2136 target manufacturing SMEs is shown in Table 7; the frequency distribution shows that no firm has more than 1000 employees. All firms in our dataset qualify as SMEs (operating income ranges from 2 to 400 million RMB, and the number of employees ranges from 20 to 1000), except one firm whose operating income was 1,006,597 RMB at the time of loan application (Table 8).
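The size criterion above reduces to two upper bounds, which can be written as a simple screening rule. The sketch below is a hypothetical helper, not code from the study; it encodes the two thresholds exactly as stated in this appendix (both must hold), and the employee counts in the example are made up for illustration.

```python
INCOME_CAP_RMB = 400_000_000  # operating income must be below 400 million RMB
EMPLOYEE_CAP = 1000           # employees must number fewer than 1000

def is_manufacturing_sme(operating_income_rmb: float, employees: int) -> bool:
    """Return True if a manufacturing firm meets both thresholds stated in Appendix 1."""
    return operating_income_rmb < INCOME_CAP_RMB and employees < EMPLOYEE_CAP

# The sample's maximum operating income (399,280,700 RMB) is just under the cap.
print(is_manufacturing_sme(399_280_700, 950))  # True
print(is_manufacturing_sme(400_000_000, 950))  # False: income is not below the cap
```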

Table 8 shows the descriptive statistics of basic features, and Table 9 shows the descriptive statistics of the different interfirm networks on the manufacturing SMEs dataset.

Table 10 shows the number and the rate of firms involved in risk events (both loan disputes and administrative penalties) over different intervals in different networks, including the mixed network (MN), the DSS network (DN), the shareholder network (SN), and the CEO network (CN). Table 11 shows the number and the rate of firms involved in administrative penalties over different intervals in different networks. Table 12 shows the number and the rate of firms involved in loan disputes over different intervals in different networks.

Table 7 Descriptive statistics of the number of employees
Table 8 Descriptive statistics of basic features on manufacturing SMEs dataset
Table 9 Descriptive statistics of different interfirm networks
Table 10 Descriptive statistics of LD + AP in different networks
Table 11 Summary statistics of AP in different networks
Table 12 Summary statistics of LD in different networks

Appendix 2: Summary statistics of NEEQ SMEs dataset

See Tables 13, 14, 15, 16 and 17.

Table 13 Descriptive statistics of basic features on NEEQ SMEs dataset
Table 14 Descriptive statistics of different interfirm networks on NEEQ SMEs dataset
Table 15 Descriptive statistics of LD + AP in different networks on NEEQ SMEs dataset
Table 16 Summary statistics of AP in different networks on NEEQ SMEs dataset
Table 17 Summary statistics of LD in different networks on NEEQ SMEs dataset

Appendix 3: Meta-parameters for each classifier

See Table 18.

Table 18 Meta-parameters for each classifier

Appendix 4: Selection of the optimal hyperparameter subset in different methods

We used NCV to train the models while selecting the optimal hyperparameter subset. In our empirical study, there are five weight functions and 17 time intervals of risk events, resulting in 85 (17 × 5) hyperparameter subsets. To obtain a robust estimate of discrimination performance, we repeated the inner ten-fold cross-validation procedure ten times, with a different split of folds each time, resulting in 100 performance estimates. The discrimination performance of each model under each hyperparameter subset is therefore the average of these 100 estimates. Then, for each model, we ranked the hyperparameter subsets by discrimination performance (in terms of AUC) and selected the subset with the best performance as the optimal hyperparameter subset, as shown in Tables 19 and 20.
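The inner-loop selection described above amounts to an exhaustive search over the 85 (weight function, time interval) pairs, each scored by its mean AUC over ten repetitions of ten-fold cross-validation. The sketch below assumes a caller-supplied function `fit_and_auc(params, train_idx, test_idx)` (hypothetical) that trains a model on one split and returns its AUC; within NCV, the indices would refer to the outer training fold. The weight-function names and interval labels are placeholders, not the ones used in the paper.

```python
import random
from itertools import product
from statistics import mean

def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a different split of folds each repetition
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]

def select_hyperparameters(n, weight_functions, intervals, fit_and_auc, k=10, repeats=10):
    """Grid-search (weight function, interval) pairs by mean AUC over repeated k-fold CV."""
    splits = list(repeated_kfold(n, k, repeats))  # same k * repeats splits for every candidate
    best, best_auc = None, float("-inf")
    for params in product(weight_functions, intervals):
        auc = mean(fit_and_auc(params, train, test) for train, test in splits)
        if auc > best_auc:
            best, best_auc = params, auc
    return best, best_auc

# Placeholder grid: 5 weight functions x 17 time intervals = 85 candidates.
weight_functions = ["w1", "w2", "w3", "w4", "w5"]
intervals = list(range(1, 18))

def toy_auc(params, train, test):
    """Stand-in for training a model on `train` and scoring AUC on `test`."""
    w, t = params
    return 0.70 + 0.01 * (w == "w3") - 0.001 * abs(t - 6)

best, best_auc = select_hyperparameters(200, weight_functions, intervals, toy_auc)
print(best)  # ('w3', 6)
```

Reusing the same splits for every candidate keeps the comparison between hyperparameter subsets paired, so differences in mean AUC are not driven by different fold assignments.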

Table 19 Optimal hyperparameters of different methods on manufacturing SMEs dataset
Table 20 Optimal hyperparameters of different methods on NEEQ SMEs dataset

Appendix 5: Results of non-parametric tests on manufacturing SMEs dataset

We used a non-parametric test to assess the statistical significance of the effect of different feature sets on discrimination performance. Because the Friedman test is rank-based, the results across classification models (i.e., LR, RF, and XGB) and performance measures (i.e., AUC, KS, and H measure) were pooled (i.e., the sample size is 100 × 3 × 3 = 900). Tables 21, 22, 23, 24 and 25 show the differences among the feature sets.
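To make the pooled test concrete, the sketch below computes the Friedman chi-square statistic in plain Python. It assumes the results are arranged as blocks (for example, each of the 900 model, metric, and repetition combinations) by treatments (the feature sets), ranks treatments within each block with average ranks for ties, and applies the standard formula without a tie correction. The post-hoc pairwise procedure behind Tables 21 to 25 is not reproduced here.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # tied positions i..j share the mean of ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = avg
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square for a list of blocks, each holding k treatment scores."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12 * sum_sq / (n * k * (k + 1)) - 3 * n * (k + 1)

# Illustrative scores: rows are blocks, columns are feature sets (e.g., BF, BF + RS, BF + DNLD).
scores = [
    [0.70, 0.74, 0.76],
    [0.68, 0.71, 0.75],
    [0.72, 0.73, 0.78],
]
print(friedman_statistic(scores))  # 6.0 (perfect agreement across 3 blocks and 3 treatments)
```

Under the null hypothesis, the statistic is approximately chi-square distributed with k - 1 degrees of freedom.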

Table 21 Results of pairwise comparison (BF vs. BF + RS)
Table 22 Results of full pairwise comparison (BF, BF + CN, BF + DN, and BF + SN)
Table 23 Results of full pairwise comparison (BF, BF + CNAP, BF + CNLD, BF + DNAP, BF + DNLD, BF + SNAP, and BF + SNLD)
Table 24 Results of full pairwise comparison (BF + RS, BF + DN, and BF + DNLD)
Table 25 Results of full pairwise comparison (BF, BF + RS, and BF + prRS)

Appendix 6: Confidence intervals and estimates of the mean of each metric

Tables 26 and 27 show the 95% confidence intervals for the mean of each metric, and Tables 28 and 29 show the corresponding 90% confidence intervals. The metrics in Tables 30 and 31 are the means of 50 estimates, obtained by repeating the outer ten-fold cross-validation procedure five times.
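As a point of reference, a confidence interval for the mean of a metric over repeated estimates can be computed as below. This is a minimal sketch assuming a normal approximation; the paper does not state which interval construction it used, and because cross-validation estimates share training data, such intervals are best read as approximate. The AUC values are illustrative, not results from the study.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(estimates, level=0.95):
    """Normal-approximation confidence interval for the mean of repeated estimates."""
    m = mean(estimates)
    se = stdev(estimates) / len(estimates) ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for level=0.95
    return m - z * se, m + z * se

# Illustrative AUC estimates (not the paper's results).
aucs = [0.78, 0.80, 0.79, 0.81, 0.77, 0.80, 0.82, 0.79, 0.78, 0.80]
print(mean_confidence_interval(aucs, 0.95))
print(mean_confidence_interval(aucs, 0.90))
```

The 90% interval is always nested inside the 95% interval for the same estimates, since only the critical value changes.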

Table 26 Discrimination performance of different feature sets
Table 27 Discrimination performance of different feature sets under 95% confidence intervals
Table 28 Discrimination performance of different feature sets under 90% confidence intervals
Table 29 Discrimination performance of different feature sets under 90% confidence intervals
Table 30 Discrimination performance of different feature sets
Table 31 Discrimination performance of different feature sets

Appendix 7: Analysis of results in LR and RF

Table 32 shows the coefficients of the features in LR, and Figure 7 shows the mean decrease in Gini of the relational risk score in RF. The meta-parameters of each classifier in our empirical study are shown in Table 18.
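The mean decrease in Gini accumulates, for every split on a feature, the weighted reduction in Gini impurity, and averages this total over the trees of the forest. The sketch below computes the contribution of a single binary split; summing such contributions over all nodes that split on the relational risk score and averaging over trees gives that feature's importance. Implementations differ in scaling (some normalize importances to sum to one), so values from this sketch are not directly comparable to Figure 7. The class counts are hypothetical.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_c^2) of a node, given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0

def weighted_gini_decrease(parent, left, right, n_root):
    """Weighted impurity decrease of one split (the quantity summed for Gini importance).

    parent, left, right: class counts, e.g. [n_nondefault, n_default]
    n_root: number of training samples at the root of the tree
    """
    n_p, n_l, n_r = sum(parent), sum(left), sum(right)
    decrease = gini(parent) - (n_l / n_p) * gini(left) - (n_r / n_p) * gini(right)
    return (n_p / n_root) * decrease  # weight by the share of samples reaching the node

# Hypothetical root split on a relational risk score threshold: 100 firms, 20 defaults.
print(weighted_gini_decrease([80, 20], [70, 5], [10, 15], n_root=100))  # about 0.1067
```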

Table 32 Coefficients of features in the logistic regression model
Fig. 7

Random forest feature importance. A greater mean decrease in Gini indicates a more important feature

Appendix 8: Robustness analyses

We conducted three sets of robustness tests to check that our results are not driven by (1) dataset selection bias, (2) data coverage bias, or (3) exclusion bias. The main results of these tests are reported below; interested readers can contact us for more detailed descriptions of the test procedures and results.

For the first robustness test, Tables 33 and 34 summarize the discrimination performance (in terms of AUC, KS, and H measure) of the three classification models (LR, RF, and XGB) on the NEEQ SMEs dataset using 11 types of feature sets: (1) BF, (2) BF + RS, (3) BF + CN, (4) BF + DN, (5) BF + SN, (6) BF + CNAP, (7) BF + CNLD, (8) BF + DNAP, (9) BF + DNLD, (10) BF + SNAP, and (11) BF + SNLD. Table 35 shows the results of full pairwise comparisons between the different methods on the NEEQ SMEs dataset, and Figure 8 shows the granting performance of the three classification models using two feature sets: (1) BF and (2) BF + DNLD.

Table 33 Discrimination performance of different feature sets on NEEQ SMEs dataset
Table 34 Discrimination performance of different feature sets on NEEQ SMEs dataset
Table 35 Result of full pairwise comparisons between different methods on NEEQ SMEs dataset
Fig. 8

Granting performance of two models on the NEEQ SMEs dataset. The horizontal axis represents granting proportions of loans ranging from 30 to 95%, and the vertical axis represents the default rate, ranging from 0 to 5.1%
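Granting performance of this kind is commonly computed by ranking applicants from least to most risky, approving a given fraction, and measuring the default rate among the approved loans. The sketch below assumes each applicant has a predicted default probability and an observed default label (1 = default); the function name and the data are hypothetical, not taken from the study.

```python
def default_rate_at_grant(pd_scores, defaults, grant_share):
    """Default rate among the `grant_share` fraction of applicants with the lowest predicted PD."""
    if not 0 < grant_share <= 1:
        raise ValueError("grant_share must be in (0, 1]")
    ranked = sorted(zip(pd_scores, defaults), key=lambda pair: pair[0])  # least risky first
    n_granted = max(1, round(grant_share * len(ranked)))
    granted = ranked[:n_granted]
    return sum(label for _, label in granted) / n_granted

# Hypothetical predicted default probabilities and observed outcomes.
pds = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
curve = {share: default_rate_at_grant(pds, labels, share) for share in (0.3, 0.5, 0.8, 0.9)}
print(curve)
```

A model that ranks risk better yields a lower default rate at each granting proportion, which is what a lower curve indicates.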

For the second robustness test, Tables 36 and 37 summarize the discrimination performance (in terms of AUC, KS, and H measure) of the three classification models (LR, RF, and XGB) using 11 types of feature sets: (1) BF, (2) BF + RS, (3) BF + CN, (4) BF + DN, (5) BF + SN, (6) BF + CNAP, (7) BF + CNLD, (8) BF + DNAP, (9) BF + DNLD, (10) BF + SNAP, and (11) BF + SNLD.

Table 36 Discrimination performance of different feature sets
Table 37 Discrimination performance of different feature sets

For the third robustness test, Tables 38 and 39 summarize the discrimination performance (in terms of AUC, KS, and H measure) of the three classification models (LR, RF, and XGB) using 11 types of feature sets: (1) BF, (2) BF + RS, (3) BF + CN, (4) BF + DN, (5) BF + SN, (6) BF + CNAP, (7) BF + CNLD, (8) BF + DNAP, (9) BF + DNLD, (10) BF + SNAP, and (11) BF + SNLD.

Table 38 Discrimination performance of different feature sets
Table 39 Discrimination performance of different feature sets

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Long, J., Jiang, C., Dimitrov, S. et al. Clues from networks: quantifying relational risk for credit risk evaluation of SMEs. Financ Innov 8, 91 (2022). https://doi.org/10.1186/s40854-022-00390-1

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40854-022-00390-1

Keywords