How signaling and search costs affect information asymmetry in P2P lending: the economics of big data

In the past decade, online Peer-to-Peer (P2P) lending platforms have transformed the lending industry, which has been historically dominated by commercial banks. Information technology breakthroughs such as big data-based financial technologies (Fintech) have been identified as important disruptive driving forces for this paradigm shift. In this paper, we take an information economics perspective to investigate how big data affects the transformation of the lending industry. By identifying how signaling and search costs are reduced by big data analytics for credit risk management of P2P lending, we discuss how information asymmetry is reduced in the big data era. Rooted in the lending business, we propose a theory on the economics of big data and outline a number of research opportunities and challenging issues.


Introduction
The ability to access credit is a critical lifeline for Small- and Medium-sized Enterprises (SMEs), which represent a large proportion of the total economy around the world. But financing SMEs has been a challenging task for the traditional banking system, as these firms are more informationally opaque, risky, financially constrained, and bank-dependent than large firms (Wehinger 2014). According to the OECD survey of SMEs' access to finance in 34 countries from 2007 to 2013 (OECD 2015), SMEs tend to be much more vulnerable to economic and financial crises than larger corporations, and their access to credit tends to dry up much more rapidly. Key challenges that banks face when lending to SMEs include inadequate financial information and recordkeeping, deficient business management knowledge, and banks' lack of confidence (Mawocha et al. 2015).
Online Peer-to-Peer (P2P) lending, which has experienced significant growth in the last decade, has become an important new type of financial service for SMEs. Instead of a bank intermediating between lenders and borrowers, the P2P lending platform directly connects borrowers to lenders. Since the world's first P2P lending platform, Zopa, was founded in 2005 in the United Kingdom, the P2P lending system has been quickly adopted in many other countries. P2P platforms in the United States provided approximately US$5.5 billion of lending in 2014 (PricewaterhouseCoopers 2015), and the total P2P loan volume in China in 2014 reached US$41.3 billion (Wang et al. 2015b). The lending business, which has historically been dominated by commercial banks, has gone through a so-called financial democracy paradigm shift (TheEconomist 2015). Renaud Laplanche, the CEO of Lending Club, the largest P2P company in the US, believes that while small business owners do not have adequate access to affordable and transparent credit via banks, P2P lending platforms like Lending Club can provide a predictable, flexible, low-cost way to access credit 'on demand' if and when they need it (Alois 2015). In Morgan Stanley's May 2015 report (Srethapramote et al. 2015), SMEs' P2P lenders represented $4.6 billion of 2014 issuance, or 2.1% of the total market. P2P lenders are estimated to reach $47 billion, or 16%, of total SMEs' lending in the US.
Information technology breakthroughs, i.e., big data-based financial technology (Fintech), have been identified as one important disruptive driving force of this paradigm shift in the lending industry (PricewaterhouseCoopers 2015; TheEconomist 2015; Karabell 2015). The big data era has witnessed significant changes in the methods used to collect, present, and evaluate information. Search costs for credit information have been dramatically reduced, and the collection of credit data has transformed from passive information retrieval into proactive information gathering. In this paper, we take an information economics perspective to investigate how information signaling and search costs affect information asymmetry in the lending business and analyze how big data reduces information asymmetry in P2P lending. Rooted in the P2P lending business, we discuss the economic value of big data and outline a number of business opportunities and several challenging research issues.

Background
The economics of information in the lending business
In the lending business, a debt contract is established for those who receive financing (borrowers) and those who provide it (lenders), and the borrower promises to repay the principal plus the required interest in a stipulated amount of time. However, beyond all the legal provisions, the contract is compromised when asymmetric information problems are taken into account (Bebczuk 2003). First, the intrinsic uncertainty surrounding any investment project puts the borrower's ability to repay in question; second, the borrower's fragile promise to obey the contract can be more difficult to surmount. Therefore, evaluating credit risk in loan businesses is essential for the lender to make loan decisions, and credit information is a crucial element in the evaluation of credit risk.
To overcome this information asymmetry problem, the borrower signals and conveys information about him/herself and the investment project's characteristics, while the lender searches credit information and screens the loan applicant. In the traditional banking system, the bank makes loan decisions and is in charge of collecting and evaluating credit information. As the risk of default is dependent, among other things, on the borrower's credit history and the characteristics of the investment project, the bank will probably wish to see the borrower's accounts, balance sheet, and business plan, as well as studying their credit history or FICO score 1 and requiring collateral for secured loans.
As shown in Fig. 1, the economics of information theory suggests that a lender would acquire information up to the point where the marginal cost of acquiring additional information equals the marginal benefit (Goldman et al. 1978; Stigler 1961). Similarly, the borrower would not convey complete information because of signaling cost (Milde et al. 1988; Sharpe 1990; Spence 1973). In an environment in which credit histories are not documented and pooled, the bank as a lender needs to expend significant time and resources on screening the loan applicant (Aleem 1990), while the borrower has limited means, such as collateral, to signal their trustworthiness (Stiglitz et al. 1992). This obviously causes information asymmetry in the traditional lending business, where the borrower has information that the bank ignores or does not have access to.
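The stopping rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each additional unit of credit information is worth a fixed fraction of the previous one and that every unit costs the same; the function name, the geometric decay, and all numbers are hypothetical and are not taken from the cited literature.

```python
def optimal_search_units(initial_benefit, decay, marginal_cost, max_units=1000):
    """Return how many units of credit information a lender acquires.

    Hypothetical model: the k-th unit yields a marginal benefit of
    initial_benefit * decay**(k - 1), and every unit costs marginal_cost.
    The lender keeps searching while the next unit is worth at least its cost.
    """
    units = 0
    while units < max_units:
        next_benefit = initial_benefit * decay ** units
        if next_benefit < marginal_cost:
            break  # marginal cost now exceeds marginal benefit: stop searching
        units += 1
    return units


# Illustrative comparison: a high per-unit search cost versus a low one.
high_cost_units = optimal_search_units(initial_benefit=100.0, decay=0.8, marginal_cost=20.0)
low_cost_units = optimal_search_units(initial_benefit=100.0, decay=0.8, marginal_cost=2.0)
print(high_cost_units, low_cost_units)
```

Under these assumed numbers, lowering the per-unit search cost moves the stopping point outward, so the lender ends up holding more information about the borrower before deciding.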
Adverse selection and moral hazard are two consequences of information asymmetry (Akerlof 1970), and studies have shown such phenomena in the consumer credit industry (Karlan et al. 2009). A lender suffers adverse selection when he is not capable of distinguishing among borrowers and investment projects with different credit risks when allocating credit. A moral hazard arises if the borrower applies the borrowed funds to different investment projects than those agreed upon with the lender, who is hindered by his lack of information and control over the borrower (Bebczuk 2003).

Risks in P2P lending
The emergence of online P2P lending has not only changed the lending business, but also brought new types of risks. As a new type of financial service, P2P lending is currently still in the early stage of its development without appropriate laws and regulations designed specifically for online lending. Coupled with a lack of clear industry standards, the quality of a P2P lending platform is hard to verify. For example, Wang et al. (2015b) identify nine types of risks in the online lending industry involved in different stages of a P2P transaction: insufficient credit checking, inadequate intermediation, untimely repayment, lack of liquidity, lack of transparency, operational and technical failure, legal risk, excessive leverage, and lack of ethics. Many of the risks, such as inadequate intermediation and lack of transparency, are caused by the unique features of the online P2P lending business model.
Nevertheless, credit risk is the most important risk to be controlled in online lending, and the credit check is at the core of the essential technologies of online lending platforms. Similar to traditional lending, controlling the loan amount is a key component of the credit check (Wang et al. 2015b). The loan amount is the maximum amount of funds that can be lent to a borrower after a comprehensive credit rating of the borrower, according to the type and value of the collateral as well as the risk profile of the asset financed by the loan. The key component of credit rating is to quantify the default risk based on the borrower's ability and willingness to repay through analyzing their credit information (Wang et al. 2015b). Therefore, the credit risk management of a P2P lending platform depends on its capability to collect and analyze borrowers' credit information.
Given different IT capabilities in collecting and evaluating a borrower's credit information, online lending platforms adopt different strategies in their business models. In an environment with an underdeveloped credit rating system, platforms would issue more secured loans with collateral. Many platforms also adopt offline investigation to assist with credit checks, rather than merely relying on online evaluation. For example, a borrower's assets, credit status, and other financial information are examined offline through communications with the borrower and people in the community. However, offline investigation is limited geographically and will increase the search cost for lenders. Using information technologies to complement or even replace offline investigation has become a trend in P2P lending.
Most prior academic studies on risk management in online P2P lending have focused on how to utilize online information for credit risk management. Three categories of determinants of default risk are proposed: loan characteristics, borrower characteristics, and borrower's group characteristics (Everett 2008). Since information asymmetry is the major barrier for the lender to reduce default risk, several studies focus on how to mitigate information asymmetry between borrowers and lenders in the lending process (Freedman et al. 2011). Potentially useful indicators of default risk include a borrower's credit scores (Iyer et al. 2009), demographic information (Kumar 2007), and social network (Lin et al. 2013;Liu et al. 2015). These studies highlight an increasing academic interest in discussing big data analytics in P2P lending. In the next section, we will discuss how big data analytics provides an opportunity for credit risk management in P2P lending.

Big data for credit risk management in P2P lending
Since recent technological revolutions enable us to generate data much faster than ever before, how to utilize the data to create new business models has attracted enormous attention (Aalst et al. 2015; McAfee et al. 2012; Zhao et al. 2014). The concept of big data, which is used to describe the exponential growth and availability of data, has been recognized by governments, companies, and academia. Data can be big in different dimensions, such as volume, velocity, and variety (Zikopoulos et al. 2011). Volume refers to the large amount of data generated from multiple sources; variety refers to the different kinds of data collected, such as video, audio, image, and text, from varied sources like sensors, social media, and smart phones; velocity refers to the speed at which data is collected and processed. Big data research focuses on the techniques, technologies, systems, practices, methodologies, and applications that convert big data to useful, relevant, timely information, helping an enterprise to better understand its business and market conditions and to make appropriate decisions (Chen et al. 2012; Fan et al. 2015).
Online P2P lending platforms (or other similar online micro-lending websites) require big data and are perhaps best positioned to use it efficiently. Since P2P lending transactions take place online, IT tools and online data can be effectively utilized to assist credit risk management. As shown in Fig. 2, the business process of credit evaluation in online P2P lending is simple. But it accesses far more data than a traditional bank in making a loan decision (Wang et al. 2015a).
Take Ali Finance of Alibaba as an example. Ali Finance can easily access potential borrowers' data and credit information. As shown in Fig. 3, big data collection includes borrowers' transaction details in Alibaba.com.cn, Tmall.com, Taobao.com, Alibaba.com, and Alipay.com, along with third-party certified information. Machine learning algorithms are used, and various weights are fine-tuned in the model in real time to make the analysis more accurate. The results are then used by the Ali Finance Platform to predict a potential borrower's credit risk, which is then suggested to potential lenders.
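To make the idea of weights being fine-tuned in real time more concrete, the following is a minimal sketch of an online logistic regression that updates its weights after every observed loan outcome. It is not Ali Finance's actual model, which the source does not describe; the class name, the two features (a late-payment ratio and a normalized sales volume), the learning rate, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class OnlineCreditModel:
    """Hypothetical default-risk model trained one observation at a time."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_default_probability(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def update(self, features, defaulted):
        """One stochastic gradient step on the log loss for a single outcome."""
        error = self.predict_default_probability(features) - defaulted
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical stream of (features, outcome) pairs; features are
# [late_payment_ratio, normalized_sales_volume], outcome 1 means default.
stream = [
    ([0.9, 0.1], 1),
    ([0.8, 0.2], 1),
    ([0.1, 0.9], 0),
    ([0.2, 0.7], 0),
] * 200

model = OnlineCreditModel(n_features=2)
for features, defaulted in stream:
    model.update(features, defaulted)  # weights change as each record arrives

risky_probability = model.predict_default_probability([0.85, 0.15])
safe_probability = model.predict_default_probability([0.15, 0.85])
print(risky_probability, safe_probability)
```

The design choice illustrated here is that no retraining pass over historical data is needed: each new transaction record nudges the weights immediately, which is one simple way a model could be kept current as data flows in.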

Big data reduces information asymmetry in P2P lending
In a two-sided market, information systems can serve as intermediaries between the buyers and the sellers, reducing the search costs buyers must pay to obtain information about the prices and product offerings available in the market, as well as the signaling costs that sellers must pay to inform the market about their products, prices, and quality (Bakos 1991; Bakos 1997). Online P2P lending platforms serve as such an agent between lenders and borrowers to reduce both the search and signaling costs. Big data technologies, therefore, make it possible to significantly reduce information asymmetry in P2P lending.
The availability of big data, retrieved from various data sources, with different data formats and sizes, can provide a more complete picture of a borrower. P2P lending platforms are using a wide range of data to evaluate credit risk, while traditional banks may not have the technical ability or analytical skills to utilize these new forms of data. Going beyond the traditional, simplistic indicators of credit risk such as an applicant's assets, existing liabilities, and FICO score, P2P lending platforms analyze more dynamic data points from public websites, agencies, and public records. Relevant big data may include purchases using credit cards, accounting records from small business bureaus, length of time the borrower has used the same email address, the number of connections on Twitter, Facebook, or other social media sites, reviews and ratings from business directories such as Yelp, and local and government public records (Moldow 2015).
The fundamental change is that the philosophy of analyzing credit risk has transformed from passive information retrieval into proactive big data analytics. In other words, in traditional credit evaluation, lenders passively depend on the borrowers providing information about themselves, while in the big data era, lenders can proactively search the 360-degree online footprint of potential borrowers and let the data tell who they really are. Big data tools allow lenders to tie all the information together and assess it from multiple perspectives to gain new insights. For instance, smartphone or computer owners generate data through everything they do with the device (social media, surfing, ecommerce purchases, financial transactions, etc.). Similarly, Kreditech, a German online P2P lender, factors in a potential borrower's behavioral data, including the way the online application form is filled in, how often capital letters are used, or the speed at which the mouse is moved. Zopa, the UK's largest P2P lending service, tracks the applicants it has turned down for loans to see if they turn out to be good credit risks after they find another willing lender.
In fact, most of the leading P2P lending platforms use big data technologies to build more comprehensive and dependable credit profiles. Lending Club has developed proprietary credit score models and a unique algorithm called Model Rank to determine the final interest rate for each loan grade. Model Rank is based upon an internally developed big data algorithm that analyzes the performance of borrower members and takes into account the FICO score, credit attributes, and other application data. Prosper, a P2P lending company in the US, has created the Prosper Score to determine the Prosper Rating. The Prosper Score is derived from a combination of all potential variables available at the time of listing, including those from the identification authorization process, the credit report, and listing details provided by the borrower. Key variables may include the authorization score, income, debt-to-income ratio, total revolving balance, and delinquencies, as well as other factors like the number of inquiries to the credit bureau, credit card utilization, number of recently opened trades at the credit bureau, and loan payment performance on prior Prosper loans. Yirendai, a Chinese P2P lending platform for CreditEase, relies on big data technology to give consumers quick decisions on loans. Consumers who apply for loans using the Yirendai platform must provide data on their credit cards, e-commerce transactions, and mobile phone carriers. Each person's data undergoes a risk control assessment on a financial cloud platform of CreditEase to make a quick and individual lending decision.

The economics of big data
The economic impacts of big data in the lending business lie in its ability to more accurately and reliably separate trustworthy borrowers from bad ones. With limited and passive information submitted by a potential borrower, which could be biased and lead to information asymmetry, traditional credit evaluation may incorrectly accept a loan application from a borrower with high credit risk (false positive), or reject a loan application from a borrower with low credit risk (false negative). Big data, on the other hand, may include information that is obtained by the lender via multiple channels, which creates a 360-degree risk profile of the borrower. The information that lenders proactively gather may be more objective and cannot be easily manipulated by the borrower, thus reducing information asymmetry. With big data technologies, successful P2P lending platforms can help those trustworthy borrowers who cannot access bank financing and exclude those default-likely borrowers who may be able to get bank financing. Further, because banks are cut out of the lending process, successful borrowers typically pay a lower interest rate than they would have paid on a traditional bank loan, and lenders can earn higher returns than they would have received by saving their money in bank accounts.
The economics of big data is an extension of the economics of information, as big data represents a particular kind of information that extends our understanding of the value of information. As the economics of information suggests, the monetary value of information must be presented in such a way as to create an opportunity. Big data analytics make it possible for P2P lending platforms to better evaluate borrowers and make quick lending decisions. However, big data is not free, since much investment is needed to search and analyze it for risk assessment. Thus, it is important to consider the different qualities and costs in collecting, analyzing, and applying the data.
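The two screening errors named in this section, accepting a high-risk borrower (false positive) and rejecting a low-risk borrower (false negative), can be counted with a short sketch. The function name and both sets of decisions below are hypothetical illustrations, not empirical results.

```python
def screening_error_rates(decisions):
    """Compute screening error rates from a list of lending decisions.

    decisions: list of (accepted, would_default) boolean pairs.
    Returns (false_positive_rate, false_negative_rate), where a false
    positive is an accepted borrower who would default and a false
    negative is a rejected borrower who would have repaid.
    """
    false_pos = sum(1 for accepted, bad in decisions if accepted and bad)
    false_neg = sum(1 for accepted, bad in decisions if not accepted and not bad)
    bad_total = sum(1 for _, bad in decisions if bad)
    good_total = sum(1 for _, bad in decisions if not bad)
    fpr = false_pos / bad_total if bad_total else 0.0
    fnr = false_neg / good_total if good_total else 0.0
    return fpr, fnr


# Hypothetical decisions for the same six applicants under two screening regimes.
limited_information = [
    (True, True), (True, False), (False, False),
    (False, True), (True, True), (False, False),
]
richer_information = [
    (False, True), (True, False), (True, False),
    (False, True), (True, True), (True, False),
]

limited_fpr, limited_fnr = screening_error_rates(limited_information)
richer_fpr, richer_fnr = screening_error_rates(richer_information)
print(limited_fpr, limited_fnr, richer_fpr, richer_fnr)
```

In this made-up example both error rates fall under the richer-information regime; whether that holds in practice is exactly what the propositions that follow leave open for empirical testing.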
Proposition 1: The quality of data will impact information asymmetry.
Specifically, the quality of big data can be explored along the dimensions of volume, variety, velocity, and veracity. Volume represents how massive the data is, i.e., the number of transaction records. For example, for the same borrower, daily transaction records will be larger in volume than monthly transactions. We believe that a higher data volume allows us to know the borrower better. Therefore:
Hypothesis 1.1: Higher volume of data will lower information asymmetry.
Variety refers to the many sources and types of data, both structured and unstructured. With regard to the same borrower, data including mobile phone and online shopping records have more variety than data with only mobile phone records. We believe:
Hypothesis 1.2: More variety of data will reduce information asymmetry.
Velocity means the pace at which data flows in. Real-time, high-frequency information has more velocity than lagged, low-frequency information. We argue that:
Hypothesis 1.3: More velocity of data will reduce information asymmetry.
Veracity refers to the trustworthiness of data, that is, the extent to which it is free of bias, noise, and abnormality. Since much of the data gathered by lenders may contain noise and inaccurate information, we expect that veracity will impact information asymmetry.
Hypothesis 1.4: More veracity of data will reduce information asymmetry.
Further, ensuring the quality of analysis is critical to the success of applying big data in P2P lending. A poor business intelligence model may produce incorrect results that may not reduce information asymmetry.
Proposition 2: The quality of analysis will impact information asymmetry.
The quality of analysis can be characterized as the accuracy and precision of the model. Accuracy is used to describe the closeness of a measurement to the true value. The more accurate the model, the less the information asymmetry.
Hypothesis 2.1: More accurate analysis will reduce information asymmetry.
Precision describes a level of measurement that yields consistent results when repeated. The more precise the analysis is, the more reliable the results, reducing the information asymmetry caused by random errors.
Hypothesis 2.2: More precise analysis will reduce information asymmetry.
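The distinction between accuracy and precision drawn above can be illustrated with a minimal sketch that treats accuracy as the gap between the average estimate and the true value, and precision as the spread of repeated estimates. The function names and the two series of estimated default rates are hypothetical.

```python
from statistics import mean, pstdev


def accuracy_error(estimates, true_value):
    """Absolute gap between the average estimate and the true value.

    A smaller value means a more accurate analysis.
    """
    return abs(mean(estimates) - true_value)


def precision_spread(estimates):
    """Population standard deviation of repeated estimates.

    A smaller value means a more precise (more repeatable) analysis.
    """
    return pstdev(estimates)


true_default_rate = 0.10

# Hypothetical repeated estimates of the same borrower's default rate.
accurate_but_imprecise = [0.02, 0.18, 0.05, 0.15, 0.10]  # centered on 0.10, widely scattered
precise_but_inaccurate = [0.20, 0.21, 0.19, 0.20, 0.20]  # tightly clustered around 0.20

for name, estimates in [
    ("accurate but imprecise", accurate_but_imprecise),
    ("precise but inaccurate", precise_but_inaccurate),
]:
    print(name, accuracy_error(estimates, true_default_rate), precision_spread(estimates))
```

The first series is right on average but unreliable in any single run, while the second is consistent but systematically wrong, which is why the two hypotheses are stated separately.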
Figure 4 summarizes our proposed theory of the economics of big data. Big data analytics, based on business intelligence models with large amounts of data from multiple data sources, is applied in P2P lending to evaluate credit risk. The transaction records in P2P lending in turn provide more data for big data analytics. This application of big data analytics in P2P lending would greatly reduce signaling and search costs, thus reducing information asymmetry between borrowers and lenders. With reduced information asymmetry, moral hazard would be monitored and detected in a timely manner, and adverse selection would be better identified and avoided.
Based on the discussion in this study, it is clear that introducing big data analytics to the lending business has had tremendous business and economic impacts. More leading technology companies are taking initiatives to offer big data information services. Alibaba, for instance, has built an open data processing platform, Tianchi, to provide big data generated from real-world scenarios. This also creates new opportunities for researchers regarding the economics of big data and its applications in P2P lending in several directions.
First, how will the economics of big data extend the economics of information? This paper starts this discussion, but much more work is needed to validate the conjectures presented in this paper. Research questions that may be raised include:
Measuring and comparing the search cost of traditional bank lending and P2P lending. What is the best way to quantify the reduction in search costs? How can we quantitatively judge whether the search cost is reduced by big data analytics?
Measuring and comparing the signaling cost of traditional bank lending and P2P lending. What is the best way to quantify the reduction in signaling cost? How can we quantitatively judge whether the signaling cost is reduced by big data analytics?
How effective is P2P lending in reducing the incidence or extent of moral hazard? Will the use of big data technologies cause different kinds of moral hazard?
How effective is P2P lending in reducing the incidence and extent of adverse selection? Will the use of big data technologies cause different kinds of adverse selection?
Second, big data is valuable, but once produced, it can become a public good just like any other information. In the future, big data analytics will become a competitive advantage. To understand this source of competitive advantage, we need to address issues including:
The value proposition of big data. How should we evaluate and price the information service of big data analytics? This is especially challenging since big data can become a public good once produced. Nevertheless, it is imperative that the providers of big data be compensated for their efforts. Otherwise, there would not be sufficient incentives to provide such a service.
The ownership of the data and the privacy issue. In some countries, such as the US, the use of demographic and other forms of data that could reveal age, gender, race, or other protected traits is prohibited in the credit underwriting process. How would these policies influence the economics of big data? Similarly, regulations and laws on big data are being developed, which are vital to the long-term viability of big data analytics. What are the best ways to regulate this new industry to balance economic need and protection of privacy?
Further, there are many issues to be discussed about the long-term economic impacts of P2P lending on the lending business. Additional issues we need to understand include:
The application of big data to traditional bank lending. How best can traditional banks transform themselves using this "Online-to-Offline" platform?
How best can big data analytics help improve contract designs for the lending business?
How can big data analytics maximize social welfare in the lending business?
To what extent are the financing challenges faced by SMEs alleviated by P2P lending? What is the impact on the overall economy?

Conclusion
Big data-based financial innovations have been embraced as a disruptive force that will reshape the financial services sector. In this paper, we take an information economics perspective to investigate how big data affects the transformation of the lending industry. By identifying that signaling and search costs are reduced by the application of big data analytics in P2P lending, we show how big data can reduce information asymmetry in the lending industry. With a proposed theory on the economics of big data, we outline a number of research opportunities and challenging issues. This paper discusses theoretical guidelines for broadening and improving research on P2P lending with a perspective of big data economics, and we plan to conduct empirical studies in our future research to validate what we suggest in this paper. It is hoped that this paper will be a starting point for more research and discussion on big data-based financial innovations.
Endnotes
1 FICO score is a type of credit score widely used in the US. FICO is an acronym for the Fair Isaac Corporation, the creators of the FICO score.