Open Access

Baidu index and predictability of Chinese stock returns

Financial Innovation 2017, 3:4

DOI: 10.1186/s40854-017-0053-1

Received: 5 September 2016

Accepted: 13 March 2017

Published: 24 March 2017


A number of studies have investigated the predictability of Chinese stock returns with economic variables. Given the newly emerged datasets from the Internet, this paper investigates whether the Baidu Index can be employed to predict Chinese stock returns. The empirical results show that 1) the Search Frequency of Baidu Index (SFBI) can predict the next day’s price changes; 2) stock prices go up when individual investors pay less attention to the stocks and go down when individual investors pay more attention to them; 3) a trading strategy that shorts the stocks with the highest SFBI and buys the stocks with the lowest SFBI outperforms the corresponding market index when transaction costs are not considered. These results complement the existing literature on the predictability of Chinese stock returns and have potential implications for asset pricing and risk management.


Stock return predictability; Baidu index; Trading strategy; Financial Big data analytics; Chinese stock market; Investor inattention


The predictability of stock returns has long been a focus of financial economics, in both cross-sectional and time-series analysis. A large number of variables have been put forward to predict stock returns, including the book-to-market ratio (Pontiff and Schall, 1998), the inflation rate (Nelson, 1976), the dividend-price ratio (Fama and French, 1988) and the term structure of interest rates (Campbell, 1987). As for the predictability of the Chinese stock market, Lee and Rui (2000) show that US stock returns help predict the returns of Shanghai A and B shares. Chen et al. (2010) document that only 5 out of 18 firm-specific variables can predict returns in the Chinese stock market, and attribute this weak predictability to the low informativeness and less heterogeneous distribution of stock returns. Goh et al. (2013) find that US economic variables have statistically significant predictive power for Chinese stock returns after China's entry into the WTO; specifically, Chinese and US economic variables used jointly have superior predictive power. Jordan et al. (2014) find that the returns of economically linked economies can predict the aggregate Chinese stock market. Chen et al. (2016) show that US economic variables can strongly forecast the monthly volatility of Chinese stock returns.

Recently, scholars have begun to use information extracted from the Internet as a new data source for financial studies, including online stock message boards (Tumarkin and Whitelaw, 2001; Antweiler and Frank, 2004), Twitter (Bollen et al., 2011; Zhang et al., 2016a), Google Trends (Da et al., 2011; Da et al., 2015), Baidu Index (Zhang et al., 2013), Baidu News (Zhang et al., 2014; Shen et al., 2016), online stock commentary columns (Zhang et al., 2016c) and Sina Weibo (Jin et al., 2016). In particular, Da et al. (2011) show that search frequency from Google Trends can predict stock returns in the following weeks. Joseph et al. (2011) treat online ticker searches as a proxy for investor sentiment and find that this sentiment can predict abnormal stock returns and trading volume at a weekly horizon. Dimpfl and Jank (2016) find that internet search queries can predict the next day’s volatility.

This paper connects the two streams of literature above and contributes to the existing literature in two respects. First, we provide the first empirical study of the predictive power of the Baidu Index, a newly emerged source of internet information, for Chinese stock returns, while other studies mainly focus on market variables (Goh et al., 2013; Chen et al., 2010; Chen et al., 2016). Although Zhang et al. (2013) advocate the search frequency of the stock name in the Baidu Index as a proxy for investor attention, they focus only on the explanatory power of this proxy for stock returns and do not investigate its predictive power. Second, we complement the existing studies on Chinese stock returns (Goh et al., 2013; Jordan et al., 2014) in the sense that our predictive variable is measured at a daily horizon. The causes of changes in daily returns, volatility and trading volume differ significantly from those of monthly movements in market performance (Admati and Pfleiderer, 1988; Amihud and Mendelson, 1987). Therefore, we can gain more insight into the pricing mechanism, and our study has potentially important implications for pricing mechanisms, asset allocation and risk management.

The remainder of this paper is organized as follows. "Data description" section describes the Baidu Index and capital data. "Empirical analysis" section performs the empirical analysis, including the lead-lag relationships, the cross-sectional analysis and the trading strategy. "Conclusions" section sets forth the conclusions.


The sample covers both the Shanghai Stock Exchange and the Shenzhen Stock Exchange, with 30 stocks from each of three boards, i.e., the ChiNext, the SME Board and the Main Board, over the period from March 1st 2011 to March 30th 2012. In total, there are 90 stocks and 267 trading days. In particular, we obtain the daily trading volume, the daily stock return after dividend reinvestment and the market return from the China Stock Market and Accounting Research (CSMAR) database. The 1-min prices are retrieved from the RESSET Database to calculate the intraday volatility. In a pooled analysis, the daily trading volume ranges from 33,700 to $1.5 \times 10^{8}$, the firm capitalization ranges from $6.5 \times 10^{5}$ to $5.8 \times 10^{9}$, the PE ratio ranges from −667 to 3119 and the turnover ranges from 0.0398 to 58.94. Because the stocks are spread across different boards and differ in their characteristics, our sample can be viewed as a parsimonious representation of the Chinese stock market.

Baidu index

Baidu Index is a keyword-search tool launched by China's largest search engine, and its main users are Chinese-language users. We search for each stock's name in the Baidu Index and record the Search Frequency of Baidu Index (SFBI) for each stock during the sample period. To make the SFBI comparable across firms, we calculate the standardized SFBI (SSFBI) for each individual stock.
$$ SSFBI_t = \frac{SFBI_t - AV_{SFBI}}{SD_{SFBI}} \tag{1} $$

where $AV_{SFBI}$ is the average value of the SFBI in the sample period and $SD_{SFBI}$ is the standard deviation of the SFBI time series.
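The standardization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `standardize_sfbi` and the sample numbers are hypothetical, and the population standard deviation is used because the text does not say whether the sample or population form was applied.

```python
from statistics import mean, pstdev


def standardize_sfbi(sfbi):
    """Return (SFBI_t - AV_SFBI) / SD_SFBI for one stock's SFBI series."""
    av = mean(sfbi)
    sd = pstdev(sfbi)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("SFBI series is constant; cannot standardize")
    return [(x - av) / sd for x in sfbi]


# Hypothetical daily search frequencies for a single stock.
sfbi = [120, 150, 90, 300, 140]
ssfbi = standardize_sfbi(sfbi)
```

By construction the standardized series has zero mean and unit standard deviation, which is what makes search intensity comparable between heavily and lightly searched stocks.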


We calculate the absolute value of the difference between the individual stock return and the market return as the abnormal return (AbRet). This measure captures the magnitude of price changes rather than their upward or downward direction. In a similar way, we also calculate the cumulative abnormal return (CAR) for later analysis.
$$ AbRet_t = \left| Ret_{I,t} - Ret_{M,t} \right| \tag{2} $$
$$ CAR(-30,+30) = \sum_{t=-30}^{+30} \left( Ret_{I,t} - Ret_{M,t} \right) \tag{3} $$

where $Ret_{I,t}$ is the daily stock return after dividend reinvestment and $Ret_{M,t}$ is the corresponding market index return. In particular, we use the return of the China Securities Index 300 (CSI 300) as the market return, because the CSI 300 is the first index launched jointly by the Shanghai and Shenzhen Stock Exchanges and thus represents the whole market.
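As a minimal sketch of the two return measures, the Python below computes the absolute abnormal return day by day and the signed cumulative abnormal return over an event window. The function names, the toy return series and the short window of two days on each side are illustrative assumptions; the paper uses 30 days on each side.

```python
def abnormal_returns(stock_ret, market_ret):
    """AbRet_t = |Ret_I,t - Ret_M,t| for every day t."""
    return [abs(ri - rm) for ri, rm in zip(stock_ret, market_ret)]


def car(stock_ret, market_ret, event_idx, before=30, after=30):
    """Sum of signed (Ret_I,t - Ret_M,t) from event_idx-before to event_idx+after."""
    start, end = event_idx - before, event_idx + after
    if start < 0 or end >= len(stock_ret):
        raise ValueError("event window falls outside the sample")
    return sum(stock_ret[t] - market_ret[t] for t in range(start, end + 1))


# Hypothetical daily returns for one stock and the market index.
stock = [0.010, -0.020, 0.015, 0.000, 0.005]
market = [0.005, -0.010, 0.010, 0.002, 0.001]

ab = abnormal_returns(stock, market)
car_small = car(stock, market, event_idx=2, before=2, after=2)
```

Note the difference in sign handling: AbRet discards the direction of the move, whereas CAR keeps it, so positive and negative excess returns can offset each other inside the window.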

Following Barber and Odean (2008), we calculate the excess trading volume (ETV) for each stock as the daily ratio of the stock’s trading volume that day to its average trading volume in the whole sample period.
$$ ETV_t = \frac{TV_t}{AV_{TV}} \tag{4} $$

where $TV_t$ is the trading volume on trading day t and $AV_{TV}$ is the average trading volume over the whole sample period.
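The ratio is simple enough to sketch directly. The function name and the three-day volume series below are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def excess_trading_volume(volume):
    """ETV_t = TV_t / AV_TV, with AV_TV the full-sample average volume."""
    av_tv = mean(volume)
    if av_tv == 0:
        raise ValueError("average trading volume is zero")
    return [tv / av_tv for tv in volume]


# Hypothetical daily volumes; their average is 2000.
volume = [1000, 3000, 2000]
etv = excess_trading_volume(volume)
```

Because each day is divided by the same full-sample average, the ETV series averages to one, and values above one mark days with unusually heavy trading for that stock.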

The intraday volatility is calculated as the standard deviation of the 1-min prices.
$$ Volatility_t = \sqrt{\frac{\sum_{1}^{N} \left( P_t - AV_P \right)^2}{N}} \tag{5} $$

where $P_t$ is the 1-min stock price and $AV_P$ is the average stock price. N denotes the number of observations, with N = 240.
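A short sketch of this volatility measure follows, assuming the average price is taken over the same day's 1-min prices. The function name and the four-price series are hypothetical; a real trading day would supply N = 240 prices.

```python
import math


def intraday_volatility(prices):
    """Square root of the mean squared deviation of prices from their average."""
    n = len(prices)
    if n == 0:
        raise ValueError("no intraday prices supplied")
    av_p = sum(prices) / n
    return math.sqrt(sum((p - av_p) ** 2 for p in prices) / n)


# Hypothetical 1-min prices (a full day would have 240 of them).
prices = [10.0, 10.2, 9.8, 10.0]
vol = intraday_volatility(prices)
```

As defined, the measure is computed on price levels rather than on returns, so it is expressed in the same currency units as the price and is not directly comparable between stocks with very different price levels.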

Results and discussion

This section provides an empirical analysis of the lead-lag relationships between the SFBI and stock returns, the market reactions around the Most Searching Days (MSD) and Least Searching Days (LSD), and the performance of the trading strategy.

Lead-lag relationships

In line with other recent studies on lead-lag relations (Schmeling, 2009; Siganos et al., 2014), we use five lags of the SSFBI and of the abnormal return as explanatory variables and formulate the following regression model.
$$ AbRet_{i,t} = \alpha_1 + \sum_{j=1}^{5} \beta_{i,j}\, SSFBI_{i,t-j} + \sum_{j=1}^{5} \gamma_{i,j}\, AbRet_{i,t-j} + \varepsilon_{i,t} \tag{6} $$
Table 1 reports the regression results of model (6). Since we run the regression stock by stock, the Bonferroni correction is used to correct the p-values for the multiple comparisons problem. We observe that the one-day lagged SFBI is significantly related to the next trading day’s abnormal return, with a positive coefficient across the different subsamples. These findings suggest that the online search behavior of individual investors can predict price changes on the next trading day.
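The following sketch shows how one stock's regression could be set up: it builds the lagged design matrix of model (6), estimates the coefficients by least squares, and computes the Bonferroni cut-off α/n. It is a sketch under assumptions, not the authors' code: the function names are hypothetical, the data are random numbers with a fixed seed, and least squares via the normal equations is assumed as the estimator. Coefficient p-values, which need a t distribution, are not computed here.

```python
import random


def lagged_design(ssfbi, abret, lags=5):
    """Rows [1, SSFBI_{t-1..t-lags}, AbRet_{t-1..t-lags}] with targets AbRet_t."""
    X, y = [], []
    for t in range(lags, len(abret)):
        row = [1.0]
        row += [ssfbi[t - j] for j in range(1, lags + 1)]
        row += [abret[t - j] for j in range(1, lags + 1)]
        X.append(row)
        y.append(abret[t])
    return X, y


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def bonferroni_cutoff(alpha=0.05, n_tests=90):
    """Per-test significance threshold alpha / n."""
    return alpha / n_tests


# Hypothetical data for one stock (fixed seed, not the paper's sample).
rng = random.Random(0)
n_days = 120
ssfbi = [rng.gauss(0, 1) for _ in range(n_days)]
abret = [abs(rng.gauss(0, 1)) for _ in range(n_days)]

X, y = lagged_design(ssfbi, abret)
coef = ols(X, y)  # [alpha, beta_1..beta_5, gamma_1..gamma_5]
cutoff = bonferroni_cutoff()
```

With α = 0.05 and n = 90 stocks the cut-off is about 0.00056, so a lag coefficient counts as significant for a given stock only if its p-value falls below that much stricter threshold.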
Table 1

Summary of the regression results. This table reports the regression results of model (6). As the analysis is based on stock-by-stock regressions, we employ the Bonferroni correction to correct the p-values for the multiple comparisons problem. The significance cut-off is set to α/n (α = 0.05 and n = 90). Since not all the variables are standardized, the reported coefficients are not in basis points but depend on the scale of the variables.

Columns: Main board, SME board, Full sample
Rows: $SSFBI_{t-1}$, $SSFBI_{t-2}$, $SSFBI_{t-3}$, $SSFBI_{t-4}$, $SSFBI_{t-5}$, $AbRet_{t-1}$, $AbRet_{t-2}$, $AbRet_{t-3}$, $AbRet_{t-4}$, $AbRet_{t-5}$
[Coefficient values are not available in this copy of the table.]

Note: a denotes significance at the 1% level


To examine the market reaction more closely, for each individual stock we sort the trading days by the corresponding SFBI and select the 10 trading days with the highest SFBI as the Most Searching Days (MSD) and the 10 trading days with the lowest SFBI as the Least Searching Days (LSD). We then treat the MSD and LSD as event days and observe the market reactions around them using the event study methodology. Figure 1 illustrates the CAR, ETV and Volatility around the MSD and LSD. In particular, Panel a plots the changes in CAR around the MSD and LSD: the stock price goes down after the MSD and goes up after the LSD. These results are inconsistent with the findings for the US stock market (Da et al., 2011), which suggest that higher search volume predicts higher future stock returns. This inconsistency may be driven by “big position construction” by institutional investors. After building a position in certain stocks, the institutional investors release news about the companies and attract individual investors to buy these stocks. Meanwhile, the institutional investors sell their holdings to realize a profit. This argument is consistent with a report by the Shanghai Stock Exchange showing that individual investors accounted for 93.20% in A shares at the end of 2012, and with scholars who claim that the Chinese stock market is dominated by irrational individual investors who are subject to strong psychological biases, which results in speculation (Feng and Seasholes, 2008; Zhang et al., 2016b). In addition, Fig. 1 shows that ETV and Volatility are significantly larger on the MSD.
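The event selection and the averaging behind a CAR plot can be sketched as follows. The function names and the toy data are hypothetical, and two details are assumptions because the text does not specify them: ties in SFBI are broken by date order, and events whose window extends beyond the sample are skipped.

```python
def select_event_days(sfbi, k=10):
    """Return (MSD, LSD): day indices of the k highest and k lowest SFBI values."""
    order = sorted(range(len(sfbi)), key=lambda t: sfbi[t])  # stable sort
    return sorted(order[-k:]), sorted(order[:k])


def mean_car_path(excess_ret, events, window=30):
    """Average cumulative excess return from day -window to +window over events."""
    paths = []
    for e in events:
        if e - window < 0 or e + window >= len(excess_ret):
            continue  # assumption: drop events with an incomplete window
        total, path = 0.0, []
        for t in range(e - window, e + window + 1):
            total += excess_ret[t]
            path.append(total)
        paths.append(path)
    if not paths:
        raise ValueError("no event has a complete window")
    return [sum(col) / len(paths) for col in zip(*paths)]


# Hypothetical series for one stock: SFBI and signed excess returns.
sfbi = [5, 9, 1, 7, 3, 8, 2, 6]
excess = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02, 0.01, -0.03]

msd, lsd = select_event_days(sfbi, k=2)
path = mean_car_path(excess, msd, window=1)
```

Running the same averaging separately on the MSD and the LSD indices gives the two curves that an event-study figure of this kind compares.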
Fig. 1

CAR, ETV and Volatility around the MSD and LSD. Panel a: CAR. Panel b: ETV. Panel c: Volatility

Trading strategy

In this section, we further investigate the economic significance of our findings by constructing a long-short trading strategy based on the SFBI. The strategy is formed as follows: on each trading day, the 90 firms are sorted into quartiles (Q) based on the SFBI of the previous trading day. Q1 contains the firms with the lowest SFBI and Q4 contains the firms with the highest SFBI. The firms are held in their portfolios for different holding periods within the sample period, e.g., 20, 40, 60, 80, 100 and 120 trading days. We then obtain a portfolio that consists of a long position in the lowest quartile of firms (Q1) and a short position in the highest quartile of firms (Q4), i.e., the return of the portfolio is the Q1 return minus the Q4 return. Figure 2 illustrates the cumulative profit for the different holding periods. As plotted, the trading strategy has a positive return for all holding periods. Specifically, Fig. 3 illustrates the strategy with a holding period of 120 trading days. Panel a of Fig. 3 plots the returns of Q1 and Q4; the return of Q1 is significantly larger than that of Q4 at the 1% level (p-value = 0.0017). The blue line in Panel b of Fig. 3 is the difference between Q1 and Q4 and the red line is the corresponding market return. Our trading strategy outperforms the market significantly at the 1% level (p-value < 0.0001).
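One possible reading of this strategy is sketched below. The text leaves several details open, so the following are assumptions rather than the authors' procedure: the quartile portfolios are equally weighted and re-sorted every day, each quartile holds the integer part of n/4 firms, and cumulative profit is a simple sum of daily long-short returns over the holding period. The function names and the four-firm data are hypothetical.

```python
def long_short_returns(sfbi, ret):
    """Daily Q1-minus-Q4 returns; sfbi[t][i] and ret[t][i] are day t, firm i."""
    out = []
    for t in range(1, len(ret)):
        n = len(sfbi[t - 1])
        q = n // 4  # firms per quartile; any remainder stays in the middle groups
        if q == 0:
            raise ValueError("need at least four firms")
        # Rank firms on the previous day's SFBI, lowest first.
        order = sorted(range(n), key=lambda i: sfbi[t - 1][i])
        q1, q4 = order[:q], order[-q:]
        long_leg = sum(ret[t][i] for i in q1) / q
        short_leg = sum(ret[t][i] for i in q4) / q
        out.append(long_leg - short_leg)
    return out


def cumulative(returns, holding=None):
    """Running (non-compounded) sum over the first `holding` days, or all days."""
    total, path = 0.0, []
    for r in returns[:holding]:
        total += r
        path.append(total)
    return path


# Hypothetical panel: 3 days, 4 firms.
sfbi = [[10, 40, 20, 30], [35, 5, 25, 15], [1, 2, 3, 4]]
ret = [[0.0, 0.0, 0.0, 0.0],
       [0.02, -0.01, 0.00, 0.01],
       [0.00, 0.03, -0.02, 0.01]]

ls = long_short_returns(sfbi, ret)
cum = cumulative(ls)
```

Sorting on the previous day's SFBI keeps the rule free of look-ahead: the portfolio for day t uses only information available at the close of day t − 1. Transaction costs from the daily re-sorting are ignored in this sketch.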
Fig. 2

Cumulative Holding Period Returns

Fig. 3

An Illustration of Cumulative Returns in Holding Period with 120 Trading Days. The left subfigure of figure 3 is Panel a; The right subfigure of figure 3 is Panel b


This paper employs the Baidu Index as a predictive variable and investigates its predictive power for Chinese stock returns. The empirical findings show that the Baidu Index can predict price changes on the next trading day. After constructing the MSD and LSD, we find that stock prices go up when individual investors pay less attention to the stocks and go down when individual investors pay more attention to them. We also construct a trading strategy that shorts the stocks with the highest SFBI and buys the stocks with the lowest SFBI. The trading strategy outperforms the corresponding market index returns. However, we caution against adopting this trading strategy in real investment, because the transaction costs associated with rebalancing are not considered.


Baidu Index no longer provides these data. Therefore, the data used in this paper are taken from Zhang et al. (2013), although this paper addresses different research questions. We believe that this paper has managerial implications for practitioners who may have private access to the Baidu Index.




This work is supported by the National Natural Science Foundation of China (71320107003 and 71532009).

Authors’ contributions

DS and YZ contributed to study design, data collection and provided the first draft of the manuscript. DS, WZ, YZ and XX participated in the empirical analysis, interpretation of the results, final draft and proof reading of this manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

College of Management and Economics, Tianjin University
China Center for Social Computing and Analytics (CCSCA), Tianjin University
Key Laboratory of Computation and Analytics of Complex Management Systems (CACMS)


  1. Admati AR, Pfleiderer P (1988) A Theory of Intraday Patterns: Volume and Price Variability. Rev Financ Stud 1(1):3–40
  2. Amihud Y, Mendelson H (1987) Trading Mechanisms and Stock Returns: An Empirical Investigation. J Financ 42(3):533–53
  3. Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of internet stock message boards. J Financ 59:1259–1294
  4. Barber BM, Odean T (2008) All That Glitters: The Effect of Attention and News on the Buying Behavior of Individual and Institutional Investors. Rev Financ Stud 21:785–818
  5. Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. J Comput Sci 2:1–8
  6. Campbell JY (1987) Stock returns and the term structure. J Financ Econ 18:373–399
  7. Chen X, Kim KA, Yao T, Yu T (2010) On the predictability of Chinese stock returns. Pac Basin Financ J 18(4):403–25
  8. Chen J, Jiang F, Li H, Xu W (2016) Chinese stock market volatility and the role of U.S. economic variables. Pac Basin Financ J 39:70–83
  9. Da Z, Engelberg J, Gao P (2011) In Search of Attention. J Financ 66:1461–1499
  10. Da Z, Engelberg J, Gao P (2015) The Sum of All FEARS: Investor Sentiment and Asset Prices. Rev Financ Stud 28:1–32
  11. Dimpfl T, Jank S (2016) Can Internet Search Queries Help to Predict Stock Market Volatility? Eur Financ Manag 22:171–192
  12. Fama EF, French KR (1988) Dividend yields and expected stock returns. J Financ Econ 22:3–25
  13. Feng L, Seasholes MS (2008) Individual investors and gender similarities in an emerging stock market. Pac Basin Financ J 16:44–60
  14. Goh JC, Jiang F, Tu J, Wang Y (2013) Can US economic variables predict the Chinese stock market? Pac Basin Financ J 22:69–87
  15. Jin X, Shen D, Zhang W (2016) Has microblogging changed stock market behavior? Evidence from China. Physica A 452:151–156
  16. Jordan SJ, Vivian A, Wohar ME (2014) Sticky prices or economically-linked economies: The case of forecasting the Chinese stock market. J Int Money Financ 41:95–109
  17. Joseph K, Babajide Wintoki M, Zhang Z (2011) Forecasting abnormal stock returns and trading volume using investor sentiment: Evidence from online search. Int J Forecast 27:1116–1127
  18. Lee CF, Rui OM (2000) Does trading volume contain information to predict stock returns? Evidence from China’s stock markets. Rev Quant Finan Acc 14:341–360
  19. Nelson CR (1976) Inflation and rates of return on common stocks. J Financ 31:471–483
  20. Pontiff J, Schall LD (1998) Book-to-market ratios as predictors of market returns. J Financ Econ 49:141–160
  21. Schmeling M (2009) Investor sentiment and stock returns: Some international evidence. J Empir Financ 16:394–408
  22. Shen D, Zhang W, Xiong X, Li X, Zhang Y (2016) Trading and non-trading period Internet information flow and intraday return volatility. Physica A 451:519–524
  23. Siganos A, Vagenas-Nanos E, Verwijmeren P (2014) Facebook’s daily sentiment and international stock markets. J Econ Behav Organ 107(Part B):730–743
  24. Tumarkin R, Whitelaw RF (2001) News or noise? Internet postings and stock prices. Financ Anal J 57:41–51
  25. Zhang W, Shen D, Zhang Y, Xiong X (2013) Open source information, investor attention, and asset pricing. Econ Model 33:613–619
  26. Zhang Y, Feng L, Jin X, Shen D, Xiong X, Zhang W (2014) Internet information arrival and volatility of SME Price Index. Physica A 399:70–74
  27. Zhang W, Li X, Shen D, Teglio A (2016a) Daily happiness and stock returns: Some international evidence. Physica A 460:201–209
  28. Zhang W, Li X, Shen D, Teglio A (2016b) R2 and idiosyncratic volatility: Which captures the firm-specific return variation? Econ Model 55:298–304
  29. Zhang Y, Song W, Shen D, Zhang W (2016c) Market reaction to internet news: Information diffusion and price pressure. Econ Model 56:43–49


© The Author(s). 2017