Detecting the lead–lag effect in stock markets: definition, patterns, and investment strategies

Li, Yongli; Wang, Tianchen; Sun, Baiqing; Liu, Chao

doi:10.1186/s40854-022-00356-3

Research
Open access
Published: 20 May 2022

Financial Innovation volume 8, Article number: 51 (2022)

Abstract

Human activities widely exhibit a power-law distribution. Considering stock trading as a typical human activity in the financial domain, the first aim of this paper is to validate whether the well-known power-law distribution can be observed in this activity. Interestingly, this paper determines that the number of accumulated lead–lag days between stock pairs follows a power-law distribution in both the U.S. and Chinese stock markets based on 10 years of trading data. Based on this finding, this paper adopts the power-law distribution to formally define the lead–lag effect, detect stock pairs with the lead–lag effect, and then design a pure lead–lag investment strategy as well as enhancement investment strategies by integrating the lead–lag strategy into classic alpha-factor strategies. Tests conducted on 20 different alpha-factor strategies demonstrate that both the pure lead–lag strategy and the enhancement strategies perform better than the selected benchmark strategy and that the lead–lag strategy provides useful signals that significantly improve the performance of basic alpha-factor strategies. Our results therefore indicate that the lead–lag effect may provide effective information for designing more profitable investment strategies.

Introduction

The lead–lag phenomenon, a phenomenon in which a security leads the price movement of another with some time delay, has been empirically evidenced as widely existing in financial markets (Gong et al. 2016). Although the “lead–lag effect” concept has been adopted in many studies (Kobayashi and Takaguchi 2018), few have provided a formal definition of this concept, and its underlying meaning is not always consistent. Some studies have focused on how to generate greater stock returns by utilizing the “lead–lag phenomenon” (Stübinger 2019) but have often failed to mine its embedded features. To this end, this study aims to answer the following questions: (1) Are there several stable patterns in stock markets that are characterized by the lead–lag phenomenon? (2) How can we formally define the lead–lag effect to provide a solid foundation for detecting such an effect? (3) Can detecting the lead–lag effect enable the design of more profitable investment strategies that are more likely to earn excess returns?

The definition of the “lead–lag effect” is not equivalent to that of the “lead–lag relationship.” That is, if one stock’s price movement today mimics another stock’s price movement yesterday, the two stocks are said to have a “lead–lag relationship” over the two successive days in which the former is the follower, and the latter is the leader. In fact, it is quite common for one stock to follow another stock on some days during a year. Thus, an occasional lead–lag relationship could be regarded as random, which would not be very meaningful. However, if the lead–lag days of one stock pair are numerous enough to differ significantly from a random event, an effect can be deemed to exist between the pair. Accordingly, the first motivation of our work is to define the lead–lag effect by providing a statistical testing model, the goal of which is to judge whether the number of days characterized by a lead–lag relationship (hereafter, “lead–lag days”) is statistically significantly large.

Once the definition of the lead–lag effect is scientifically determined, a method for detecting stock pairs characterized by the lead–lag effect can be proposed. However, two questions must first be addressed: (1) how do external variables affect the detection results, and (2) are the detection results sensitive to these influential external variables? The answers to these two questions will deepen our understanding of the proposed detection model. Knowing how external variables influence the results will enable us to apply the proposed model with appropriate variable values. The robustness of the proposed model matters for its use in real-world investment practice, because a robust model is desirable when designing investment strategies. Accordingly, the answers to these two questions will reveal the properties of the proposed model.

As a typical application, the detected lead–lag effect is intended to guide investments in real-world stock markets. Intuitively, detecting stock pairs with a significant lead–lag effect can benefit investors because the price movements of followers will mimic those of their leaders. Thus, this study will first examine the performance of the pure lead–lag strategy and then judge whether it is satisfactory. If it is, we will regard the detected lead–lag effect as an enhancement signal and add it to some classic investment strategies to propose enhancement investment strategies. Generally, when a basic strategy is enhanced by another strategy, we refer to it as a single-enhancement investment strategy. The alpha-factor strategy is selected as the basic strategy, and our proposed lead–lag strategy is adopted to enhance it. Accordingly, the third motivation is to design profitable investment strategies based on the detected lead–lag effect, and then test their performance in a pure investment strategy and the proposed enhancement strategies.

To sum up, the contributions of this study are as follows: (1) The features of the lead–lag phenomenon are explored in the context of both the U.S. and Chinese stock markets. As a result, the number of stock pairs characterized by the lead–lag relationship meets the well-known power-law distribution, which offers novel evidence that the power-law distribution exists widely in the real world (Clauset et al. 2009) and specifically in the financial domain (Gabaix et al. 2003). (2) A formal definition of the lead–lag effect is provided according to the principles of statistical testing, and a detection approach is proposed based on this definition. It is worth noting that most existing studies regard the lead–lag relationship between stocks as a phenomenon (Scherbina and Schlusche 2020; Dao et al. 2018; Huth and Abergel 2014), whereas this study elevates this phenomenon into an effect. Accordingly, the lead–lag effect must be formally defined via statistical testing, which lays a foundation for future studies to compare and detect the lead–lag effect in various scenarios. The rationality and robustness of the proposed detection approach are carefully examined by determining how external variables influence the lead–lag effect. (3) A few profitable investment strategies are designed and validated based on the detected lead–lag effect, in parallel to previous design and validation studies such as Shen et al. (2017), Xiong et al. (2020), Flori and Regoli (2021), and Zhang et al. (2021). Here both the pure lead–lag strategy and the enhancement strategies report sound results regarding the functionality of the detected lead–lag effect.

The remainder of this paper is organized as follows. “Section Related work” reviews the related work to clearly delineate the aforementioned contributions; “Section Method for detecting the lead–lag effect” defines the lead–lag effect and proposes a detection methodology; “Section Main results and validation in real-world stock markets” explores the features of the lead–lag phenomenon based on a selected real-world dataset, applies the proposed detection method, and tests the method’s robustness; “Section Investment strategies based on the detected lead–lag effect” designs investment strategies and validates their performance to reveal the functionality of the detected lead–lag effect; and “Section Conclusions and future work” summarizes the study and discusses potential future work.

Related work

Our work is directly related to two fields of existing studies: the lead–lag phenomenon in stock markets and the alpha-factor strategy widely adopted in stock markets. Each field is reviewed individually in the following sections.

Lead–lag phenomenon in stock markets

The lead–lag phenomenon is a classic financial topic that has attracted the attention of numerous researchers (Conlon et al. 2018). First, one fundamental question has been widely examined in the literature: does the lead–lag phenomenon exist in the stock market? Generally, the lead–lag phenomenon can be observed in high-frequency data such as 5-min stock price movements. Both Jong and Nijman (1997) and Huth and Abergel (2014) deemed that the lead–lag relationship is an essential stylized fact at high frequencies. Fonseca and Zaatour (2017), Dao et al. (2018), Buccheri et al. (2019), Campajola et al. (2020), and many others have further examined the existence of the lead–lag phenomenon in high-frequency data, the influencing factors, and even its potential origins. However, when observed in high-frequency data, this phenomenon is often called the “lead–lag relationship” rather than the “lead–lag effect”. In most cases, the lead–lag relationship is unstable in high-frequency data. Since, according to Tóth and Kertész (2006) and Curme et al. (2015), its appearance is likely to be occasional, our work aims to formulate a new approach to finding a stable lead–lag relationship over a long time period based on statistical testing and to rename such a stable lead–lag relationship “the lead–lag effect” as an indication of its statistical significance.

Second, the existing literature explores how to take advantage of the lead–lag phenomenon in designing investment strategies for real-world stock markets. Investment strategies that utilize the lead–lag phenomenon are typically variations on the high-frequency trading strategy, which accords with the observation that the phenomenon appears mainly in high-frequency data. Stübinger (2019) developed an optimal causal path algorithm and designed statistical arbitrage strategies for high-frequency data based on the lead–lag phenomenon. However, designing an investment strategy based on high-frequency data still has drawbacks. According to Krauss (2017), high-frequency trading strategies are associated with greater commission fees and a higher transaction threshold for investors.

In contrast, the stable lead–lag effect discovered in low-frequency data facilitates the practices of small and medium investors because of its ample optional trading time and low technical threshold. Scherbina and Schlusche (2020) and Gupta and Chatterjee (2020) have pointed out that the lead–lag relationship enables out-of-sample forecasting and thus helps in the design of investment strategies. From this perspective, this study can also be seen as a development of our previously published work (Li et al. 2021), which focuses on identifying the factors that cause the lead–lag phenomenon. However, this study aims to develop new investment strategies by utilizing the lead–lag phenomenon, and thus the two have divergent research aims. In contrast to the successive lead–lag days analyzed in our previously published work, this study considers the number of cumulative lead–lag days, which helps extend the application of the model in real-world stock markets.

According to the aforementioned literature and gaps in the existing research, we believe that it is meaningful and even necessary to study the lead–lag effect in low-frequency data for the following reasons: (1) the definition of the lead–lag effect is not unified or discussed in depth in the existing literature, and thus the underlying significance of the lead–lag phenomenon often differs despite their use of the same name; (2) traditional studies on detecting the lead–lag phenomenon are conducted using classical econometrics or empirical research methods, and thus the use of data-driven technical analysis to detect the lead–lag effect can supplement existing studies with a new perspective; and (3) building on the traditional methods of designing investment strategies by using the discovered lead–lag phenomenon, our study may identify effective signals, which will have a guiding significance for the development of investment strategy. Accordingly, this study contributes to the literature by providing a unified and solid definition of the lead–lag effect and by utilizing the lead–lag effect to design profitable trading strategies in real-world stock markets.

Alpha-factor strategy

Concerning our targeted enhancement investment strategies, the alpha-factor strategy that originated with the capital asset pricing model is chosen as the primary strategy due to its popularity and effectiveness in real-world investment practices (Sharpe 1964; Makarov and Plantin 2015). The alpha factor in the alpha-factor strategy reflects one or more stock attributes; in other words, different alpha factors reflect different stock attributes. Thus, the alpha-factor strategy consists of numerous specific strategies using various alpha factors. Since alpha factors are used as buying and selling signals in the alpha-factor strategy, the choice of factor is the core of the strategy. Generally, existing studies focus mainly on the following two types of alpha factors: value alphas and transactional alphas.

Value alphas are derived from the fundamentals of a stock and describe its value attributes. Value alphas include but are not limited to value factors (Balatti et al. 2017; Eisdorfer et al. 2019), size factors (Liu et al. 2019), growth factors (Fama and French 1998), profitability factors (Hou et al. 2015; Fama and French 2015), and momentum factors (Fama and French 2012; Berggrun et al. 2020). Based on the mature factor model, value alphas provide not only a valuable tool for stock valuation, but also a reasonable explanation for the cross-section of stock returns (Harvey et al. 2016). Accordingly, when value alphas are adopted in a strategy, it indicates that the investor cares about the value investment’s underlying factors (Fama and French 2016). In contrast to traditional value alphas, transactional alphas pay more attention to the patterns embedded in trading behaviors (Casgrain and Jaimungal 2019). Transactional alphas are obtained by means of technical analysis and derived from transaction data. With the current progression of computer science, millions of transactional alpha factors have been identified by automated algorithms. Despite the lack of a good explanation, the marginal revenue contributed by transactional alpha factors is relatively satisfactory (Kakushadze 2016); large financial institutions favor such transactional alphas. For example, the 101 alpha factors proposed by WorldQuant and the 191 alpha factors from Guotai Junan Securities have been welcomed by many institutions and investors.

The alpha-factor strategy is typically used for stock selection. The proposed lead–lag strategy in our work helps allocate the weight of each selected stock in an investment portfolio. Therefore, it is convenient to combine the two strategies when designing an enhancement strategy. Since the alpha-factor strategy includes numerous specific models with various alpha factors, we select it as the primary strategy to demonstrate representativeness. The lead–lag effect falls into the category of technology-driven analysis and therefore resonates with transactional alphas, which are determined by technical analysis.

For this reason, it would be more natural to combine the lead–lag trading strategy with transactional alphas. Accordingly, this study focuses mainly on transactional alpha-factor strategies, regarded as the basic strategies when designing enhancement investment strategies. Our work exploits the great potential of the existing alpha factors and provides a framework for enhancement strategies by integrating the lead–lag effect into the existing alpha-factor strategies.

Method for detecting the lead–lag effect

The daily lead–lag network

Let $r_{i,t}$ denote the yield rate of stock i on day t. Its mathematical expression is as follows:

$$r_{i,t} = \frac{p_{i,t} - p_{i,t-1}}{p_{i,t-1}}, \tag{1}$$

where $p_{i,t}$ denotes the closing price of stock i on day t. Here, the adopted stock price is the adjusted (rights-restored) price rather than the ex-rights price. If a suspension occurs for stock i on day t, then both $r_{i,t}$ and $r_{i,t+1}$ are set to “NaN.” Next, given a manually set threshold Δ (0 ≤ Δ < 1), the condition under which stock i follows stock j on day t is defined as follows:

Definition 1

The conditions for forming a lead–lag link. Stock i follows stock j on day t if and only if the following condition holds:

$$\begin{cases} (1 - \Delta) r_{j,t-1} \le r_{i,t} \le (1 + \Delta) r_{j,t-1}, & \text{when } r_{i,t} \ge 0; \\ (1 + \Delta) r_{j,t-1} \le r_{i,t} \le (1 - \Delta) r_{j,t-1}, & \text{when } r_{i,t} < 0. \end{cases} \tag{2}$$
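As a brief worked illustration of the condition in Definition 1 (the numbers are hypothetical and chosen only for arithmetic convenience), take Δ = 0.1. If the candidate leader's yield rate on day t−1 is 0.02, a non-negative follower rate on day t qualifies when

$$0.9 \times 0.02 = 0.018 \le r_{i,t} \le 0.022 = 1.1 \times 0.02,$$

and if the leader's rate is −0.02, a negative follower rate qualifies when

$$1.1 \times (-0.02) = -0.022 \le r_{i,t} \le -0.018 = 0.9 \times (-0.02).$$

In both cases the admissible band has width $2\Delta \left| r_{j,t-1} \right|$, so it shrinks to the single point $r_{i,t} = r_{j,t-1}$ when Δ = 0.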

Definition 1 states that if the yield rate of stock i on day t deviates from that of stock j on day t–1 by no more than the relative threshold Δ, stock i is judged to follow stock j on day t. Further, let $G_t$ denote the lead–lag network on day t, whose element $g_{ij,t}$ reflects the status of stock i following stock j on day t. If stock i is judged as the follower of stock j on day t according to Definition 1, then $g_{ij,t} = 1$; otherwise, $g_{ij,t} = 0$. Our model allows one stock to follow itself, and thus it is possible that $g_{ii,t} = 1$ holds. Then, given the closing prices of all concerned stocks during the sequential T + 1 trading days, we can obtain T lead–lag networks according to Definition 1.
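The construction of the daily networks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `yield_rates`, `follows`, and `daily_networks` are hypothetical, a suspended day is marked with `None` in the price list, and a NaN yield rate is assumed to form no link.

```python
import math


def yield_rates(prices):
    """Eq. (1) for one stock: r_t = (p_t - p_{t-1}) / p_{t-1}.

    `prices` holds adjusted closing prices on consecutive days; None marks
    a suspended day (an illustrative convention). A suspension on day t
    makes both r_t and r_{t+1} NaN, because each of them needs p_t.
    """
    rates = []
    for t in range(1, len(prices)):
        prev, curr = prices[t - 1], prices[t]
        if prev is None or curr is None:
            rates.append(math.nan)
        else:
            rates.append((curr - prev) / prev)
    return rates


def follows(r_i_today, r_j_yesterday, delta):
    """Definition 1: True if stock i follows stock j today."""
    if math.isnan(r_i_today) or math.isnan(r_j_yesterday):
        return False  # assumption: undefined rates never form a link
    if r_i_today >= 0:
        lower = (1 - delta) * r_j_yesterday
        upper = (1 + delta) * r_j_yesterday
    else:
        lower = (1 + delta) * r_j_yesterday
        upper = (1 - delta) * r_j_yesterday
    return lower <= r_i_today <= upper


def daily_networks(rates, delta):
    """One 0/1 matrix per day that has a previous-day rate.

    rates[i][k] is the k-th yield rate of stock i. In each returned
    matrix g, g[i][j] == 1 means stock i follows stock j on that day.
    """
    n = len(rates)
    n_days = len(rates[0])
    networks = []
    for k in range(1, n_days):
        g = [[int(follows(rates[i][k], rates[j][k - 1], delta))
              for j in range(n)] for i in range(n)]
        networks.append(g)
    return networks


# Hypothetical rates for two stocks over three days, with delta = 0.1.
example_rates = [[0.02, 0.021, -0.01],
                 [0.05, 0.02, 0.0205]]
example_networks = daily_networks(example_rates, 0.1)
```

Keeping the matrices as plain nested lists makes the indexing convention explicit (row = follower, column = leader); an array library would be the more natural choice for market-sized data.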

During the targeted period (i.e., T + 1 trading days in total), the obtained T lead–lag networks tell us how many days stock i follows stock j in total. Formally, let $d_{ij}$ denote the number of accumulated days on which stock i follows stock j during the targeted period, which can be calculated as follows:

$$d_{ij} = \sum_{t = 1}^{T} g_{ij,t}. \tag{3}$$

$G_t$ is an asymmetric matrix in most cases, since $g_{ij,t}$ need not equal $g_{ji,t}$; accordingly, $d_{ij}$ is often not equal to $d_{ji}$.
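The accumulation in Eq. (3) amounts to an element-wise sum of the daily matrices. The sketch below assumes each daily network is given as a nested list of 0/1 entries with rows as followers and columns as leaders; the function name `accumulated_days` and the two small example networks are hypothetical.

```python
def accumulated_days(networks):
    """Eq. (3): d[i][j] is the number of days on which stock i follows stock j."""
    n = len(networks[0])
    d = [[0] * n for _ in range(n)]
    for g in networks:
        for i in range(n):
            for j in range(n):
                d[i][j] += g[i][j]
    return d


# Two hypothetical daily networks for two stocks.
networks = [
    [[1, 0], [1, 0]],
    [[0, 0], [1, 1]],
]
d = accumulated_days(networks)
print(d)  # [[1, 0], [2, 1]]
```

Here stock 1 follows stock 0 on both days while stock 0 never follows stock 1, which illustrates why the accumulated counts are generally not symmetric.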

Concerning the manually set threshold Δ, a larger Δ will cause the obtained daily lead–lag networks to have more directed links than a smaller Δ; thus, the threshold Δ affects network density. Because it is an artificial variable, we will explore how it affects the results and check whether our method is robust under different threshold values in the section “Detection results as a function of Δ.” The mainstream literature defining the relationship between stock pairs, such as Huang et al. (2009), Kumar and Deo (2012), Peralta and Zareei (2016), Xia et al. (2018), Deev and Lyócsa (2020), and many others, has often adopted the correlation coefficient. Note that most existing literature related to this definition pools data from a defined period to calculate the so-called “correlation coefficient,” whereas our study uses daily data to define each day’s lead–lag relationship between stock pairs. Therefore, the way the data are used (day by day to define the lead–lag relationship, rather than pooled over the selected period to calculate a correlation coefficient) is one of the differences between our study and the existing literature.

Definition and detection of the lead–lag effect

As explained in the introduction, when $d_{ij}$ (which is defined in Eq. (3)) is large enough to be significantly distinct from the value obtained in a random event, we tend to believe that the lead–lag effect from stock j to stock i holds, where stock j is the leader and stock i is the follower. However, the criterion for judging whether $d_{ij}$ is sufficiently large should be determined before formally defining the lead–lag effect. Fortunately, statistical testing enables us to formulate the following criterion: the null hypothesis is set to “all the links in the daily lead–lag networks are randomly formed,” and this null hypothesis allows us to obtain the distribution of the accumulated days of all stock pairs. Then, given the statistical significance level (e.g., 0.10, 0.05, 0.01, etc.), the criterion can be read directly from the obtained distribution. Let $\hat{d}$ denote the criterion. The meaning of the term “lead–lag effect” is provided in Definition 2.

Definition 2

Lead–lag effect. Based on the calculated $\hat{d}$, for any pair of stocks (e.g., i and j), if the $d_{ij}$ obtained from Eq. (3) satisfies $d_{ij} \ge \hat{d}$, the lead–lag effect from stock j to stock i is judged to hold. If $d_{ii} \ge \hat{d}$, the lead–lag effect from stock i to itself is judged to hold.
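One plausible way to write the criterion as a formula is given here as an interpretation, not as the authors' stated procedure. Let $D$ denote the accumulated days of a stock pair under the null hypothesis $H_0$ that all daily links are randomly formed, and let $\delta$ denote the chosen significance level (both symbols are notation assumed for this illustration). The criterion can then be read as the smallest count that random linking reaches with probability at most $\delta$:

$$\hat{d} = \min \left\{ d : \Pr \left( D \ge d \mid H_0 \right) \le \delta \right\}.$$

Under this reading, $d_{ij} \ge \hat{d}$ in Definition 2 is a one-sided test: a pair is flagged only when its accumulated days fall in the upper tail of the null distribution.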

Note that the principles of statistical testing imply that it is almost impossible for a rare event to occur in one random trial. Given the null hypothesis and the statistical significance level, the criterion for judging whether an event is rare or not can be achieved. Then, if a rare event occurs in the analyzed real-world data, we can reject the null hypothesis under the given statistical significance level, or we can determine that the rare event has a statistically significant effect. In fact, few studies have formally defined the lead–lag effect. As mentioned in the Related Work section, the lead–lag phenomenon or relationship was more often examined in the existing literature rather than the lead–lag effect. We detected the lead–lag effect via formal statistical tests and null reference networks, but the existing literature adopted different approaches to detecting the lead–lag relationship, such as the Granger test (Scherbina and Schlusche 2020; Zeng and Atta Mills 2021) and the optimal causal path algorithm (Jiang et al. 2019). Accordingly, the approach adopted leads to different definitions, so our definition is new in this field.

Table 1 shows the detailed process of obtaining criterion $\hat{d}$. Random networks are first generated to obtain the distribution of the accumulated days between all stock pairs under the null hypothesis. Then, criterion $\hat{d}$ can be obtained given the statistical significance level in Step (1). Here, we refer to the configuration model proposed by Newman et al. (2001). The generated random networks retain the characteristics of the daily network as much as possible. Although the network indicators of the real-world lead–lag network change each day, the adopted configuration model guarantees that each day’s random network and the same day’s real-world network share an almost identical node degree distribution, which is superior to a model that retains only the same edge number. Next, the statistical significance level δ is set to 0.001 since a lower significance level means a more rigorous criterion for determining the lead–lag effect. Once the output $\hat{d}$ is obtained based on the process shown in Table 1, Definition 2 directly judges which stock pairs feature the lead–lag effect. Hereafter, the stock pairs detected with the lead–lag effect are called “lead–lag stock pairs.”
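The overall idea of the procedure can be sketched as follows. This is a hedged illustration only: the steps of Table 1 are not reproduced here, so the stub-matching rewiring below is a simple stand-in in the spirit of a directed configuration model, and the names `randomize_network`, `null_threshold`, and `lead_lag_pairs`, the `n_rounds` parameter, and the tail-probability rule for the criterion are all assumptions.

```python
import random


def randomize_network(g, rng):
    """Rewire one daily network while keeping each stock's link counts.

    g[i][j] == 1 means stock i follows stock j. Every link contributes a
    follower stub and a leader stub; the leader stubs are shuffled and
    re-paired with the follower stubs. Duplicate pairs collapse into one
    link, so degrees are preserved only approximately.
    """
    n = len(g)
    links = [(i, j) for i in range(n) for j in range(n) if g[i][j]]
    followers = [i for i, _ in links]
    leaders = [j for _, j in links]
    rng.shuffle(leaders)
    h = [[0] * n for _ in range(n)]
    for i, j in zip(followers, leaders):
        h[i][j] = 1
    return h


def null_threshold(networks, significance, n_rounds, rng):
    """Estimate the criterion from simulated accumulated days.

    Each round rewires every daily network and accumulates the links, as
    in Eq. (3). The result is the smallest d such that the share of
    simulated pair counts >= d is at most `significance` (an assumed rule).
    """
    n = len(networks[0])
    samples = []
    for _ in range(n_rounds):
        d = [[0] * n for _ in range(n)]
        for g in networks:
            h = randomize_network(g, rng)
            for i in range(n):
                for j in range(n):
                    d[i][j] += h[i][j]
        samples.extend(d[i][j] for i in range(n) for j in range(n))
    d_hat = 0
    while sum(1 for s in samples if s >= d_hat) > significance * len(samples):
        d_hat += 1
    return d_hat


def lead_lag_pairs(d, d_hat):
    """Definition 2: (follower, leader) pairs with d[i][j] >= d_hat."""
    n = len(d)
    return [(i, j) for i in range(n) for j in range(n) if d[i][j] >= d_hat]
```

In practice the accumulated matrix from Eq. (3) would be passed to `lead_lag_pairs` together with the threshold returned by `null_threshold`. A very small significance level such as 0.001 needs many simulated pair counts before the empirical tail is resolved reliably, which is one reason to treat this sketch as conceptual.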

Table 1 The process of achieving the criterion $\hat{d}$
