 Research
 Open Access
Improvement in Hurst exponent estimation and its application to financial markets
Financial Innovation volume 8, Article number: 86 (2022)
Abstract
This research aims to improve the efficiency of Hurst exponent estimation in financial time series. A new procedure is developed based on equality in distribution, and it is applicable to the estimation methods of the Hurst exponent. We show how to use this new procedure with three of the most popular algorithms in the literature (generalized Hurst exponent, total triangles area, and fractal dimension). Findings show that this new approach improves the accuracy of the original methods, mainly for longer series. The second contribution of this study is to show how to use this methodology to test whether a series is self-similar, by constructing a confidence interval for the values of the Hurst exponent for which the series satisfies this property. Finally, we present an empirical application of this new procedure to stocks of the S&P 500 index. As with previous contributions, we consider this relevant to the financial literature, as it helps to avoid inappropriate interpretations of market efficiency that can lead to erroneous decisions not only by market participants but also by policymakers.
Introduction
For many economists, a fundamental assumption about financial assets is that price changes are generated randomly, without long-term memory. However, this assumption was challenged by the pioneering study of Mandelbrot (1963), and today the persistence of price series is a well-documented property of many financial series (Mantegna and Stanley 1996, 2000; Carbone et al. 2004). This property, known as long memory, caught the attention of many econometricians, who introduced the ARCH model (Engle 1982), the GARCH model (Bollerslev 1986), and many others to account for the long-range autocorrelation of financial data.
For empirical finance, the presence of long memory in a market, stock, or index is relevant because it is incompatible with the efficient market hypothesis (EMH), which states that price changes must be unpredictable. Since Fama (1970) established the three forms of efficiency (weak, semi-strong, and strong), different methodologies have been used to test the efficiency hypothesis. Technical analysis has been tested for its ability to generate abnormal returns for investors. Others have discussed the statistical implication of this hypothesis, namely, that stock returns follow a random walk. Physicists have provided a completely different perspective by introducing the Hurst exponent to study market efficiency (Beben and Orłowski 2001; Di Matteo et al. 2005; Zunino et al. 2007; Cajueiro and Tabak 2005; Kristoufek and Vosvrda 2014; Sánchez-Granero et al. 2020; Balladares et al. 2021).
Hurst (1951) developed the exponent while attempting to optimize the storage capacity of a reservoir intended to regulate the flow of the Nile River. The Hurst exponent, denoted by H, quantifies whether a time series is uncorrelated (\(H=0{.}5\)), persistent (\(H>0{.}5\)), or anti-persistent (\(H<0{.}5\)). It has been used in many applications to study the EMH, ranging from stock indices (Matos et al. 2008) to commodities (Tiwari et al. 2021; Kristoufek 2019), bonds (Bariviera et al. 2012), currencies (Shahzad et al. 2018), and cryptocurrencies (Dimitrova et al. 2019; Kristoufek and Vosvrda 2019).
However, the related literature has identified several problems in the use of the Hurst exponent. The first is the lack of precision when the time series is too short, as is often the case with financial time series (Sánchez-Granero et al. 2008; Weron 2002; Willinger et al. 1999; Couillard and Davison 2005).
Another, more recent problem (Mercik et al. 2003; Barunik and Kristoufek 2010; Fernández-Martínez et al. 2013; Sánchez et al. 2015) is that classical self-similarity estimators, such as Rescaled Range analysis (R/S) and Detrended Fluctuation Analysis (DFA), are not valid when the underlying distribution is heavy-tailed, which is the kind of distribution commonly used to model stock market returns [some examples are Bae et al. (2020), Zhaoa et al. (2021), Ciner (2021)].
These are the main reasons why an inappropriate use of the Hurst exponent can lead to erroneous conclusions about market efficiency [see, for example, Sánchez-Granero et al. (2008), Couillard and Davison (2005)], which in turn can lead to erroneous decisions by individual investors, institutional investors, and even policymakers.
This research continues a line of literature (Kou et al. 2022, 2019; Li et al. 2021) in which novel approaches have been introduced in different fields of economics and finance. We improve not only the accuracy of long-memory algorithms but also the robustness of their results. We propose a new procedure, the Kolmogorov-Smirnov (KS) method, based on the Kolmogorov-Smirnov statistic, which can be applied to any Hurst exponent estimation method that rests on an equality in distribution. Findings show that this new procedure is highly precise and less volatile than the original methods, mainly for long series.
Moreover, although the financial literature generally accepts that logarithmic price series are self-similar, this study provides a methodology showing that price series are generally, but not always, self-similar. We therefore provide a simple but powerful method to check the self-similarity condition. Checking that a time series is self-similar is highly recommended before estimating its Hurst exponent; otherwise, the estimate is meaningless. Finally, we study the self-similarity properties of the stocks in the S&P 500 index and illustrate how to construct a confidence interval for the values of the Hurst exponent for which a series is self-similar, which shows whether an estimate is sensible.
Revisiting Hurst exponent estimation methods
Since the introduction of R/S analysis (Hurst 1951), many methods have been developed to estimate the Hurst exponent. The most popular are R/S and DFA (Peng et al. 1994). The R/S algorithm uses the rescaled range as a measure of dispersion, because it follows a scaling law that allows the Hurst exponent to be estimated. The DFA algorithm, initially introduced for DNA sequences, quantifies long-range correlations in nonstationary time series. The properties of these methods for estimating the Hurst exponent have been reviewed using Monte Carlo simulations (Barunik and Kristoufek 2010) and compared with others, such as Multifractal Detrended Fluctuation Analysis (Kantelhardt et al. 2002), the Generalized Hurst Exponent (GHE) (Di Matteo et al. 2003), and Detrending Moving Average (Alessio et al. 2002).
Other interesting proposals are the semiparametric method of Geweke and Porter-Hudak (1983), which is based on a simple linear regression of the logarithmic periodogram on a deterministic regressor, the Quasi-Maximum Likelihood analysis (Haslett and Raftery 1989), the Periodogram Method (Taqqu et al. 1995), Wavelet Methods (Veitch and Abry 1999), the Higuchi Method (Higuchi 1988), the Whittle Estimator (Robinson 1995), the Centered Moving Average (Alessio et al. 2002), the Lyapunov exponent (Bensaida 2014; Das and Das 2006), and the higher-dimensional extension of R/S (Alvarez-Ramirez et al. 2008).
Although the applications of the Hurst exponent were initially widespread in economics (Diebold and Rudebusch 1989; Baillie et al. 1995; Hassler and Wolters 1995; Backus and Zin 1993), financial markets have undoubtedly been the field of greatest research interest (apart from the works mentioned in the introduction, Couillard and Davison (2005) and Dimitrova et al. (2019) contain an interesting summary of the main contributions). The reason is that the existence of memory in price series is considered evidence against the EMH. Determining whether or not financial markets are efficient in several of the ways established by Fama (1970) is one of the most interesting fields of research in finance.
However, first Willinger et al. (1999) and then others (Sánchez-Granero et al. 2008; Weron 2002; Couillard and Davison 2005) revealed the lack of precision of classical algorithms when applied to financial series, which motivated adapting and refining the algorithms for financial data. Relevant contributions in this line are the geometric method (GM) (Sánchez-Granero et al. 2008), the fractal dimension approach (FD) (Fernández-Martínez et al. 2014), the total triangles area algorithm (TTA) (Lotfalinezhad and Maleki 2020), the triangle area algorithm (TA) (Gómez-Águila and Sánchez-Granero 2021), the R/S_Com approach (Luo and Huang 2018), and the Bayesian approach (Wan et al. 2022), among others.
The methodology presented in this study aims to improve the efficiency of the algorithms used in the case of financial series.
Mathematical background
First, we recall results, properties, and definitions from probability theory and stochastic processes that are used to formalize several concepts and ideas mathematically. The new algorithm to calculate the Hurst exponent is tested on self-similar processes with stationary increments, so we first review the theory of these processes.
Let \((\Omega , \mathcal {A},P)\) be a probability space. A random variable is a measurable function \(X:\Omega \rightarrow \mathbb {R}\) (see Micheas 2018); that is, for every Borel set \(B \subset \mathbb {R}\), \(X^{-1}(B) \in \mathcal {A}\). Let T be an arbitrary index set. A stochastic process \(\{X(t,\omega ):t\in T,\omega \in \Omega \}\) is a collection of random variables defined on the same probability space. In our case, we take the index set T to be \(\mathbb {R}^{+}_0\). The function \(t \rightarrow X(t,\omega )\) is called the sample path of the stochastic process X corresponding to the outcome \(\omega\).
In the same way that a random variable is described by its distribution, a stochastic process can be described by its finite joint distributions. Let \(\{X(t,\omega ):t\ge 0,\omega \in \Omega \}\) be a stochastic process and take \(0 \le t_1,t_2, \ldots ,t_n < \infty\). Then, for each Borel set \(B \in \mathcal {B}( \mathbb {R}^n)\), we define the finite joint distribution as
\[P_{t_1,\ldots ,t_n}(B)=P\left( \{\omega \in \Omega :(X(t_1,\omega ),\ldots ,X(t_n,\omega ))\in B\}\right) .\]
We say that two stochastic processes have the same distribution if their finite joint distributions are equal; we denote this by \(\sim\). Formally:
Definition 1
(see Micheas 2018) Let \(\{X(t,\omega ):t \ge 0,\omega \in \Omega \}\) and \(\{Y(t,\omega ):t\ge 0,\omega \in \Omega \}\) be two stochastic processes on the same probability space \((\Omega , \mathcal {A},P)\). The processes are said to be equal in distribution if, for each \(n\ge 1\), each \(0 \le t_1,t_2,\ldots ,t_n <\infty\), and each \(B \in \mathcal {B}(\mathbb {R}^n)\), it holds that
\[P\left( \{\omega :(X(t_1,\omega ),\ldots ,X(t_n,\omega ))\in B\}\right) =P\left( \{\omega :(Y(t_1,\omega ),\ldots ,Y(t_n,\omega ))\in B\}\right) .\]
Other definitions needed for this work relate to self-similar processes and their increments; we describe the main properties of these processes to formalize the new method for estimating the Hurst exponent.
Definition 2
(see Lamperti 1962) A stochastic process \(\{X(t,\omega ):t\ge 0,\omega \in \Omega \}\) is called self-similar if there exists a parameter H, called the Hurst exponent or self-similarity index, that satisfies the relation
\[X(\tau t,\omega )\sim \tau ^H X(t,\omega )\]
for each \(\tau >0\) and \(t\ge 0\).
Interesting properties of these processes are the stationarity and self-affinity of the increments. We examine processes whose increments satisfy these conditions.
Definition 3
(see Micheas 2018) Let \(\{X(t,\omega ):t\ge 0,\omega \in \Omega \}\) be a stochastic process. Its increments are stationary if
\[X(t+\tau ,\omega )-X(\tau ,\omega )\sim X(t,\omega )-X(0,\omega )\]
for all \(t>0\) and all \(\tau \ge 0\).
Definition 4
(see Mandelbrot 2002) Let \(\{X(t,\omega ):t\ge 0,\omega \in \Omega \}\) be a stochastic process. The increments of this process are said to be self-affine of parameter H if
\[X(t+c\tau ,\omega )-X(t,\omega )\sim c^H\left( X(t+\tau ,\omega )-X(t,\omega )\right)\]
for each \(t>0\), \(c>0\) and \(\tau \ge 0\).
A self-similar process with stationary increments has self-affine increments (see Trinidad Segovia et al. 2012).
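The implication can be checked directly from Definitions 2 and 3. As a short sketch for the one-dimensional distributions: apply the stationarity of the increments, then self-similarity to the pair \((X(c\tau ,\omega ),X(0,\omega ))=(X(c\cdot \tau ,\omega ),X(c\cdot 0,\omega ))\), and then stationarity again:
\[X(t+c\tau ,\omega )-X(t,\omega )\sim X(c\tau ,\omega )-X(0,\omega )\sim c^{H}\left( X(\tau ,\omega )-X(0,\omega )\right) \sim c^{H}\left( X(t+\tau ,\omega )-X(t,\omega )\right) ,\]
which is the self-affinity relation of Definition 4.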
Methods for the estimation of the Hurst exponent with an equality in distribution
In this section, we review methods for estimating the Hurst exponent. All of them rest on an equality in distribution, a fact we exploit later for our purpose.
Generalized Hurst exponent
The GHE(q) is an accurate method for estimating the Hurst exponent, introduced by Barabási and Vicsek (1991) (see also Di Matteo et al. 2003). It is based on the scaling behavior of the absolute moment of order q of the process increments, which can be expressed as follows (see Barunik and Kristoufek 2010):
\[E\left[ |X(t+\tau ,\omega )-X(t,\omega )|^q\right] =c\,\tau ^{qH}, \tag{1}\]
with \(c>0\). In practice, we take a time series \(\{X(1),X(2),\ldots ,X(T)\}\) and compute the \(k_q\) statistic as
\[k_q(\tau )=\frac{1}{T-\tau }\sum _{t=1}^{T-\tau }|X(t+\tau )-X(t)|^q,\]
where T is the length of the series and \(\tau\) is the length of the subperiods.
Gómez-Águila and Sánchez-Granero (2021) provided an alternative mathematical justification of the GHE method. Let \(\{X(t,\omega ):t> 0,\omega \in \Omega \}\) be a self-similar process with stationary increments. Then the following equality in distribution holds:
\[X(t+\tau ,\omega )-X(t,\omega )\sim \tau ^H\left( X(1,\omega )-X(0,\omega )\right) \tag{2}\]
with \(t>0\) and \(\tau >0\). To obtain the GHE(1) algorithm, we take expected values of the absolute values in Eq. (2); for the GHE(q) algorithm, we take the absolute moment of order q in Eq. (2):
\[E\left[ |X(t+\tau ,\omega )-X(t,\omega )|^q\right] =\tau ^{qH}E\left[ |X(1,\omega )-X(0,\omega )|^q\right] , \tag{3}\]
and hence Eq. (1) holds with \(c=E[|X(1,\omega )-X(0,\omega )|^q]\). To estimate the Hurst exponent of a time series, we take logarithms in Eq. (3), obtaining the linear relationship
\[\frac{1}{q}\log E\left[ |X(t+\tau ,\omega )-X(t,\omega )|^q\right] =H\log \tau +\frac{1}{q}\log c, \tag{4}\]
where the Hurst exponent is estimated as the slope of the linear regression on \(\log \tau\). The expected values are estimated by sample means.
Since the case \(q = 1\) follows directly from the equality in distribution (2), we use GHE(1) in our new method.
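As an illustration of Eqs. (3) and (4), the following Python sketch estimates H with GHE(q). It is not the authors' implementation (an implementation of GHE is given by Aste 2022); it assumes NumPy, overlapping increments, sample means in place of expected values, and scales \(\tau =1,2,4,\ldots ,2^{\lfloor \log _2(N/8)\rfloor }\), which is the range used later for the KS method. The function name `ghe` is hypothetical.

```python
import numpy as np


def ghe(x, q=1.0):
    """Sketch of the GHE(q) Hurst exponent estimator for a series X(1), ..., X(N)."""
    x = np.asarray(x, dtype=float)
    # Assumed scale range: tau = 1, 2, 4, ..., 2**floor(log2(N / 8)).
    taus = [2**k for k in range(int(np.log2(x.size / 8)) + 1)]
    # k_q(tau): sample mean of |X(t + tau) - X(t)|**q over all (overlapping) t.
    k_q = np.array([np.mean(np.abs(x[tau:] - x[:-tau]) ** q) for tau in taus])
    # Eq. (4): (1/q) log k_q(tau) = H log(tau) + const, so H is the slope.
    slope, _intercept = np.polyfit(np.log(taus), np.log(k_q) / q, deg=1)
    return slope


# Usage: a Gaussian random walk is a discretely sampled Brownian motion (H = 0.5).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(2**14))
h_ghe = ghe(walk)
print(f"GHE(1) estimate: {h_ghe:.3f}")
```

A Gaussian random walk is a convenient check because its increments satisfy Eq. (2) exactly with H = 0.5.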
Triangle area method
The TA method (Gómez-Águila and Sánchez-Granero 2021) is an improvement of the TTA method introduced by Lotfalinezhad and Maleki (2020) (see also Berzin and León 2008). TA is based on the area of the triangles formed in intervals of length \(2\tau\) by the first, middle, and last points of the interval (overlapping triangles are also included). Let \(\{X(t,\omega ):t> 0,\omega \in \Omega \}\) be a self-similar process with stationary increments. The Hurst exponent can be obtained from the scaling of \(AT(\tau ,\omega )\), the distribution of the triangle area in an interval of length \(2\tau\), which can be obtained as
\[AT(\tau ,\omega )=\frac{\tau }{2}\left| X(t+2\tau ,\omega )-X(t+\tau ,\omega )-\left( X(t+\tau ,\omega )-X(t,\omega )\right) \right| .\]
Thus, \(AT(1,\omega )=\frac{1}{2}\left| X(t+2,\omega )-X(t+1,\omega )-(X(t+1,\omega )-X(t,\omega ))\right|\). Indeed, Gómez-Águila and Sánchez-Granero (2021) proved that a self-similar process with stationary increments satisfies
\[AT(\tau ,\omega )\sim \tau ^{H+1}AT(1,\omega ), \tag{5}\]
and thus we have the needed equality in distribution.
We take expected values and logarithms in Eq. (5), and the Hurst exponent is estimated as the slope of the linear regression
\[\log \frac{E[AT(\tau ,\omega )]}{\tau }=H\log \tau +\log E[AT(1,\omega )]. \tag{6}\]
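To show how Eqs. (5) and (6) translate into code, the following Python sketch computes the overlapping triangle areas for each \(\tau\), averages them, and regresses \(\log (E[AT(\tau )]/\tau )\) on \(\log \tau\). It assumes NumPy and scales \(\tau =2^0,\ldots ,2^{\lfloor \log _2(N/8)\rfloor }\); `triangle_areas` and `ta_hurst` are illustrative names, not the authors' code.

```python
import numpy as np


def triangle_areas(x, tau):
    """Areas of the (overlapping) triangles with vertices at t, t + tau, t + 2*tau."""
    x = np.asarray(x, dtype=float)
    first, middle, last = x[: -2 * tau], x[tau:-tau], x[2 * tau :]
    # AT(tau) = (tau / 2) * |X(t+2tau) - X(t+tau) - (X(t+tau) - X(t))|
    return 0.5 * tau * np.abs((last - middle) - (middle - first))


def ta_hurst(x):
    """Sketch of the TA estimator: slope of log(E[AT(tau)] / tau) against log(tau)."""
    x = np.asarray(x, dtype=float)
    taus = [2**k for k in range(int(np.log2(x.size / 8)) + 1)]
    mean_area = np.array([triangle_areas(x, tau).mean() for tau in taus])
    slope, _intercept = np.polyfit(np.log(taus), np.log(mean_area / np.array(taus)), deg=1)
    return slope


rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(2**14))  # Brownian motion sample, H = 0.5
h_ta = ta_hurst(walk)
print(f"TA estimate: {h_ta:.3f}")
```

For the triangle with vertices (0, 0), (1, 1), (2, 0), `triangle_areas([0.0, 1.0, 0.0], 1)` returns its geometric area, 1.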
FD approach
Fernández-Martínez et al. (2014) introduced the FD algorithms, which are a generalization of the GM2 approach (see Trinidad Segovia et al. 2012). These methods were tested for self-similar processes with stationary increments, and their accuracy was examined for fractional Brownian motions and Lévy stable motions (Fernández-Martínez et al. 2014).
Let \(\{X(t,\omega ):t>0,\omega \in \Omega \}\) be a self-similar process with stationary increments. The following power law holds (see Corollary 3.6 in Mandelbrot 2002):
\[M(\tau )\sim \tau ^H M(1), \tag{7}\]
where \(M(\tau )=M_0(\tau )\) and \(M_t(\tau )\) is defined as
\[M_t(\tau )=\max _{t\le s\le t+\tau }X(s,\omega )-\min _{t\le s\le t+\tau }X(s,\omega ).\]
That is, \(M_t(\tau )\) is the difference between the maximum and the minimum of the series in an interval of length \(\tau\). Taking the expected value in Eq. (7) (\(q=1\)) gives the GM2 method (Trinidad Segovia et al. 2012):
\[E[M(\tau )]=\tau ^H E[M(1)],\]
where E[M(1)] is constant. Moreover, taking the moment of order q in Eq. (7), we obtain the FD(q) methods:
\[E[M(\tau )^q]=\tau ^{qH}E[M(1)^q], \tag{8}\]
with \(q>0\). The FD4 approach corresponds to Eq. (8) with a very low q (usually \(q=0.01\)). The FD4 algorithm has certain advantages for financial series, such as its accuracy in estimating the Hurst exponent of short series (Fernández-Martínez et al. 2014). We use FD4 both in our new method and in the simulations.
To estimate the Hurst exponent with the FD4 algorithm, we take logarithms in Eq. (8); the slope of the following linear regression gives the Hurst exponent:
\[\frac{1}{q}\log E[M(\tau )^q]=H\log \tau +\frac{1}{q}\log E[M(1)^q]. \tag{9}\]
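The following Python sketch of FD(q) assumes the input is already summarized as the maximum (`high`) and minimum (`low`) of the series in each period, as in the simulations of the next section, where each period contains 128 points; \(M_t(\tau )\) is then the range over \(\tau\) consecutive periods. Overlapping windows, power-of-two scales up to N/8, and the name `fd_hurst` are assumptions, not the authors' code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def fd_hurst(high, low, q=0.01):
    """Sketch of FD(q); FD4 corresponds to q = 0.01.

    high[i] and low[i] are the maximum and minimum of the series within period i,
    so M_t(tau) = max(high[t:t+tau]) - min(low[t:t+tau]) is the range over tau periods.
    """
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    taus = [2**k for k in range(int(np.log2(high.size / 8)) + 1)]
    lhs = []
    for tau in taus:
        # Range over every window of tau consecutive periods (overlapping windows).
        window_max = sliding_window_view(high, tau).max(axis=1)
        window_min = sliding_window_view(low, tau).min(axis=1)
        # Left-hand side of Eq. (9): (1/q) log E[M(tau)^q].
        lhs.append(np.log(np.mean((window_max - window_min) ** q)) / q)
    slope, _intercept = np.polyfit(np.log(taus), lhs, deg=1)
    return slope


# Usage: a fine random walk summarized into 2048 periods of 128 points each.
rng = np.random.default_rng(1)
fine = np.cumsum(rng.standard_normal(2048 * 128)).reshape(2048, 128)
h_fd4 = fd_hurst(fine.max(axis=1), fine.min(axis=1))
print(f"FD4 estimate: {h_fd4:.3f}")
```

Summarizing each period by its extremes, rather than by a single value, is what lets \(M(\tau )\) approximate the range of the underlying continuous path even for \(\tau =1\).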
The KS method
We now introduce a new procedure, the KS method, which is a meta-method for estimating the Hurst exponent. Most existing methods rely on the scaling behavior (a power law) of certain elements of the process and obtain the Hurst exponent from a linear regression of logarithms. In the methods described previously, we take expected values of an equality in distribution to estimate the Hurst exponent. However, equality in distribution is a stronger property than equality of expected values.
The new approach builds directly on equality in distribution, so it can be combined with any method that rests on one. In this study, we illustrate the procedure with the methods described in the “Methods for the estimation of the Hurst exponent with an equality in distribution” section, using the KS statistic, that is, the KS distance between the empirical distributions of two samples.
We therefore propose to use the equality in distribution itself, which is a stronger property, rather than the equality of absolute moments of order q. We illustrate the approach with the GHE(1) method; the procedure is analogous for the other methods.
Let \(\{X(t,\omega ):t>0,\omega \in \Omega \}\) be a self-similar process with stationary increments. From the relationship of GHE(1) in the “Methods for the estimation of the Hurst exponent with an equality in distribution” section, we obtain
\[|X(t+\tau ,\omega )-X(t,\omega )|\sim \tau ^H|X(1,\omega )-X(0,\omega )|.\]
In the GHE(1) method, expected values are taken in this expression. Instead, we use the distribution functions of \(|X(t+\tau ,\omega )-X(t,\omega )|\) and \(|X(1,\omega )-X(0,\omega )|\), denoted by \(F_{\tau }\) and \(F_{1}\), respectively. From the last expression, it follows that
\[F_\tau (x)=P\left( |X(t+\tau ,\omega )-X(t,\omega )|\le x\right) =P\left( \tau ^H|X(1,\omega )-X(0,\omega )|\le x\right) =F_1\left( \frac{x}{\tau ^H}\right)\]
for \(x>0\), where the self-similarity of the process has been used in the second equality. Equivalently, \(F_\tau (\tau ^H x)=F_1(x)\). We now define the following function, with the Hurst exponent as a parameter:
\[g(H)=\sum _{\tau }D\left( F_\tau (\tau ^H\,\cdot \,),F_1\right) , \tag{10}\]
where the sum runs over the selected interval lengths \(\tau\) and D is the KS distance, \(D(F,G)=\sup _x|F(x)-G(x)|\) (see Hoboes 1958). The Hurst exponent is estimated as the value of H that minimizes g: the minimum of g indicates that the rescaled distribution \(F_\tau\) is the most similar to \(F_1\). Because the sample for \(\tau =1\) is the largest, we assume it carries the most information and therefore use it as the reference against which all the others are compared.
In practice, we estimate the distribution functions by the empirical distributions for the selected \(\tau\)s and minimize the function g. To take the samples, we use the equivalence
\[\frac{|X(t+\tau ,\omega )-X(t,\omega )|}{\tau ^H}\sim |X(1,\omega )-X(0,\omega )|. \tag{11}\]
Figure 1 shows the empirical cumulative distribution functions of the GHE(1) samples for different \(\tau\)s, for three different fractional Brownian motions. The distributions are similar.
Figure 2 shows examples of the minimization of the function g in the KS method for fractional Brownian motions with \(H=0{.}1,0{.}5,0{.}9\) and length \(2^{10}\). The resulting estimates are 0.104, 0.498, and 0.903, close to the true values.
As another example of the KS methods, Fig. 3 shows the minimization of the function g for the KS-TA algorithm on fractional Brownian motions with \(H=0{.}1,0{.}5,0{.}9\) and length \(2^{10}\). We obtain 0.113, 0.508, and 0.918, respectively, which are close to the true values.
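A minimal KS-TA sketch in Python: it replaces the samples of Eq. (11) with triangle areas rescaled by \(\tau ^{H+1}\), following Eq. (5). It assumes overlapping triangle areas (as in TA), scales \(2^0,\ldots ,2^{\lfloor \log _2(N/8)\rfloor }\), and a grid search over H with step 0.001, because g is piecewise constant in H. The two-sample KS distance is computed directly from the empirical distribution functions rather than with a library call. All names are hypothetical; this is not the authors' code.

```python
import numpy as np


def ks_distance(a_sorted, b_sorted):
    """Two-sample KS distance sup_x |F_a(x) - F_b(x)| between two sorted samples."""
    grid = np.concatenate([a_sorted, b_sorted])  # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a_sorted, grid, side="right") / a_sorted.size
    cdf_b = np.searchsorted(b_sorted, grid, side="right") / b_sorted.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_ta(x, h_grid=None):
    """Sketch of KS-TA: the H minimizing sum over tau of D(AT(tau) / tau**(H + 1), AT(1))."""
    x = np.asarray(x, dtype=float)
    if h_grid is None:
        h_grid = np.linspace(0.001, 0.999, 999)
    taus = [2**k for k in range(int(np.log2(x.size / 8)) + 1)]
    # Overlapping triangle areas (tau / 2) * |X(t+2tau) - 2 X(t+tau) + X(t)|, sorted once:
    # dividing by tau**(H + 1) does not change their order.
    areas = {
        tau: np.sort(0.5 * tau * np.abs(x[2 * tau :] - 2.0 * x[tau:-tau] + x[: -2 * tau]))
        for tau in taus
    }
    reference = areas[1]
    g = [
        sum(ks_distance(areas[tau] / tau ** (h + 1), reference) for tau in taus[1:])
        for h in h_grid
    ]
    return float(h_grid[int(np.argmin(g))])


rng = np.random.default_rng(5)
walk = np.cumsum(rng.standard_normal(2**12))  # Brownian motion sample, H = 0.5
h_ks_ta = ks_ta(walk)
print(f"KS-TA estimate: {h_ks_ta:.3f}")
```

The \(\tau =1\) term is skipped in the sum because it is zero for every H.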
To conclude this section, we present the algorithm used to obtain the Hurst exponent of a time series with the KS method. We describe it for the GHE(1) method; it is analogous for the other methods that rest on an equality in distribution. The algorithm has the following steps.

(1) We define the scale range of \(\tau\) for the algorithm. It is usual to take \(\{2^n: n \in \{ p_{min}, \ldots , p_{max}\}\}\), where \(p_{min}\) and \(p_{max}\) are the minimum and the maximum power, respectively (Barunik and Kristoufek 2010). In our case, we take the scale range \(\{2^n: n \in \{0,1,\ldots ,\lfloor \log _2(N/8)\rfloor \} \}\), where N is the length of the series.
(2) For each \(\tau\), we compute the samples from the series that correspond to the equality in distribution of the method we are going to use. In our case, we compute \(\frac{|X(t+\tau ,\omega )-X(t,\omega )|}{\tau ^H }\) (expression 11) for each selected \(\tau\).
(3) We construct the empirical cumulative distribution function for each \(\tau\). Note that it depends on the parameter H, which is the one we want to estimate.
(4) We calculate the KS distance between the empirical cumulative distribution functions for \(\tau\) and for \(\tau =1\).
(5) We sum these distances over all \(\tau\), obtaining the function g(H) (expression 10).
(6) We find the minimum of the function g(H). The value of H that minimizes this function is our estimate of the Hurst exponent.
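The six steps can be sketched in Python as follows. This is a hypothetical illustration, not the authors' listing; it assumes NumPy, overlapping increments, and a grid search over \(H\in [0.001,0.999]\) with step 0.001 for step (6), since g(H) is piecewise constant and gradient-based minimizers are unsuitable. Sorting each sample once suffices, because dividing by \(\tau ^H\) does not change the order.

```python
import numpy as np


def ks_distance(a_sorted, b_sorted):
    """Two-sample KS distance sup_x |F_a(x) - F_b(x)| between two sorted samples."""
    grid = np.concatenate([a_sorted, b_sorted])  # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a_sorted, grid, side="right") / a_sorted.size
    cdf_b = np.searchsorted(b_sorted, grid, side="right") / b_sorted.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_ghe1(x, h_grid=None):
    """Sketch of KS-GHE(1) following steps (1)-(6)."""
    x = np.asarray(x, dtype=float)
    if h_grid is None:
        h_grid = np.linspace(0.001, 0.999, 999)
    # Step (1): tau = 2^0, 2^1, ..., 2^floor(log2(N / 8)).
    taus = [2**k for k in range(int(np.log2(x.size / 8)) + 1)]
    # Step (2): samples |X(t + tau) - X(t)|, sorted once; the 1 / tau**H factor is applied in g.
    samples = {tau: np.sort(np.abs(x[tau:] - x[:-tau])) for tau in taus}
    reference = samples[1]

    def g(h):
        # Steps (3)-(5): ECDF of |X(t+tau) - X(t)| / tau**h versus the tau = 1 ECDF,
        # summed over tau; the tau = 1 term is always zero and is skipped.
        return sum(ks_distance(samples[tau] / tau**h, reference) for tau in taus[1:])

    # Step (6): grid search for the minimizer of g.
    values = np.array([g(h) for h in h_grid])
    return float(h_grid[int(np.argmin(values))])


rng = np.random.default_rng(2)
walk = np.cumsum(rng.standard_normal(2**12))  # Brownian motion sample, H = 0.5
h_ks_ghe = ks_ghe1(walk)
print(f"KS-GHE(1) estimate: {h_ks_ghe:.3f}")
```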
The GHE algorithm is described in Aste (2022). The main difference between the two algorithms is that ours uses equality in distribution, rather than equality of expectations, to obtain H.
Below, we show the code that we have used to implement the KS-GHE(1) method in Python.
Testing the algorithms
In this section, we test the proposed procedure by comparing the accuracy of GHE(1), TA, FD4, and the KS version of each in estimating the Hurst exponent of fractional Brownian motions. We use Monte Carlo simulations: for each H with \(0<H<1\), we generate 1000 sample paths of a self-similar process with stationary increments and Hurst exponent H, of a fixed length. For these simulations, we use fractional Brownian motions. Tables 1, 2, and 3 show the mean and standard deviation of the estimates for the different methods.
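This section does not state how the fractional Brownian motions are generated. As one possible choice, the following Python sketch uses the exact Cholesky method for fractional Gaussian noise, and a hypothetical `monte_carlo` helper returns the mean and standard deviation of any estimator (a function mapping a path to an estimate), the two quantities reported in Tables 1 to 3. The Cholesky method costs \(O(n^3)\), so for long paths (e.g., \(2^{15}\)) a faster exact method such as circulant embedding would be preferable.

```python
import numpy as np


def fbm_paths(n_steps, hurst, n_paths, rng):
    """Fractional Brownian motion paths X(0), ..., X(n_steps) with X(0) = 0 (Cholesky method)."""
    k = np.arange(n_steps)
    two_h = 2.0 * hurst
    # Autocovariance of unit-variance fractional Gaussian noise at lag k.
    gamma = 0.5 * (np.abs(k + 1) ** two_h - 2.0 * np.abs(k) ** two_h + np.abs(k - 1) ** two_h)
    cov = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    chol = np.linalg.cholesky(cov)
    # Each row of i.i.d. normals times chol.T has covariance chol @ chol.T = cov.
    increments = rng.standard_normal((n_paths, n_steps)) @ chol.T
    return np.hstack([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)])


def monte_carlo(estimator, n_steps, hurst, n_paths=1000, seed=0):
    """Mean and standard deviation of an estimator over simulated fBm paths."""
    rng = np.random.default_rng(seed)
    paths = fbm_paths(n_steps, hurst, n_paths, rng)
    estimates = np.array([estimator(path) for path in paths])
    return estimates.mean(), estimates.std(ddof=1)


rng = np.random.default_rng(5)
paths = fbm_paths(512, 0.7, 200, rng)
increments = np.diff(paths, axis=1)
lag1 = np.mean(increments[:, 1:] * increments[:, :-1]) / np.mean(increments**2)
print(f"lag-1 autocorrelation: {lag1:.3f} (theory {2**0.4 - 1:.3f})")
```

For fractional Gaussian noise, the lag-1 autocorrelation is \(2^{2H-1}-1\), about 0.32 for H = 0.7, which gives a quick sanity check of the generator.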
If two methods yield mean estimates close to the theoretical value of H, the one with the lower standard deviation is better, because its estimates are more concentrated around the theoretical value, indicating higher accuracy.
Tables 1 and 2 show that, for short series, the results are similar. For both GHE(1) and TA, the classical methods are slightly more accurate for low H, whereas for higher H the KS-GHE(1) and KS-TA methods are slightly better. As the length of the series increases, the new methods become more accurate than the original GHE(1) and TA methods. For series of length \(2^{15}\), the mean is equal or slightly better, and in almost all cases the standard deviation of the KS-GHE(1) and KS-TA methods is about half that of the GHE(1) and TA methods, respectively.
Table 3 shows the results for the FD4 and KS-FD4 methods. These methods use the maximum and the minimum of each period of the series. Therefore, to simulate a fractional Brownian motion of a given length l, we generate a fractional Brownian motion of length \(l \times 128\) and take the maximum and minimum values of each subperiod of length 128, which yields a series of length l. This procedure follows Fernández-Martínez et al. (2014). The results for smaller Hs are somewhat biased because of this generation procedure: as discussed in Fernández-Martínez et al. (2014), subintervals longer than 128 points would be needed. This bias affects both algorithms and stems from the generation procedure, not from the methods themselves. The simulations show that the classical method is slightly more accurate on average, but KS-FD4 has a smaller standard deviation. For longer series and higher H, the KS method outperforms the FD4 method.
To strengthen the analysis, we perform a series of statistical tests to support the conclusions above. Since the means of the algorithms are similar in almost all cases, we analyze whether their standard deviations differ significantly; if they do, the method with the smaller standard deviation is more accurate. For each pair of methods compared, we perform the Levene test (Levene 1960). If the p-value is less than 0.01, we conclude that the standard deviations of the methods differ significantly at the 99% confidence level. Tables 4, 5, and 6 show the results. The Levene tests corroborate the conclusions drawn above, showing significant and nonsignificant differences between the standard deviations in line with the previous discussion. In particular, for large sizes the equality of the standard deviations is rejected, and since the standard deviations of the KS methods are smaller, we can state that these methods are significantly more accurate.
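A sketch of the pairwise comparison in Python, assuming SciPy's `scipy.stats.levene`. Passing `center="mean"` gives Levene's original (1960) statistic; SciPy's default, `center="median"`, is the Brown-Forsythe variant. The function name is hypothetical, and the estimates below are synthetic, not the paper's.

```python
import numpy as np
from scipy import stats


def levene_compare(estimates_a, estimates_b, alpha=0.01):
    """Levene test of equal variances between two sets of Hurst estimates.

    Returns the p-value and whether the variances differ at significance level alpha.
    """
    result = stats.levene(estimates_a, estimates_b, center="mean")
    return result.pvalue, bool(result.pvalue < alpha)


# Synthetic estimates: same mean, the second set half as dispersed.
rng = np.random.default_rng(3)
classical = 0.5 + 0.04 * rng.standard_normal(1000)
ks_based = 0.5 + 0.02 * rng.standard_normal(1000)
p_value, differ = levene_compare(classical, ks_based)
print(f"p-value = {p_value:.2e}, variances differ at the 1% level: {differ}")
```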
This is expected: the longer the series, the better the empirical cumulative distribution functions are estimated, and hence the more accurate the methods.
Application of the algorithm to stock prices in the S&P 500 index
This section applies the KS methods to stock price series. We study the stocks in the S&P 500 index and determine whether their prices can be considered self-similar series. The prices used are closing prices adjusted for splits and dividends. In the literature, some checks of self-similarity are sometimes performed; in several cases, however, no check is performed and the hypothesis that stock price series are self-similar is directly accepted. We propose using the KS algorithm to study the self-similarity of the series.
One technique used to check self-similarity is to verify that the coefficient of determination of the linear regression used to estimate H is close to one: when the Hurst exponent is estimated with a linear regression (Eqs. (4), (6), or (9)), a coefficient of determination close to 1 is taken as evidence that the series behaves self-similarly. However, this procedure does not indicate how close to 1 the result must be to accept the self-similarity of the series. Moreover, it only checks that the means scale as predicted by self-similarity, and provides no information on the self-similarity of the full distribution.
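The informal check can be computed as follows (a Python sketch assuming NumPy; the names are hypothetical). It returns the coefficient of determination of the log-log regression, here applied to the TA statistic \(E[AT(\tau )]/\tau\) of Eq. (6) on a synthetic random walk. As noted above, the number alone does not say how close to 1 is close enough.

```python
import numpy as np


def loglog_r_squared(taus, statistic):
    """R^2 of the least-squares line log(statistic) = a + b log(tau)."""
    x = np.log(np.asarray(taus, dtype=float))
    y = np.log(np.asarray(statistic, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    return 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


# Usage: TA statistic E[AT(tau)] / tau on a Gaussian random walk of 4096 points.
rng = np.random.default_rng(4)
x = np.cumsum(rng.standard_normal(4096))
taus = [2**k for k in range(10)]
ta_stat = [np.mean(0.5 * t * np.abs(x[2 * t :] - 2 * x[t:-t] + x[: -2 * t])) / t for t in taus]
r2 = loglog_r_squared(taus, ta_stat)
print(f"R^2 = {r2:.4f}")
```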
To solve this problem, we use the KS method proposed in “The KS method” section. Specifically, we use KS-TA, although any KS method can be used. We take the stock prices of the S&P 500 index and estimate the Hurst exponent with the TA algorithm. Throughout, we work with the logarithm of the price rather than the price itself. Our objective is to check whether the relationship of the TA method (5) holds. For each selected \(\tau\), we perform a KS test between the samples of \(AT(1,\omega )\) and \(\frac{AT(\tau ,\omega )}{\tau ^{H+1}}\), using as H the Hurst exponent estimated with the TA method. The confidence level used is 99%: if the p-value of the test is less than 0.01, we reject the null hypothesis that the samples come from the same distribution. We accept that the series is self-similar for the estimated H if the p-value is greater than 0.01 for all the selected \(\tau \ne 1\) (that is, the equality in distribution of the method is satisfied). Since we perform a total of n tests at a confidence level of 99%, we accept that the series is self-similar with a confidence level of \((0.99^n \cdot 100)\)% if the equality of the distributions is accepted in all cases, and reject it otherwise. This yields a method that determines, at a given confidence level, whether the series of interest can be considered self-similar. Importantly, once H has been estimated, the samples used for the KS tests must be taken without overlap; that is, they must be disjoint. For example, for the TA method, if one triangle has vertices at t, \(t+\tau\), and \(t+2\tau\), then the next triangle has vertices at \(t+2\tau\), \(t+3\tau\), and \(t+4\tau\).
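A Python sketch of this check for KS-TA, assuming SciPy's `scipy.stats.ks_2samp` with asymptotic p-values and the power-of-two scales of the KS algorithm (an assumption; the scales used for Table 7 are not restated here). Triangles are taken disjoint, as required, by keeping every \(\tau\)-th point. The synthetic walk stands in for a log-price series, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def disjoint_triangle_areas(x, tau):
    """Areas of non-overlapping triangles (t, t+tau, t+2tau), (t+2tau, t+3tau, t+4tau), ..."""
    points = np.asarray(x, dtype=float)[::tau]  # X(0), X(tau), X(2tau), ...
    m = (points.size - 1) // 2  # number of complete disjoint triangles
    first = points[0 : 2 * m : 2]
    middle = points[1 : 2 * m : 2]
    last = points[2 : 2 * m + 1 : 2]
    return 0.5 * tau * np.abs((last - middle) - (middle - first))


def ks_self_similarity_test(x, hurst, alpha=0.01):
    """Test Eq. (5) by comparing AT(1) with AT(tau) / tau**(H + 1) for every tau != 1.

    Returns (accepted, joint confidence level (1 - alpha)**n, p-value for each tau).
    """
    taus = [2**k for k in range(1, int(np.log2(len(x) / 8)) + 1)]
    reference = disjoint_triangle_areas(x, 1)
    pvalues = {}
    for tau in taus:
        rescaled = disjoint_triangle_areas(x, tau) / tau ** (hurst + 1)
        pvalues[tau] = stats.ks_2samp(reference, rescaled, method="asymp").pvalue
    accepted = all(p > alpha for p in pvalues.values())
    return accepted, (1 - alpha) ** len(taus), pvalues


rng = np.random.default_rng(7)
log_prices = np.cumsum(rng.standard_normal(4096))  # synthetic stand-in for log prices
accepted, level, pvalues = ks_self_similarity_test(log_prices, hurst=0.1)
print(f"self-similar for H = 0.1: {accepted} (joint confidence {100 * level:.1f}%)")
```

Testing the random walk with a deliberately wrong H = 0.1 should lead to rejection, since the rescaled areas then differ from \(AT(1,\omega )\) by a factor \(\tau ^{0.4}\).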
We analyze the stocks in the S&P 500 index with the proposed method to check the self-similarity of their series. Table 7 shows the percentage of price series (referring to the logarithm of the price) of the stocks of the index that satisfy the self-similarity relationship (Self-Similar (%)), the number of days used (Days), the number of price series evaluated (N Price Series), and the final confidence level (Confidence Level (%)), with the Hurst exponent estimated by the TA method.
A year contains approximately 256 business days; all series have the indicated length and end in September 2021.
Table 7 shows that a high percentage of the stocks in the S&P 500 index satisfy the TA method relationship in all cases, so we can accept that most of the series are self-similar at a high confidence level. With fewer trading days, the confidence level is higher (it is \((0.99^n \cdot 100)\)%, where n is the number of \(\tau\)s used), because fewer \(\tau\)s, and therefore fewer KS tests, are needed. However, self-similarity is not verified in a few cases. With the KS method, we can check the self-similarity of each stock price and determine, with a high level of confidence, whether the estimated H is unreliable.
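As a worked illustration with hypothetical test counts (not the values behind Table 7), the joint level \((0.99^n \cdot 100)\)% falls quickly as n grows:
\[0.99^{6}\approx 0.941,\qquad 0.99^{9}\approx 0.914,\]
that is, about 94.1% with six KS tests and 91.4% with nine. This product rule treats the n tests as independent.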
Analyzing the series that do not verify self-similarity reveals some interesting patterns. On the one hand, we can see for which particular \(\tau\)s the self-similarity of the series fails, that is, for which \(\tau\) the equality in distribution between \(AT(1,\omega )\) and \(\frac{AT(\tau ,\omega )}{\tau ^{H+1}}\) is not verified in each series.
Table 8 shows how often each \(\tau\) fails the KS test as a function of the length of the series. Note that in a series, the KS test can fail for several \(\tau\)s, and thus, the total number of series used (for a fixed length) is not equal to the sum of failures for each \(\tau\). The failures of the KS test occur more frequently for \(\tau\)s of intermediate length.
On the other hand, we analyze whether the estimated H used in the KS tests influences the number of failures. In particular, we distinguish series with H greater than 0.5 (persistence) from those with H less than 0.5 (anti-persistence), to check whether the failure rates differ between these groups. Table 9 shows the number of series for which any KS test fails, according to the estimated Hurst exponent. For all durations except 128 days, the percentage of series with H less than 0.5 that fail the KS test is higher than the percentage of series with H greater than 0.5. Pooling all lengths, 8.78% of the series with H less than 0.5 fail to verify the self-similarity relationship, compared with 3.14% of the series with H greater than 0.5.
The above procedure can be used for all KS methods described in this study. The only difference is the required equality in distribution relationship of the corresponding method. Moreover, this procedure can be combined with other traditional methods that use a linear regression as a consequence of equality in distribution, such as those described in “Methods for the estimation of the Hurst exponent with an equality in distribution” section.
In fact, we can construct a confidence interval for the Hurst exponent. It is formed by all values of H between 0 and 1 for which the KS test accepts, for every \(\tau\), that the samples come from the same distribution. If the estimated H lies within the limits of this interval, the series is self-similar at the corresponding confidence level. Conversely, if the estimated H falls outside the interval, the estimate is not reliable at this confidence level, because the series is not self-similar for that value of H.
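A minimal sketch of this construction scans a grid of H values in (0, 1) and keeps those for which no KS test rejects at level \(\alpha\). It assumes that \(AT(\tau)\) is the area of the triangle with vertices at times \(t\), \(t+\tau\) and \(t+2\tau\), and it takes the p-value from the asymptotic Kolmogorov distribution. The grid resolution, \(\alpha = 0.01\) and the function names are illustrative. Nothing guarantees that the accepted values are contiguous, so the sketch returns the accepted grid points and reports their minimum and maximum as the interval limits.

```python
import math
import random


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(a[i], b[j])
        while i < n and a[i] <= v:
            i += 1
        while j < m and b[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # tail probability is indistinguishable from 1
        return d, 1.0
    p = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def triangle_areas(x, tau):
    """Assumed AT(tau): triangle with vertices at times t, t + tau, t + 2 tau."""
    return [0.5 * tau * abs(x[t + 2 * tau] - 2 * x[t + tau] + x[t])
            for t in range(len(x) - 2 * tau)]


def confidence_set(x, taus, alpha=0.01, grid=100):
    """Grid values h = k / grid in (0, 1) for which every KS test has p >= alpha."""
    base = triangle_areas(x, 1)
    areas = {tau: triangle_areas(x, tau) for tau in taus}
    accepted = []
    for k in range(1, grid):
        h = k / grid
        if all(ks_two_sample(base, [a / tau ** (h + 1) for a in areas[tau]])[1] >= alpha
               for tau in taus):
            accepted.append(h)
    return accepted


random.seed(1)
walk = [0.0]
for _ in range(1023):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
hs = confidence_set(walk, [2, 4, 8, 16, 32, 64])
h_hat = 0.5  # hypothetical point estimate, e.g. from a TA regression
if hs:
    print(f"C.I. for H: [{min(hs):.2f}, {max(hs):.2f}]")
    print("estimate inside C.I.:", min(hs) <= h_hat <= max(hs))
else:
    print("no grid value of H passes every KS test")
```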
We now show the confidence intervals for several selected stocks in the S&P 500 index. First, we estimate the Hurst exponent of each series (1024 trading days, that is, about 4 years) with the TA method. Then, we compute the confidence interval of the values of H for which the series is self-similar. If the estimated H lies outside this interval, we cannot ensure, at that confidence level, that the series is self-similar for the estimated H. The obtained value of H is therefore not reliable.
Table 10 shows several examples of the confidence intervals described above. For CHD, CCL, and EQIX, the stock prices do not satisfy self-similarity for the estimated H, because the estimate lies outside the confidence interval. Interpreting these values of the H exponent could therefore be misleading. For OKE, APTV, and NRG, the H estimated with the TA method lies within the confidence interval. However, in all cases the confidence intervals are very wide. Thus, although self-similarity is verified for the estimated H, the interval provides little additional information about the estimate.
Next, we analyze an example in which the estimated H is not in the confidence interval, to discuss what might be happening. We take the stock prices of CHD, for which the TA algorithm yields H = 0.498. Figure 4 shows the linear regression of the method, with a coefficient of determination of 0.999. Thus, the regression itself shows nothing unusual.
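The TA estimate and the coefficient of determination mentioned above come from a log-log regression. The sketch below assumes that \(E[AT(\tau)] \propto \tau^{H+1}\), which follows from the equality in distribution between \(AT(1,\omega )\) and \(AT(\tau ,\omega )/\tau^{H+1}\). It estimates each mean by the sample mean of the overlapping triangle areas and returns the fitted slope minus one together with \(R^2\). The triangle-area definition (vertices at times \(t\), \(t+\tau\), \(t+2\tau\)), the \(\tau\) values and the function names are assumptions for illustration, not necessarily the authors' exact TA implementation.

```python
import math
import random


def triangle_areas(x, tau):
    """Assumed AT(tau): triangle with vertices at times t, t + tau, t + 2 tau."""
    return [0.5 * tau * abs(x[t + 2 * tau] - 2 * x[t + tau] + x[t])
            for t in range(len(x) - 2 * tau)]


def ta_estimate(x, taus):
    """Return (H, R^2) from regressing log mean AT(tau) on log tau (slope = H + 1)."""
    xs = [math.log(tau) for tau in taus]
    ys = []
    for tau in taus:
        areas = triangle_areas(x, tau)
        ys.append(math.log(sum(areas) / len(areas)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    slope = sxy / sxx                     # ordinary least squares slope
    return slope - 1.0, sxy * sxy / (sxx * syy)


random.seed(1)
walk = [0.0]
for _ in range(1023):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
h_hat, r2 = ta_estimate(walk, [1, 2, 4, 8, 16, 32, 64])
print(f"H = {h_hat:.3f}, R^2 = {r2:.3f}")
```

A high \(R^2\) only says that the means scale as a power law. It does not test the full equality in distribution, which is why a well-fitting regression can coexist with a rejected KS test for some \(\tau\).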
We now study how the confidence interval was constructed. For each \(\tau\), Figure 5 shows the p-values of the Kolmogorov–Smirnov test for each H between 0 and 1. The figure also shows a vertical line at the estimated Hurst exponent (H), a horizontal line at the p-value 0.01, and the \(94.15\%\) confidence interval (C.I.). The estimated H verifies self-similarity for all \(\tau\) except \(\tau =64\), for which the hypothesis that the data come from the same distribution is rejected.
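The text does not state how the \(94.15\%\) level is obtained. One reading consistent with that figure, offered here only as an assumption, involves six values of \(\tau\), for instance \(\tau \in \{2, 4, 8, 16, 32, 64\}\). Each is compared with \(\tau = 1\) at level 0.01, and the six tests are treated as independent, so that the joint confidence level is

\[
(1 - 0.01)^{6} = 0.99^{6} = 0.941480\ldots \approx 94.15\%.
\]

Under this reading, requiring a p-value of at least 0.01 for every \(\tau\) yields the set of H shown as the C.I. in Figure 5.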
To study the possible reasons for this result, we analyze the distributions for each \(\tau\) (see Fig. 6). The distribution for \(\tau =64\) differs from those of the other values of \(\tau\), in particular from \(\tau =1\). Thus, as expected, self-similarity is rejected for this value of H.
This example illustrates why self-similarity is rejected for the estimated H and how the confidence interval for H is generated.
Conclusions
This study adds to the previous financial literature (Sánchez-Granero et al. 2008; Couillard and Davison 2005; Dimitrova et al. 2019; Sánchez et al. 2015) that aims to address the deficiencies in the estimation and interpretation of the Hurst exponent.
The contributions of this study are twofold. First, we introduce a new procedure to improve the estimation of the Hurst exponent in financial time series. This procedure can be applied to any estimation method based on an equality in distribution, as illustrated with three of the most popular methods: GHE(1), TA, and FD4. We test the resulting approaches, called KS-GHE(1), KS-TA, and KS-FD4, and compare them with the original ones. The new approaches are more accurate, mainly for long series, where their standard deviations are approximately half those of the original methods.
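As a hypothetical illustration of how an equality in distribution can be turned into an estimator, the sketch below picks the grid value of H that minimizes the largest KS distance between \(AT(1)\) and \(AT(\tau)/\tau^{H+1}\) over the chosen \(\tau\) values. This objective is an assumption made for illustration. The exact KS-TA procedure is defined earlier in the article and may differ, for example in how the per-\(\tau\) results are combined. The triangle-area definition (vertices at times \(t\), \(t+\tau\), \(t+2\tau\)) and the function names are likewise assumptions.

```python
import math
import random


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D (largest gap between empirical CDFs)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(a[i], b[j])
        while i < n and a[i] <= v:
            i += 1
        while j < m and b[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def triangle_areas(x, tau):
    """Assumed AT(tau): triangle with vertices at times t, t + tau, t + 2 tau."""
    return [0.5 * tau * abs(x[t + 2 * tau] - 2 * x[t + tau] + x[t])
            for t in range(len(x) - 2 * tau)]


def ks_min_estimate(x, taus, grid=100):
    """Grid H in (0, 1) minimizing max over tau of KS(AT(1), AT(tau) / tau**(H + 1))."""
    base = triangle_areas(x, 1)
    areas = {tau: triangle_areas(x, tau) for tau in taus}
    best_h, best_d = None, math.inf
    for k in range(1, grid):
        h = k / grid
        d = max(ks_distance(base, [a / tau ** (h + 1) for a in areas[tau]]) for tau in taus)
        if d < best_d:
            best_h, best_d = h, d
    return best_h, best_d


random.seed(1)
walk = [0.0]
for _ in range(1023):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
print(ks_min_estimate(walk, [2, 4, 8, 16, 32, 64]))
```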
Another important advantage of this new approach is that it is based on an equality in distribution, whereas the original approaches rely on an equality in some absolute moment, which is a weaker condition. Consequently, the results obtained are more reliable and robust. The second contribution concerns self-similarity, a necessary condition for all Hurst exponent estimation algorithms. The financial literature accepts that logarithmic price series are self-similar, but this study provides evidence that actual price series are not necessarily self-similar. Based on the new approach, we offer a method to verify the self-similarity condition.
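To illustrate why an equality in some absolute moment is weaker than an equality in distribution, the sketch below compares a half-normal sample \(|Z|\) with a uniform sample on \([0, 2E|Z|]\), where \(E|Z| = \sqrt{2/\pi}\). The two share the same first absolute moment, so a check based only on that moment cannot tell them apart, whereas a two-sample KS test clearly does. The distributions, sample size and seed are illustrative choices, and the p-value uses the asymptotic Kolmogorov distribution.

```python
import math
import random


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(a[i], b[j])
        while i < n and a[i] <= v:
            i += 1
        while j < m and b[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # tail probability is indistinguishable from 1
        return d, 1.0
    p = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


random.seed(7)
n = 5000
half_normal = [abs(random.gauss(0.0, 1.0)) for _ in range(n)]
mean_abs = math.sqrt(2 / math.pi)  # E|Z| for a standard normal Z
uniform = [random.uniform(0.0, 2 * mean_abs) for _ in range(n)]  # same mean by construction

mean_gap = abs(sum(half_normal) / n - sum(uniform) / n)  # first absolute moments agree
d, p = ks_two_sample(half_normal, uniform)                # but the distributions differ
print(f"gap in mean: {mean_gap:.3f}  KS D: {d:.3f}  p-value: {p:.1e}")
```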
In summary, the KS approach introduced here is highly accurate and robust for estimating the Hurst exponent. Implemented under the described conditions, it can help portfolio managers, investors, and policymakers ensure that market decisions are not based on inadequate interpretations of market efficiency.
Availability of data and materials
Data is available from Yahoo Finance.
References
Alessio E, Carbone A, Castelli G, Frappietro V (2002) Second-order moving average and scaling of stochastic time series. Eur Phys J B-Condens Matter Complex Syst 27(2):197–200
Alvarez-Ramirez J, Echeverria JC, Rodriguez E (2008) Performance of a high-dimensional R/S method for Hurst exponent estimation. Physica A 387(26):6452–6462
Aste T (2022) Generalized Hurst exponent. https://www.mathworks.com/matlabcentral/fileexchange/30076-generalized-hurst-exponent. MATLAB Central File Exchange. Retrieved 7 April 2022
Backus D, Zin E (1993) Long memory inflation uncertainty: evidence from the term structure of interest rates. J Money Credit Bank 25(3):681–700
Bae K, Kang H, Kang J (2020) Can fat-tail create the momentum and reversal? Appl Econ 52(44):4850–4863
Baillie RT, Chung C, Tieslau MA (1995) Analyzing inflation by fractional integrated ARFIMA-GARCH model. J Appl Econom 11:23–40
Balladares K, Ramos-Requena JP, Trinidad-Segovia JE, Sánchez-Granero MA (2021) Statistical arbitrage in emerging markets: a global test of efficiency. Mathematics 9(2):179
Barabási AL, Vicsek T (1991) Multifractality of self-affine fractals. Phys Rev A 44(4):2730
Bariviera AF, Guercio MB, Martinez LB (2012) A comparative analysis of the informational efficiency of the fixed income market in seven European countries. Econ Lett 116(3):426–428
Barunik J, Kristoufek L (2010) On Hurst exponent estimation under heavy tailed distributions. Physica A 389(18):3844–3855
Beben M, Orłowski A (2001) Correlations in financial time series: established versus emerging markets. Eur Phys J B 20(4):527–530
Bensaida A (2014) Noisy chaos in intraday financial data: evidence from the American index. Appl Math Comput 226:258–265
Berzin C, León JR (2008) Estimations in models driven by fractional Brownian motion. Annales de l’IHP Probabilités et statistiques 44:191–213
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econom 31:307–327
Cajueiro D, Tabak B (2005) Ranking efficiency for emerging equity markets II. Chaos Solitons Fractals 23:671–675
Carbone A, Castelli G, Stanley HE (2004) Time-dependent Hurst exponent in financial time series. Physica A 344:267–271
Ciner C (2021) Stock return predictability in the time of COVID-19. Financ Res Lett 38:101705
Couillard M, Davison M (2005) A comment on measuring the Hurst exponent of financial time series. Physica A 348:404–418
Das A, Das P (2006) Does composite index of NYSE represents chaos in the long time scale? Appl Math Comput 174(1):483–489
Di Matteo T, Aste T, Dacorogna M (2003) Scaling behaviors in differently developed markets. Physica A 324:183–188
Di Matteo T, Aste T, Dacorogna M (2005) Long-term memories of developed and emerging markets: using the scaling analysis to characterize their stage of development. J Bank Financ 29(4):827–851
Diebold FX, Rudebusch GD (1989) Long memory and persistence in an aggregate output. J Monet Econ 24:189–209
Dimitrova V, Fernández-Martínez M, Sánchez-Granero MA, Trinidad-Segovia JE (2019) Some comments on Bitcoin market (in)efficiency. PLoS ONE 14:e0219243
Engle RF (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50:987–1008
Fama E (1970) Efficient capital markets: a review of theory and empirical work. J Financ 25(2):383–417
Fernández-Martínez M, Sánchez-Granero MA, Trinidad Segovia JE (2013) Measuring the self-similarity exponent in Lévy stable processes of financial time series. Physica A 392(21):5330–5345
Fernández-Martínez M, Sánchez-Granero MA, Trinidad Segovia JE, Román-Sánchez IM (2014) An accurate algorithm to calculate the Hurst exponent of self-similar processes. Phys Lett A 378(32–33):2355–2362
Geweke J, Porter-Hudak S (1983) The estimation and application of long memory time series models. J Time Ser Anal 4(4):221–238
Gómez-Águila A, Sánchez-Granero MA (2021) A theoretical framework for the TTA algorithm. Physica A 582:126288
Haslett J, Raftery AE (1989) Space-time modelling with long-memory dependence: assessing Ireland’s wind power resource. J R Stat Soc: Ser C (Appl Stat) 38(1):1–21
Hassler U, Wolters J (1995) Long memory in inflation rates: International evidence. J Bus Econ Stat 13:37–45
Higuchi T (1988) Approach to an irregular time series on the basis of the fractal theory. Physica D: Nonlinear Phenom 31:277–283
Hodges JL Jr (1958) The significance probability of the Smirnov two-sample test. Arkiv för Matematik 3(43):469–486
Hurst H (1951) Long term storage capacity of reservoirs. Trans Am Soc Eng 116:770–799
Kantelhardt J, Zschiegner S, Koscielny-Bunde E, Bunde A, Havlin S, Stanley E (2002) Multifractal detrended fluctuation analysis of nonstationary time series. Physica A 316(1–4):87–114
Kou G, Chao X, Peng Y, Alsaadi FE, Herrera-Viedma E (2019) Machine learning methods for systemic risk analysis in financial sectors. Technol Econ Dev Econ 25(5):716–742
Kou G, Yüksel S, Dinçer H (2022) Inventive problem-solving map of innovative carbon emission strategies for solar energy-based transportation investment projects. Appl Energy 311:118680
Kristoufek L (2019) Are the crude oil markets really becoming more efficient over time? Some new evidence. Energy Econ 82:253–263
Kristoufek L, Vosvrda M (2014) Measuring capital market efficiency: long-term memory, fractal dimension and approximate entropy. Eur Phys J B 87:162
Kristoufek L, Vosvrda M (2019) Cryptocurrencies market efficiency ranking: not so straightforward. Physica A 531:120853
Lamperti JW (1962) Semistable stochastic processes. Trans Am Math Soc 104(1):62–78
Levene H (1960) Robust test for equality of variances. In: Olkin I, Ghurye SG, Hoeffding W, Madow WG, Mann HB (eds) Contributions to probability and statistics: Essays in honor of Harold Hotelling. Stanford University Press, Stanford, CA, pp 278–292
Li T, Kou G, Peng Y, Philip SY (2021) An integrated cluster detection, optimization, and interpretation approach for financial data. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2021.3109066
Lotfalinezhad H, Maleki A (2020) TTA, a new approach to estimate Hurst exponent with less estimation error and computational time. Physica A 553:124093
Luo Y, Huang Y (2018) A new combined approach on Hurst exponent estimate and its applications in realized volatility. Physica A 492:1364–1372
Mandelbrot B (1963) The variation of certain speculative prices. J Bus 36(4):394–419
Mandelbrot BB (2002) Gaussian self-affinity and fractals, 1st edn. Springer-Verlag, New York
Mantegna RN, Stanley HE (1996) Scaling behaviour in the dynamics of an economic index. Nature 376:46–49
Mantegna RN, Stanley HE (2000) An introduction to econophysics: correlations and complexity in finance. Cambridge University Press, Cambridge
Matos JAO, Gama SMA, Ruskin HJ, Al Sharkasi A, Crane M (2008) Time and scale Hurst exponent analysis for financial markets. Physica A 387(15):3910–3915
Mercik S, Weron K, Burnecki K, Weron A (2003) Enigma of self-similarity of fractional Lévy stable motions. Acta Phys Pol B 34(7):3773–3791
Micheas AC (2018) Theory of stochastic objects: probability, stochastic processes and inference, 1st edn. CRC Press, Boca Raton
Peng C, Buldyrev S, Havlin S, Simons M, Stanley H, Goldberger A (1994) Mosaic organization of DNA nucleotides. Phys Rev E 49(2):1685
Robinson PM (1995) Gaussian semiparametric estimation of long range dependence. Ann Stat 23:1630–1661
Sánchez MA, Trinidad JE, García J, Fernández M (2015) The effect of the underlying distribution in Hurst exponent estimation. PLoS ONE 10(5):e0127824
Sánchez-Granero MA, Trinidad-Segovia JE, García-Pérez J (2008) Some comments on Hurst exponent and the long memory processes on capital markets. Physica A 387:5543–5551
Sánchez-Granero MA, Balladares KA, Ramos-Requena JP, Trinidad-Segovia JE (2020) Testing the efficient market hypothesis in Latin American stock markets. Physica A 540:123082
Shahzad SJH, Hernandez JA, Hanif W, Kayani GM (2018) Intraday return inefficiency and long memory in the volatilities of forex markets and the role of trading volume. Physica A 506:433–450
Taqqu S, Teverovsky V, Willinger W (1995) Estimators for long-range dependence: an empirical study. Fractals 3(4):785–798
Tiwari AK, Umar Z, Alqahtani F (2021) Existence of long memory in crude oil and petroleum products: generalised Hurst exponent approach. Res Int Bus Financ 57:101403
Trinidad Segovia JE, Fernández-Martínez M, Sánchez-Granero MA (2012) A note on geometric method-based procedures to calculate the Hurst exponent. Physica A 391(6):2209–2214
Veitch D, Abry P (1999) A wavelet-based joint estimator of the parameters of long-range dependence. IEEE Trans Inf Theory 45(3):878–897
Wan Z, Li H, Luo Y, Huang Y (2022) A novel Bayesian approach to estimate long memory parameter. J Stat Comput Simul 92(5):1078–1091
Weron R (2002) Estimating long-range dependence: finite sample properties and confidence intervals. Physica A 312(1–2):285–299
Willinger W, Taqqu MS, Teverovsky V (1999) Stock market prices and long-range dependence. Finance Stoch 3(1):1–13
Zhao P, Pan J, Yue Q, Zhang J (2021) Pricing of financial derivatives based on the Tsallis statistical theory. Chaos Solitons Fractals 142:110463
Zunino L, Tabak B, Perez D, Garavaglia M, Rosso O (2007) Inefficiency in Latin-American market indices. Eur Phys J B 60:111–121
Funding
This work is supported by grants PGC2018-101555-B-I00 (Ministerio Español de Ciencia, Innovación y Universidades and FEDER), PID2021-127836NB-I00 (Ministerio Español de Ciencia e Innovación and FEDER) and UAL18-FQM-B038-A (UAL/CECEU/FEDER). MASG also acknowledges the support of CDTIME.
Author information
Contributions
All authors contributed equally. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gómez-Águila, A., Trinidad-Segovia, J.E. & Sánchez-Granero, M.A. Improvement in Hurst exponent estimation and its application to financial markets. Financ Innov 8, 86 (2022). https://doi.org/10.1186/s40854-022-00394-x
DOI: https://doi.org/10.1186/s40854-022-00394-x
Keywords
 Hurst exponent
 Long memory
 Financial market
 TA algorithm
 GHE algorithm
 FD algorithms