The conventional cointegration tests of Engle and Granger (1987) and Johansen (1991) implicitly assume linearity and time invariance, so the inferences drawn from these tests may be misleading if the underlying relationships are nonlinear and time-varying. Following McMillan (2005), this study uses a rolling cointegration test for stock prices and other macroeconomic variables. In the rolling framework, we fix the size of a rolling sample window and move the window forward by adding one observation at the end and removing the first one. For each rolling window, we conduct conventional augmented Dickey-Fuller (ADF) unit root tests to determine the order of integration of each variable. If the tests confirm that all variables are integrated of order one in all rolling windows, we implement the conventional Johansen (1991) cointegration test and scale the trace statistics by their 5% critical values. The null hypothesis of no cointegration is rejected at the 5% level for a given sub-sample period if the scaled test statistic exceeds one.
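As a concrete illustration, the sketch below implements the rolling procedure in Python with statsmodels, assuming monthly log-level data in a pandas DataFrame; the window length of 120 observations, the lag order, and the 5% thresholds are illustrative assumptions rather than the exact settings used in this study.

```python
# Minimal sketch of the rolling cointegration test (assumed settings).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def rolling_scaled_trace(df: pd.DataFrame, window: int = 120, k_ar_diff: int = 2):
    """Trace statistic scaled by its 5% critical value for each rolling window."""
    def is_I1(x):
        # I(1): level non-stationary, first difference stationary (5% ADF level)
        return adfuller(x)[1] > 0.05 and adfuller(np.diff(x))[1] < 0.05

    scaled = []
    for start in range(len(df) - window + 1):
        sub = df.iloc[start:start + window]
        if not all(is_I1(sub[c].to_numpy()) for c in sub.columns):
            scaled.append(np.nan)      # window fails the I(1) pre-test
            continue
        res = coint_johansen(sub.to_numpy(), det_order=0, k_ar_diff=k_ar_diff)
        # lr1[0]: trace statistic for the r = 0 null; cvt[0, 1]: its 5% critical value
        scaled.append(res.lr1[0] / res.cvt[0, 1])
    return pd.Series(scaled, index=df.index[window - 1:])

# A scaled statistic above one rejects "no cointegration" at the 5% level
# for that sub-sample period.
```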
The ACE algorithm and nonlinear cointegration
The ACE algorithm
Granger and Hallman (1991) and Meese and Rose (1991) applied the non-parametric ACE algorithm developed by Breiman and Friedman (1985) to raw variables to obtain nonlinear transformations of those variables, and then used a causality test on the transformed variables to infer a nonlinear causal relationship. The ACE method converts the original variables into transformed variables such that the correlation between the transformed variables, and hence the R-squared of the regression, is maximized; the ACE procedure imposes extremely weak distributional assumptions and can handle a wide variety of nonlinear transformations of the data by utilizing flexible data-smoothing techniques. Thus, any relationship found between these transformed variables can be viewed as evidence of nonlinearity.
Assume a linear regression model with k independent variables, x1t, x2t, …, and xkt, and a dependent variable yt:
$$ y_t = \delta_0 + \sum_{i=1}^{k} \delta_i x_{it} + \varepsilon_t $$
(1)
where δ0 and δi (i = 1, 2, …, k) are the regression coefficients to be estimated, and εt is an error term. Thus, eq. (1) assumes that the dependent variable yt is a linear function of the k independent variables. An ACE algorithm based on the regression model in eq. (1) can be written as:
$$ f(y_t) = \sum_{i=1}^{k} n_i(x_{it}) + u_t $$
(2)
where f is a function of the dependent variable y, and ni is a function of the independent variable xi (i = 1, 2, …, k). In eq. (2), f(·), n1(·), n2(·), …, and nk(·) are the optimal transformations to be estimated. The ACE regression normalizes the coefficients to unity.
The ACE algorithm starts by defining arbitrary mean-zero transformations f(yt), n1(x1t), n2(x2t), …, and nk(xkt). The optimal transformations are those that maximize the correlation between the transformed variables, and hence the R-squared, of the regression specified in eq. (2). Under the constraint E[f(yt)²] = 1, this is equivalent to minimizing the expected mean squared error of the regression, given by
$$ u^2(f, n_1, n_2, \ldots, n_k) = E\left[ f(y_t) - \sum_{i=1}^{k} n_i(x_{it}) \right]^2 $$
(3)
Minimization of u2 with respect to ni (xi) (i = 1,2,…,k) and f(yt) is carried out through a series of single-function minimizations, resulting in the following equations:
$$ n_i(x_{it}) = E\left[ f(y_t) - \sum_{j \ne i}^{k} n_j(x_{jt}) \,\middle|\, x_{it} \right] $$
(4)
$$ f(y_t) = \frac{E\left[ \sum_{i=1}^{k} n_i(x_{it}) \,\middle|\, y_t \right]}{\left\Vert E\left[ \sum_{i=1}^{k} n_i(x_{it}) \,\middle|\, y_t \right] \right\Vert} $$
(5)
with \( \left\Vert \cdot \right\Vert \equiv {\left[E{\left(\cdot\right)}^2\right]}^{1/2} \).
The algorithm involves two basic mathematical operations, conditional expectation and iterative minimization; hence the name alternating conditional expectations. In applying eq. (4), all variables but one are held fixed, the transformation of the remaining variable is estimated using a nonparametric data-smoothing technique, and the algorithm then proceeds to the next variable. For each variable, the iterations continue until the mean squared error in eq. (3) is minimized. Breiman and Friedman (1985) show that the ACE algorithm yields transformations such that f(yt), n1(x1t), n2(x2t), …, and nk(xkt) converge asymptotically to the true functional forms of the optimal transformations. A distinctive feature of these transformations is that ACE does not treat the explanatory variables as fixed, but rather as draws from a joint distribution. After ni(xit) (i = 1, 2, …, k) are estimated, f(yt) is estimated conditional on these estimates according to eq. (5). Alternating between eqs. (4) and (5), the ACE method iterates until eq. (3) is minimized. The transformations ni*(xi) (i = 1, 2, …, k) and f*(y) that achieve this minimum are the optimal transformations.
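To make the alternation concrete, the following is a minimal Python sketch of the ACE iteration for the bivariate case (k = 1), with LOWESS standing in for the data smoother used to form the conditional expectations (Breiman and Friedman (1985) themselves use the supersmoother); the function name, smoothing span, and stopping rule are illustrative assumptions.

```python
# Minimal ACE sketch for one dependent and one explanatory variable.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def ace(y, x, max_iter=50, tol=1e-6):
    """Return transformations f(y), n(x) that maximize their correlation."""
    f = (y - y.mean()) / y.std()              # arbitrary mean-zero start for f
    mse_old = np.inf
    for _ in range(max_iter):
        # eq. (4): n(x) = E[f(y) | x], estimated by smoothing f against x
        n = lowess(f, x, frac=0.3, return_sorted=False)
        # eq. (5): f(y) = E[n(x) | y], then normalized to mean zero, unit norm
        f = lowess(n, y, frac=0.3, return_sorted=False)
        f = (f - f.mean()) / np.sqrt(np.mean((f - f.mean()) ** 2))
        mse = np.mean((f - n) ** 2)           # expected squared error, eq. (3)
        if abs(mse_old - mse) < tol:          # stop once eq. (3) stops improving
            break
        mse_old = mse
    return f, n
```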
Nonlinear cointegration
As argued in Granger (1991), Granger and Hallman (1991), Meese and Rose (1991), Kanas (2003, 2005), and Tang and Zhou (2013), nonlinear cointegration can be characterized as linear cointegration of the ACE-transformed variables. Two steps are involved in this procedure. First, as highlighted by Granger and Hallman (1991), in converting the original variables into ACE-transformed variables, we must ensure that the transformed variables do not deviate from the time-series properties of the original variables. Second, the usual Johansen (1991) cointegration test is applied to the transformed variables. This nonlinear cointegration test can also be used in a rolling framework to examine time-varying nonlinearity in the relationship. If the transformed variables deviate from the time-series properties of the original variables for the entire sample period or for any subsample period in a rolling window, we cannot implement the usual cointegration test to capture the time-varying aspect of the relationship. In that case, we propose using the continuous wavelet transform methodology to understand the relationship at different frequencies over time.
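Under these assumptions, the two-step procedure can be sketched as follows, reusing the hypothetical ace() helper and the Johansen machinery from the earlier sketches; the significance levels and lag order are again illustrative.

```python
# Sketch of the two-step nonlinear cointegration test.
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def nonlinear_cointegration(y, x):
    f_y, n_x = ace(y, x)                      # ACE-transformed variables
    # Step 1: the transformed series must retain the I(1) property of the originals.
    for s in (f_y, n_x):
        if not (adfuller(s)[1] > 0.05 and adfuller(np.diff(s))[1] < 0.05):
            raise ValueError("transformed series deviates from the originals' I(1) property")
    # Step 2: the usual Johansen trace test, applied to the transformed variables.
    res = coint_johansen(np.column_stack([f_y, n_x]), det_order=0, k_ar_diff=2)
    return res.lr1[0] / res.cvt[0, 1]         # scaled trace statistic, r = 0 null
```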
Continuous wavelet transform
The CWT is a useful alternative for capturing time variation and nonlinearity in the relationships between stock prices and other macroeconomic variables. Wavelet coherency measures the coherency between two variables at different frequencies over time, and partial wavelet coherency measures the coherency between two variables, conditional on other variables, at different frequencies over time. Following Aguiar-Conraria and Soares (2011), the CWT is derived as follows. Let L²(R) denote the set of square-integrable functions, that is, the space of finite-energy functions. The CWT begins with a mother (admissible or analyzing) wavelet ψ(t) ∈ L²(R); the minimal criterion to impose on such a function is the technical admissibility condition:
$$ 0<{C}_{\varPsi }=\underset{-\infty }{\overset{\infty }{\int }}\frac{\left|\uppsi (w)\right|}{\left|w\right|} dw<\infty $$
(6)
where Cψ denotes the admissibility (or analyzing) constant and Ψ is the Fourier transform of ψ.
The purpose of the wavelet is time-frequency localization: the wavelet is a function localized in both the time and frequency domains. The admissibility condition implies that the wavelet has zero mean:
$$ \Psi (0)=\underset{-\infty }{\overset{\infty }{\int }}\psi \left(\mathrm{t}\right)\mathrm{dt}=0 $$
(7)
so that ψ wiggles up and down the time axis. Starting from the mother wavelet ψ, a family ψτ,s of “wavelet daughters” is obtained by simply scaling and translating ψ:
$$ \psi_{\tau,s}(t) = \frac{1}{\sqrt{\left|s\right|}}\, \psi\!\left(\frac{t-\tau}{s}\right), \qquad s, \tau \in \mathbb{R},\; s \ne 0 $$
(8)
where s denotes a scaling or dilation factor that controls the width of the wavelet, and τ denotes the translation parameter, which controls the location of the wavelet. Scaling stretches the wavelet if |s| > 1 and compresses it if |s| < 1, whereas translating it shifts its position in time. Given a time series x(t) ∈ L²(R), its CWT with respect to the wavelet ψ is a function of two variables, Wx;ψ(τ, s):
$$ {W}_{x;\psi}\left(\tau, s\right)=\underset{-\infty }{\overset{\infty }{\int }}x(t)\frac{1}{\sqrt{\left|s\right|}}{\psi}^{\ast}\left(\frac{t-\tau }{s}\right) dt $$
(9)
The position in the time domain is given by τ, while the frequency (scale) is given by s. The wavelet transform thus maps the original series into a function of τ and s, providing concurrent information on the time and frequency domains. The wavelet and Fourier transforms are somewhat similar procedures; however, the Fourier transform has no time-localization parameter and uses the cosine and sine functions instead of a wavelet function. The CWT may also be represented in the frequency domain as
$$ W_x(\tau, s) = \frac{\sqrt{\left|s\right|}}{2\pi} \int_{-\infty}^{\infty} \Psi^{\ast}(s\omega)\, X(\omega)\, e^{i\omega\tau}\, d\omega $$
(10)
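Numerically, the CWT in eq. (9) can be approximated by discretizing the integral as a convolution of the sampled series with a conjugated daughter wavelet. The sketch below uses a Morlet mother wavelet (ω0 = 6 is a common choice); the sampling step and scale grid are assumptions made for illustration.

```python
# Discrete sketch of the CWT in eq. (9) with a Morlet mother wavelet.
import numpy as np

def morlet(t, omega0=6.0):
    """Morlet mother wavelet psi(t)."""
    return np.pi ** -0.25 * np.exp(1j * omega0 * t - t ** 2 / 2)

def cwt(x, scales, dt=1.0):
    """W[i, j]: transform of x at scale scales[i] and translation j * dt."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt          # time grid centered on zero
    W = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        # conjugated, normalized daughter wavelet from eq. (8)
        daughter = np.conj(morlet(t / s)) / np.sqrt(abs(s))
        # eq. (9) as a Riemann sum: correlate x with the daughter wavelet
        W[i] = np.convolve(x, daughter[::-1], mode="same") * dt
    return W
```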
To develop wavelet coherency, we need two more constructs: the cross-wavelet transform (XWT) and the cross-wavelet power (XWP). The cross-wavelet transform of two time series, x(t) and y(t), was introduced by Hudgins et al. (1993) and is defined as
$$ {W}_{xy}={W}_x{W}_y^{\ast } $$
(11)
where Wx and Wy represent the wavelet transforms of x and y, respectively. The cross-wavelet power provides a quantified indication of the similarity of power between two time series at each time and frequency and is defined as
$$ {(XWP)}_{xy} = \left|{W}_{xy}\right| $$
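Given the hypothetical cwt() sketch above, the cross-wavelet transform of eq. (11) and the cross-wavelet power follow directly; x, y, and scales are assumed inputs.

```python
import numpy as np

Wx, Wy = cwt(x, scales), cwt(y, scales)
Wxy = Wx * np.conj(Wy)     # cross-wavelet transform, eq. (11)
XWP = np.abs(Wxy)          # cross-wavelet power
```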
Complex wavelet coherency
For two time series x(t) and y(t), the complex wavelet coherency is defined as:
$$ {C}_{xy}=\frac{S\left({W}_{xy}\right)}{\sqrt{S\left({\left|{W}_x\right|}^2\right)S\left({\left|{W}_y\right|}^2\right)}} $$
(12)
The wavelet coherency is the absolute value of the complex wavelet coherency:
$$ R_{xy} = \frac{\left|S\left(W_{xy}\right)\right|}{\left[S\left(\left|W_x\right|^2\right) S\left(\left|W_y\right|^2\right)\right]^{1/2}} $$
(13)
$$ 0\le {R}_{xy}\left(\tau, s\right)\le 1 $$
where Wx and Wy are the wavelet transforms of x and y, respectively, and S denotes a smoothing operator in both time and scale; without smoothing, coherency would be identically equal to one at all scales and times. For a set of p series x1, …, xp, the complex partial wavelet coherency of x1 and xj (2 ≤ j ≤ p), after controlling for the remaining series, is denoted as follows:
$$ C_{1j.q_j} = -\frac{\varphi_{j1}^{d}}{\sqrt{\varphi_{11}^{d}\,\varphi_{jj}^{d}}} $$
(14)
The partial wavelet coherency r1j.qj is measured as its absolute value:
$$ r_{1j.q_j} = \frac{\left|\varphi_{j1}^{d}\right|}{\sqrt{\varphi_{11}^{d}\,\varphi_{jj}^{d}}} $$
(15)
The squared partial wavelet coherency of x1 and xj is
$$ r_{1j.q_j}^{2} = \frac{\left|\varphi_{j1}^{d}\right|^{2}}{\varphi_{11}^{d}\,\varphi_{jj}^{d}} $$
(16)
where φ denotes the p × p matrix of all the smoothed cross-wavelet spectra Sij.
Wavelet coherency relates the cross-wavelet spectrum to the product of the spectrum of each series and can be interpreted as the local correlation between the two time series in time-frequency space; the definition closely parallels the traditional correlation coefficient, with wavelet coherency providing localized correlation coefficients in time and frequency. A coherency of zero indicates no correlation between the two series, while a coherency of one indicates a strong correlation at that time and frequency. The statistical significance of the estimated wavelet coherency is assessed through Monte Carlo simulation methods. Because wavelet coherency cannot distinguish positive from negative co-movements, the phase of the wavelets is used to recover this information and to identify lead-lag relationships between the two time series.
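The sketch below computes the wavelet coherency of eq. (13) from the hypothetical cwt() output above, with a simple moving-average filter standing in for the smoothing operator S; actual implementations smooth with scale-dependent windows in both time and scale, so the window sizes here are illustrative.

```python
# Wavelet coherency sketch; the smoothing window sizes are assumptions.
import numpy as np
from scipy.ndimage import uniform_filter

def coherency(Wx, Wy, size=(3, 9)):
    # smooth real and imaginary parts separately (uniform_filter is real-valued)
    S = lambda A: uniform_filter(A.real, size) + 1j * uniform_filter(A.imag, size)
    Sxy = S(Wx * np.conj(Wy))                 # smoothed cross-wavelet spectrum
    Sxx = S(np.abs(Wx) ** 2).real             # smoothed auto-spectra
    Syy = S(np.abs(Wy) ** 2).real
    return np.abs(Sxy) / np.sqrt(Sxx * Syy)   # R_xy in [0, 1], eq. (13)
```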
Following Aguiar-Conraria and Soares (2011), the complex partial wavelet coherency between x and y after controlling for z is defined as
$$ C_{xy/z} = \frac{C_{xy} - C_{xz}\, C_{yz}^{\ast}}{\left[\left(1 - R_{xz}^{2}\right)\left(1 - R_{yz}^{2}\right)\right]^{1/2}} $$
(17)
where Cxy and Rxy are the complex wavelet coherency and wavelet coherency defined in eqs. (12) and (13), respectively. The partial wavelet coherency between x and y given z is obtained by taking the absolute value of the numerator in eq. (17). We may therefore write the partial wavelet coherency as
$$ R_{xy/z}(\tau, s) = \frac{\left| C_{xy}(\tau, s) - C_{xz}(\tau, s)\, C_{yz}^{\ast}(\tau, s) \right|}{\sqrt{\left(1 - R_{xz}^{2}(\tau, s)\right)\left(1 - R_{yz}^{2}(\tau, s)\right)}} $$
(18)
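A final sketch shows eq. (18) in the same hypothetical framework: given the complex coherencies for the pairs (x, y), (x, z), and (y, z), and the corresponding wavelet coherencies, the partial wavelet coherency is computed elementwise.

```python
# Partial wavelet coherency of x and y after controlling for z, eq. (18).
import numpy as np

def partial_coherency(Cxy, Cxz, Cyz, Rxz, Ryz):
    num = np.abs(Cxy - Cxz * np.conj(Cyz))
    den = np.sqrt((1 - Rxz ** 2) * (1 - Ryz ** 2))
    return num / den
```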