
S&P BSE Sensex and S&P BSE IT return forecasting using ARIMA

Abstract

This study forecasts the return and volatility dynamics of S&P BSE Sensex and S&P BSE IT indices of the Bombay Stock Exchange. To achieve the objectives, the study uses descriptive statistics; tests including variance ratio, Augmented Dickey-Fuller, Phillips-Perron, and Kwiatkowski Phillips Schmidt and Shin; and Autoregressive Integrated Moving Average (ARIMA). The analysis forecasts daily stock returns for the S&P BSE Sensex and S&P BSE IT time series, using the ARIMA model. The results reveal that the mean returns of both indices are positive but near zero. This is indicative of a regressive tendency in the long-term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to their actual values, with few deviations. Hence, the ARIMA model is capable of predicting medium- or long-term horizons using historical values of S&P BSE Sensex and S&P BSE IT.

Introduction

Theoretical and empirical studies have revealed a positive relation between stock markets and economic growth (Kim et al. 2011; Guptha and Rao 2018; Mallikarjuna and Rao 2019). Investment decisions informed by stock market forecasts play a significant role in attaining the desired returns. However, stock markets are dynamic, complex, and volatile, so forecasting stock prices and returns is a challenging task. Stock or investment returns depend on many factors, primarily the prediction of stock movements. The prediction and estimation of stock returns on a given stock exchange occurs hourly. Considering the importance of forecasting stock prices and their returns, researchers have paid significant attention to enhancing model accuracy in the prediction of stock price movements and returns. The fundamental explanation is that investors, policymakers, and financial institutions must be dynamic and excel in their decision making in order to optimize the returns on their investments. When stock markets are efficient, capital is allocated in the best conceivable way (Fama 1970). The efficient market hypothesis (EMH) (Fama 1965) asserts that a market is efficient when prices fully reflect public and private information. Market efficiency has three forms: weak, semi-strong, and strong. The weak form specifies that future values cannot be predicted from historical prices. The semi-strong form holds that prices reflect all publicly available information. The strong form states that stock prices reflect all public and inside information. Because this study uses only public information, it considers the weak and semi-strong forms.

If a prediction model can provide a good estimation of the movement of stock prices, then the uncertainty and risk involved in the investment process could be minimized. It would thus be useful for investors and policymakers in making appropriate investment decisions and taking the measures required to improve the flow of investments in stock markets. Several techniques have been used to forecast the stock market. The main purpose of forecasting is to assist in investment decisions, improve investors’ accuracy, and enhance efficient performance. However, generally uncertain conditions may change or disrupt the consistency of the stock market. Such uncertainty could be overcome by applying appropriate stock market strategies through accurate forecasting tools (Zhang et al. 2019a, 2019b). Accurate and fast forecasting of the stock market is the main challenge. Many researchers have focused on finding the best forecasting tools and methods to obtain fast and accurate predictions of stock prices (Javier and Rosario 2003). In time series analysis, autoregressive integrated moving average (ARIMA) is one of the best statistical forecasting methods for investors to get fast and accurate information on stock predictions. Moreover, ARIMA models indicate whether a series is already stationary or must be differenced to remove non-stationarity (Merh et al. 2010).

The Bombay Stock Exchange (BSE) is considered one of the premier stock markets in the world. The S&P BSE Sensex is the bellwether index of the BSE. It measures the performance of 30 companies listed on BSE Ltd., which are popularly known as blue-chip companies. Among all sectors in the BSE, the second most capitalized is S&P BSE Information Technology (IT), with a capitalization of 12.19%; in comparison, that of the S&P BSE Sensex is 100% (Footnote 1). The S&P BSE IT index is intended to provide investors with a benchmark reflecting companies included in the S&P BSE All Cap that are classified as members of the IT sector.

The primary objective of this study is to fit the ARIMA model in a way that best estimates the movements of the stock market. Further, it looks into how volatility acts on different time horizons of investment. Furthermore, it examines whether forecasted values are aligned with the actual values.

There are many techniques to forecast the movement of the stock market. The main motive of any stock market forecasting technique is to predict the movement of stock market prices more accurately. However, the existence of information asymmetry, insider trading, and other anomalies may change the direction of the market or lead to inconsistency in market performance. In addition, personal biases of investors such as overconfidence and illusion of control, the narrative fallacy, anchoring bias, loss aversion, and herding mentality cause wrong predictions of price movements in stock markets. These are some causes of sudden losses in invested funds, which arise from wrong estimations made by investors about their investments or portfolios (Neely et al. 2014; Wang et al. 2018; Challa et al. 2018). Hence, the underlying problem is obtaining more accurate and faster predictions of stock prices. There are few studies on forecasting stock prices using GARCH and ARIMA models in developed stock markets and very few in developing stock markets. Further, most studies restricted themselves to estimating the movement of stock prices and did not compare the estimated values with the actual values to verify the accuracy of estimation (Zhang et al. 2019a, 2019b). Furthermore, no single study has made a comparison between S&P BSE Sensex and S&P BSE IT. Hence, it is necessary to carry out a detailed investigation to bridge this gap.

The S&P BSE Sensex is the oldest and most popular index of the BSE. It provides the most accurate measurement of the financial position of the stock market. Indeed, it is considered a barometer of the Indian stock market. The IT sector has seen tremendous growth after the liberalization of the Indian economy, and IT and IT-enabled services occupy a lion’s share in the service sector. Hence, small changes in these indices may have a great impact on the overall performance of the Indian stock market. The direction as well as the relationship of causation holds good for the IT segment of the BSE.

For this analysis, the authors used statistical and econometric tools: descriptive statistics, the variance ratio (VR) test, the Augmented Dickey-Fuller (ADF) test, the Phillips-Perron (PP) test (Phillips and Perron 1988), the Kwiatkowski Phillips Schmidt and Shin (KPSS) test, and ARIMA. First, the authors conducted an analysis of the performance of the S&P BSE Sensex and IT indices, a review of the literature, and an empirical study on market efficiency. An empirical analysis using the aforementioned models followed to calculate future returns. Moreover, ARIMA models were used to forecast the data series of S&P BSE Sensex and S&P BSE IT; these models can determine whether the actual stock prices are aligned with the estimated values.

The results can be summarized as follows. The descriptive statistics indicate linearity in the mean and variance of the S&P BSE Sensex and S&P BSE IT returns. In addition, the VR test revealed that the S&P BSE Sensex and S&P BSE IT returns could be strongly predicted based on historical prices. The values of the ARIMA parameters were determined using autocorrelation (AC) and partial autocorrelation (PAC) coefficients; the ADF, PP, and KPSS tests were used to test the stationarity of the data. The results showed that the time series data are stationary. This study estimates the ARIMA model through identified values and auto-ARIMA. The results revealed that the mean returns of both indices are positive, but near zero. This may be an indication of a regressive tendency in the long-term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to the actual values, with few deviations. To verify the accuracy of the estimations, the prediction was done for two years, and then the predicted values were compared with the actual values.

As for the EMH, the share prices reflect all the information, and it is impossible to generate a consistent alpha. Hence, it can be inferred that the stocks may not outperform the overall market due to either expert stock selection or market timing. The results indicated a regressive tendency in which the returns are estimated with high accuracy in the long run. This is evident in the case of S&P BSE Sensex and S&P BSE IT, where the estimated and actual values are almost equal. This reveals that both the indices were not following random walk theory. In other words, the movements of the indices are predictable. Hence, both the BSE indices under study exhibited a semi-strong form of EMH, as their stock prices are forecasted based on past data. There is no relevance for the strong form of EMH in this study as the researchers used only public information and ignored private information.

Literature review

Stock returns forecasting mechanisms are important to the development of investment policies. However, based on EMH, consistent risk-adjusted returns (Kou et al. 2014) above the line of market profitability as a whole are not possible. Computational advancements have led to various econometric models, which have been used consistently to anticipate stock market movements and thus forecast future stock prices and stock returns (Suits 1962; Zotteri et al. 2005; Wen et al. 2019). ARIMA models are efficient at forecasting short-term financial time series data (Schmitz and Watts 1970; Rangan and Titida 2006; Kyungjoo et al. 2007; Merh et al. 2010; Sterba and Hilovska 2010). Various studies have used ARIMA forecasting models to predict stock returns (Khashei et al. 2009; Lee and Ho 2011; Khashei et al. 2012). Gerra (1959) examined the stock price movements for the egg industry by using least squares methods. The Box-Jenkins ARIMA approach is more efficient and accurate than other economic models such as regression and exponential smoothing (Reid 1971; Naylor II et al. 1972; Newbold and Granger 1974). The ARIMA approach is more accurate in forecasting short-term stock returns than long-term returns (Sabur and Zahidul Hague 1992).

Neely et al. (2014) used technical indicators to forecast stock returns and found that technical indicators are economically and statistically significant. Several studies have relied on the predictability of stock returns (Rapach et al. 2010; Zhu and Zhu 2013; Pettenuzzo et al. 2014; Jiahan and Ilias 2017). Rapach et al. (2010) forecasted the equity premium (Welch and Goyal 2008; Turner 2015) by using compound returns on S&P 500 index including dividends and rate on treasury bills and established a link between the forecasted values and real economy. Phan et al. (2015) discussed evidence-based forecasting for stock returns. Rapach et al. (2016) showed the vector autoregression decomposition from a cash flow channel, which in turn showed the source of predictive power. Furthermore, there is evidence of a relationship between short-sellers and traders. Wang et al. (2018) showed the dynamic relationship between returns and volume based on US stock returns. They found that investors do not gain much profit by following the volume curve.

Zhang et al. (2018) examined oil price forecasting by using 18 macroeconomic and 18 technical indicators. The results showed accurate forecasts and generated certainty equivalent return gains for a mean-variance investor. Zhang et al. (2019a, 2019b) explained not only the trading behavior of intraday stock movement, but also the evidence of U-shaped investment curve. They found that afternoon stock prediction is significant using morning returns.

This study analyzed the efficiency of BSE. In the past decades, many researchers discussed the efficiency of stock market predictability (Fama 1970, 1991; Lo and MacKinlay 1988; Fama and French 1988). Stock markets are considered efficient if stock prices fully reflect, at any point in time, relevant or available information. EMH (Fama 1965) is one of the most widely accepted financial theories. Various approaches have been used to test the EMH for stock markets, for instance, serial correlation tests, unit root tests, and VR tests (Wu 1986, 1996; Laurence et al. 1997; Mookerjee and Yu 1999; Liu et al. 1997; Groenewold et al. 2003; Seddighi and Nian 2004). Lo and MacKinlay (1989) proved that VR tests are more powerful than unit root and serial correlation tests (Munteanu and Pece 2015), particularly in the existence of heteroscedasticity.

Individual VR tests in the literature have not provided consensus on the weak EMH, so multiple VR tests are preferable (Long et al. 1999; Darrat and Zhong 2000; Ma and Barnes 2001; Lee and Rui 2001; Lima and Tabak 2004; Fifield and Jetty 2008). Chow and Denning (1993) suggested that multiple VR tests are useful to avoid misleading statistical inferences based on asymptotic normal probabilities. Whang and Kim (2003) and Kim (2006) proposed powerful alternatives: sub sampling of non-dependency asymptotic probability and wild bootstrap probability.

Following this logic, this study adopted multiple VR tests, as suggested by Whang and Kim (2003) and Kim (2006), and the conventional Chow-Denning test to study the random walk hypothesis for the BSE (Diebold and Inoue 2001; Kapetanious and Shin 2011; Aye et al. 2017).

Problem statement

As mentioned earlier, several studies have been carried out on the prediction of stock market returns using ARIMA and other models, especially in developed markets. However, very few have focused on developing and less developed markets. Among the existing models, ARIMA has proved to be more efficient and accurate (Box & Jenkins 1970). Furthermore, the ARIMA model is better suited to estimating short-term returns than long-term returns, though many previous studies have used the ARIMA model to estimate long-term returns. However, there are very few studies on the prediction of returns on the Indian stock market in general, and S&P BSE Sensex in particular. It is evident from the literature that no study has predicted the returns of the S&P BSE Sensex and its subcomponent, that is, the S&P BSE IT, which is a sectoral index. This study fills this gap in the literature. Based on the observations of the literature and its objectives, this study hypothesizes that there is no significant relationship between actual and predicted values of S&P BSE Sensex and S&P BSE IT stocks.

Data and methodology

Data were collected for two indices, S&P BSE Sensex and S&P BSE IT. Empirical analysis was carried out on the daily returns of the S&P BSE Sensex and S&P BSE IT indices, for the period January 1, 2007 to December 31, 2017. It was observed that both indices experienced high volatility in performance. Moreover, both indices experienced their largest shock during 2008–2009. The reason was the worldwide financial crisis, which also affected the Indian stock market (Eigner and Umlauft 2015).

In this context, there is a need to determine whether the above-mentioned crisis caused steep to and fro changes in the prices of stocks listed on the S&P BSE Sensex and S&P BSE IT. Furthermore, it is also necessary to apply the ARIMA model with validation and testing, which was not done in most previous studies. Therefore, an attempt is made to test and forecast the stock prices by incorporating ARIMA models. The data were collected from www.bseindia.com, and the daily returns were calculated using the following formula.

$$ {R}_{it}=\ln \left(\frac{P_t}{P_{t-1}}\right) $$
(i)

\( {R}_{it} \) is the return of the index;

\( {P}_t \) is the closing price of the index at time t;

\( {P}_{t-1} \) is the closing price of the index at time t − 1; and

ln is the natural logarithm.
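The return formula in Eq. (i) can be sketched in a few lines of Python. This is only an illustration: the function name `log_returns` and the closing prices are hypothetical and are not taken from the study's data set.

```python
import math


def log_returns(prices):
    """Return ln(P_t / P_{t-1}) for each consecutive pair of closing prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("closing prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices (illustrative only, not BSE data).
closes = [100.0, 102.0, 101.0, 101.0]
returns = log_returns(closes)
print(returns)
```

A series of n closing prices yields n − 1 returns, and log returns are additive: their sum equals the log of the last price over the first.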

The ARIMA model is used to forecast future returns, and it is a combination of autoregressive and moving average models (Pankratz 2009). The mathematical formula of the model is as follows.

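$$ \left(1-{\sum}_{k=1}^p{\alpha}_k{L}^k\right)\ {\left(1-L\right)}^d{X}_t=\left(1+{\sum}_{k=1}^q{\beta}_k{L}^k\right)\ {\varepsilon}_t $$
(ii)

L is the lag operator; \( {\alpha}_k \) and \( {\beta}_k \) are the autoregressive and moving average coefficients; p and q are the autoregressive and moving average orders; d is the degree of differencing; \( {X}_t \) is the series; and \( {\varepsilon}_t \) is the error term.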

The Box-Jenkins method assumes that the time series is stationary; if it is not, the series is differenced (typically to the first degree) until it becomes stationary. This is called the ARIMA (p, d, q) model, where d represents the degree of differencing. If the time series is already stationary (d = 0), then ARIMA (p, d, q) is termed an ARMA (p, q) model.
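As a worked illustration of Eq. (ii), take p = 1, d = 1, and q = 1. These orders are chosen only for exposition and are not the models fitted in this study. Expanding the lag polynomials gives

$$ \left(1-{\alpha}_1L\right)\left(1-L\right){X}_t=\left(1+{\beta}_1L\right){\varepsilon}_t\quad \Longrightarrow \quad {X}_t=\left(1+{\alpha}_1\right){X}_{t-1}-{\alpha}_1{X}_{t-2}+{\varepsilon}_t+{\beta}_1{\varepsilon}_{t-1} $$

With d = 0, the differencing factor drops out and the same orders give the ARMA (1, 1) form

$$ {X}_t={\alpha}_1{X}_{t-1}+{\varepsilon}_t+{\beta}_1{\varepsilon}_{t-1} $$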

Many researchers believe that GARCH and EGARCH models cannot provide the best results compared with ARIMA models, and that ARIMA is the best model for forecasting and modeling stock prices (Miswan et al. 2014; Pahlavani and Roshan 2015). Hence, the ARIMA model is appropriate to predict stock returns accurately with prospective market strategies to be followed by investors. Furthermore, some mixed models like ARIMA-GARCH, TGARCH, EGARCH, or GJR may be used to find the volatility of stock prices or returns by assuming symmetric or asymmetric effects. However, according to Thushara (2018), ARIMA and ARIMA-GARCH models produce the same results over time, and volatility does not change. Hence, the ARIMA model, along with the mean and variance equations, is used to predict future returns.

In practice, the appropriate model is determined in four steps. The first step is identification, in which the correlogram and partial correlogram are employed to determine the appropriate values of p, d, and q. Moreover, the ADF test is used to test the stationarity of the data. The second step is estimation, in which the parameters of the chosen model are estimated using the least squares method. The third step is a diagnostic check to examine whether the residuals from the fitted model are white noise. If they are, the chosen model is accepted; otherwise, the process starts afresh. The procedure is therefore iterative. In the fourth step, forecasting performance, the successful ARIMA model from step three is used within and outside the sample period to forecast future returns of stock prices.
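The estimation and diagnostic steps can be sketched as follows, assuming for simplicity a pure AR(1) model and a simulated series rather than the index data. The function names `fit_ar1` and `ljung_box_q` are illustrative; the study itself used EViews, not this code.

```python
import random


def fit_ar1(x):
    """Least-squares estimate of c and phi in x_t = c + phi * x_{t-1} + e_t."""
    lagged, current = x[:-1], x[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    phi = sxy / sxx
    c = mean_cur - phi * mean_lag
    residuals = [b - c - phi * a for a, b in zip(lagged, current)]
    return c, phi, residuals


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic: n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    gamma0 = sum(d * d for d in dev) / n
    total = 0.0
    for k in range(1, max_lag + 1):
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        total += (gamma_k / gamma0) ** 2 / (n - k)
    return n * (n + 2) * total


# Simulated AR(1) series with phi = 0.5 (illustrative, not the index data).
rng = random.Random(0)
series = [0.0]
for _ in range(2000):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))

c_hat, phi_hat, resid = fit_ar1(series)
q_stat = ljung_box_q(resid, max_lag=12)
print(round(phi_hat, 3), round(q_stat, 2))
```

A small Q statistic relative to the chi-square distribution suggests that the residuals behave like white noise, in which case the model would be accepted in step three; a large value would send the analyst back to the identification step.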

Empirical analysis

Descriptive statistics

An overview of the basic statistical features of time series data is necessary before data analysis. Figure 1 shows the daily returns of the S&P BSE Sensex and S&P BSE IT. The authors used the statistical software Eviews 9.5 to analyze the data and applied each step of the ARIMA process. Figure 1 depicts the returns on the ‘y’ axis and years on the ‘x’ axis; years 2007 to 2017 are termed 1 to 18.

Fig. 1 Line graph for S&P BSE Sensex and S&P BSE IT returns

The descriptive statistics of S&P BSE Sensex and S&P BSE IT are summarized in Table 1. Table 1 reveals that the mean returns are positive but nearly zero, which indicates a regressive tendency in the long-term. The differences between the minimum and maximum values are 0.1198 (S&P BSE Sensex returns) and 0.0979 (S&P BSE IT returns). The standard deviation is 0.6% for S&P BSE Sensex and 0.7% for S&P BSE IT returns. These values indicate high volatility in the BSE over the sample period. S&P BSE Sensex displays positive skewness (0.159), which indicates a longer right tail. Meanwhile, S&P BSE IT displays negative skewness (−0.145181), which indicates a longer left tail. An asymmetric tail indicates a high probability of earnings from returns with high risk, as the value of skewness is greater than the mean value of returns. The kurtosis values of S&P BSE Sensex and S&P BSE IT are 13.23763 and 8.493113, respectively. Both exceed 3, the kurtosis of a normal distribution, which explains the sharp peak and fat tails of the BSE return distributions. This implies that the time series data do not follow the normal distribution. The Jarque-Bera value is 11,907.32 for S&P BSE Sensex and 3434.352 for S&P BSE IT; both are far above the reference value for a normal distribution (5.8825). Therefore, the null hypothesis of normal distribution was rejected at the 5% level for both indices.
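The link between skewness, kurtosis, and the Jarque-Bera statistic can be checked with a short sketch. It uses the standard formula JB = n/6 · (S² + (K − 3)²/4) and assumes that the sample size is the n = 2724 quoted in the correlogram discussion; the function names are illustrative.

```python
def sample_moments(x):
    """Skewness and kurtosis from population (divide-by-n) central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Assumption: n = 2724 observations, as quoted for the correlogram.
n_obs = 2724
jb_sensex = jarque_bera(n_obs, 0.159, 13.23763)    # skewness, kurtosis from the text
jb_it = jarque_bera(n_obs, -0.145181, 8.493113)
print(round(jb_sensex, 2), round(jb_it, 2))
```

Under that assumption, the two results should land close to the reported values of 11,907.32 and 3434.352, with any small gap attributable to rounding of the reported skewness. The statistic is dominated by the excess kurtosis term, which is why both indices reject normality so decisively.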

Table 1 Descriptive Statistics for S&P BSE Sensex and S&P BSE IT

Variance ratio test

A popular approach to assessing the predictability of asset prices is the Lo and MacKinlay (1988, 1989) VR test, which examines the predictability of time series data by comparing the variances of returns at various intervals. If the data follow a random walk, the variance of a q-period difference (q being the holding period) must be q times the variance of a single-period difference (Tabak 2003). Hence, the VR test examines whether the data follow a random walk. The present analysis follows the rank, rank-score, and sign-based forms of Lo & MacKinlay and Kim to determine statistical significance. Lo and MacKinlay’s (1988, 1989) VR test can be performed for homoscedastic and heteroscedastic random walks, using asymptotic normal or wild bootstrap (Kim 2006) probabilities. In addition to the rank, rank-score, and sign-based forms (Wright 2000), tests have been evaluated with bootstrap for statistical significance. Furthermore, Wald and multiple comparison VR tests (Richardson & Smith 1991; Chow and Denning 1993) have been performed for several intervals. In this analysis, the random walk series was assumed to test the data.
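The core idea of the VR test can be sketched in Python. This is a simplified version under stated assumptions: it uses overlapping q-period returns, omits the bias corrections, and computes only the homoskedastic asymptotic z-statistic, so it does not reproduce the rank, sign, or wild bootstrap variants used in the study. The function name and the simulated series are illustrative.

```python
import math
import random


def variance_ratio(returns, q):
    """Simple overlapping VR(q) and its homoskedastic asymptotic z-statistic."""
    n = len(returns)
    mu = sum(returns) / n
    var_1 = sum((r - mu) ** 2 for r in returns) / n
    # Overlapping q-period returns, each the sum of q consecutive returns.
    q_sums = [sum(returns[t - q + 1:t + 1]) for t in range(q - 1, n)]
    var_q = sum((s - q * mu) ** 2 for s in q_sums) / (n * q)
    vr = var_q / var_1
    z = (vr - 1.0) / math.sqrt(2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * n))
    return vr, z


# Simulated white-noise returns (illustrative only): VR(q) should be near 1.
rng = random.Random(1)
demo = [rng.gauss(0.0, 0.01) for _ in range(5000)]
for q in (2, 4, 8, 16):
    vr, z = variance_ratio(demo, q)
    print(q, round(vr, 3), round(z, 2))
```

A ratio well above 1 points to positive serial correlation in returns and a ratio well below 1 to negative serial correlation; in both cases the random walk null would be rejected when the z-statistic is large in absolute value.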

S indicates the series from 1 to 7;

S1 indicates the VR test for Lo and MacKinlay (1988), homoskedasticity, no bias correction, and random walk series; S2 is the VR test for Lo and MacKinlay (1988), heteroskedasticity and martingale series; S3 is the VR test for the Wright (2000) rank and random walk series; S4 is the VR test for the rank score and random walk series; S5 is the VR test for the sign-based test and martingale series; S6 is the VR test for Kim (2006), homoskedasticity and random walk series using 1000 replications; S7 is the VR test for Kim (2006), heteroskedasticity and random walk series using 1000 replications.

Curly brackets indicate VR values;

Square brackets indicate p-values;

Parentheses indicate Wald (Chi-Square) values;

The short holding periods 2, 4, 8, and 16 are considered for the VR tests (Deo & Richardson 2003).

Table 2 shows the calculations of standard (Lo and MacKinlay 1988, 1989), non-parametric (Wright 2000), and multiple VR tests (Chow and Denning 1993), and the modified version of a multiple VR test (Belaire-Franch & Contreras 2004). The multiple VR tests presented in column 3 show that all the tests reject the null hypothesis of a random walk or martingale for the returns of both indices. Columns 4, 5, and 6 in Table 2 present the Z-statistic, VR, and p-values for holding periods 2, 4, 8, and 16 for the individual tests. These results reject the null hypothesis at the 1% significance level. Therefore, Table 2 shows that the returns of S&P BSE Sensex and S&P BSE IT could be strongly predicted based on historical prices. Hence, it may be concluded that these indices are not efficient. This finding is consistent with Rapach et al. (2013), who used the same methods and confirmed that the weak form was rejected.

Table 2 Multiple and individual VR tests for return series

Application of the ARIMA methodology

The ARIMA analysis is carried out in two stages: the first is developing the ARIMA model, and the second is validating the predicted results against the actual ones for the holdback period of two years (January 1st 2015 to December 31st 2017). From the observed literature, it is evident that a two-year holdback period is appropriate for validating the accuracy of predictions. The authors also tested whether the residuals are white noise through the diagnosis and parameter significance tests.

Developing the ARIMA model

Correlogram to determine the appropriate values of p, d, and q

AC and PAC are the two types of correlation coefficients shown in correlograms. The autocorrelation function (ACF) represents the correlation of the current first-differenced S&P BSE Sensex and S&P BSE IT returns with their values at up to 12 lags. The partial autocorrelation function (PACF) indicates the correlation between observations k lags apart after removing the effect of the intermediate lags. ACF and PACF are applied using the Box-Jenkins methodology to identify the type of ARMA model and determine the appropriate values of p and q. The ACF is calculated by the following formula:

$$ {\hat{\rho}}_k=\frac{\gamma_k}{\gamma_0} $$
(iii)

\( {\hat{\rho}}_k \) is the ACF of the given sample at lag k;

\( {\gamma}_k \) is the covariance at lag k; and

\( {\gamma}_0 \) is the sample variance.
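Equation (iii) translates directly into code. The sketch below is illustrative: the function name `sample_acf` and the toy series are not from the study, and the covariances use the usual divide-by-n convention.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_k = gamma_k / gamma_0 for k = 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma_0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(1, max_lag + 1):
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(gamma_k / gamma_0)
    return acf


# Toy series (illustrative only).
print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2))
```

For this toy series the lag-1 and lag-2 autocorrelations work out to 0.4 and −0.1, which can be confirmed by hand from the deviations (−2, −1, 0, 1, 2).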

Figure 2 shows the AC, PAC, Q-stat, and probability statistics for 12 lags of the S&P BSE Sensex and S&P BSE IT returns. The standard error calculation is used to test the significance of each AC coefficient. The dotted lines represent the error bounds on each side of the AC and PAC, which can be measured using the following formula.

$$ \hat{\rho}\sim \pm 2/\left(\sqrt{T}\right) $$
(iv)

Fig. 2 Correlogram of S&P BSE Sensex and S&P BSE IT first degree returns

Figure 2 shows that a few correlations are statistically significant according to the standard error of the correlation coefficient, which is calculated as \( \sqrt{1/n} \) = \( \sqrt{1/2724} \) = 0.01916, where n is the sample size. Therefore, the 95% confidence interval, according to the normal distribution for \( {\hat{\rho}}_k \), is 0 ± 1.98084 (0.01916), or (−0.037953 to 0.037953). Correlation coefficients outside these bounds are statistically significant at the 5% level. Hence, both ACF and PACF correlations at lags 1, 2, 6, and 8 seem to be statistically significant for S&P BSE Sensex. Therefore, the p and q values for the ARMA model are 1, 2, 6, and 8 for S&P BSE Sensex, which can be denoted as AR (1), AR (2), AR (6), and AR (8) for the autoregression lags and MA (1), MA (2), MA (6), and MA (8) for the moving average lags. For S&P BSE IT, the correlation lags are 1, 2, and 5, and can be designated as AR (1), AR (2), AR (5), MA (1), MA (2), and MA (5).
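The bound calculation above can be reproduced with a short sketch. The sample size and multiplier are the ones quoted in the text; the autocorrelation values passed to `significant_lags` are hypothetical and are not the coefficients behind Fig. 2.

```python
import math


def significance_bound(n, multiplier):
    """Half-width of the confidence band: multiplier * sqrt(1 / n)."""
    return multiplier * math.sqrt(1.0 / n)


def significant_lags(coefficients, bound):
    """Return the 1-based lags whose coefficient lies outside +/- bound."""
    return [lag for lag, rho in enumerate(coefficients, start=1) if abs(rho) > bound]


n_obs = 2724                                   # sample size quoted in the text
bound = significance_bound(n_obs, 1.98084)     # multiplier quoted in the text
# Hypothetical autocorrelations for lags 1..6 (NOT the values behind Fig. 2).
example_acf = [0.071, -0.045, 0.010, -0.020, 0.005, -0.052]
print(round(bound, 6), significant_lags(example_acf, bound))
```

With these hypothetical inputs, lags 1, 2, and 6 fall outside the band and would be carried forward as candidate AR and MA terms.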

Unit root tests

The unit root tests are used to examine stationarity in the series. In the present analysis, three tests are conducted to check for the presence of unit roots: ADF, PP, and KPSS. The null hypothesis that the stock return series holds a unit root was rejected for ADF, PP, and KPSS, as the p-values were less than 5%. Therefore, all three tests confirmed that the series are stationary and do not contain unit roots.
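The logic of a unit root test can be illustrated with the simplest Dickey-Fuller regression, Δy_t = a + γ·y_{t−1} + e_t. This sketch is a simplification: it includes an intercept but no lagged differences, so it is not the augmented test run in EViews, and the critical value of about −2.86 is the approximate asymptotic 5% value for the intercept-only case, stated here as an assumption.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic on gamma in dy_t = a + gamma * y_{t-1} + e_t (no lagged terms)."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y, y[1:])]
    m = len(diffs)
    mean_x = sum(lagged) / m
    mean_d = sum(diffs) / m
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxd / sxx
    a = mean_d - gamma * mean_x
    ssr = sum((d - a - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (m - 2) / sxx)
    return gamma / std_err


APPROX_5PCT_CRITICAL = -2.86  # assumed asymptotic value, intercept-only case

# Simulated stationary returns (illustrative only, not the index data).
rng = random.Random(7)
stationary = [rng.gauss(0.0, 0.01) for _ in range(500)]
t_stat = dickey_fuller_t(stationary)
print(round(t_stat, 2), t_stat < APPROX_5PCT_CRITICAL)
```

A t-statistic below the critical value rejects the unit root null, which is the outcome reported for both return series.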

TE1 - Test equation with intercept;

TE2 - Test equation with trend & intercept;

TE3 - Test equation without intercept;

Table 3 shows strong evidence of stationarity for S&P BSE Sensex and S&P BSE IT returns with the absence of long-term shocks in their returns. The unit root tests for the above three methods show the same results in cases without intercept, with intercept, and with trend and intercept values for S&P BSE Sensex and S&P BSE IT.

Table 3 ADF test results for S&P BSE Sensex and S&P BSE IT logarithmic returns

ARIMA model estimation through identified p, d, q values

ARIMA is a combination of AR and MA terms. To estimate the best-fit values, a linear regression model was estimated. The S&P BSE Sensex best-fit ARMA model is based on lags 1, 2, 6, and 8; the AR and MA terms were estimated, and the results are shown in Table 7 in Appendix, which gives the estimation criteria for both the S&P BSE Sensex and the S&P BSE IT sector. In the S&P BSE IT sector, the AR and MA terms 1, 2, and 5 are significant, but the S&P BSE Sensex MA (8) term is not. Therefore, MA (8) was dropped, and the S&P BSE Sensex model was re-estimated with the AR (1), AR (2), AR (6), MA (1), MA (2), and MA (6) terms; this estimation is depicted in Table 8 in Appendix. The results are shown in Fig. 4, which reveals the randomly distributed residuals from the least squares regression method.

Akaike Information Criterion (AIC) and Schwarz Criterion (SC) are the preferred measures for choosing the best model. The AIC value for S&P BSE Sensex is −7.019098 for the AR term and −7.304545 for the MA term. The SC value for S&P BSE Sensex is −7.008247 for the AR term and −7.293694 for the MA term. In the case of S&P BSE IT, the AIC values for the AR and MA terms are −6.720785 and −7.038715, respectively, and the SC values are −6.709934 and −7.027864, respectively. The AIC and SC values do not differ much; the best model is the one with the lower value, and AIC was chosen as the selection criterion. The AIC and SC values of the MA model are lower than those of the AR model. Therefore, the MA model terms were chosen for the S&P BSE Sensex, with the terms 1 and 6. The evidence is shown in Table 4.
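The selection rule described above (prefer the candidate with the lower criterion value) can be sketched using the AIC and SC values reported in the text. The per-observation formulas in `aic` and `sc` are one common convention and are an assumption here, since the paper does not state which definition the software applied.

```python
import math


def aic(log_likelihood, k, n):
    """Per-observation AIC (assumed convention): -2 l / n + 2 k / n."""
    return -2.0 * log_likelihood / n + 2.0 * k / n


def sc(log_likelihood, k, n):
    """Per-observation Schwarz criterion (assumed): -2 l / n + k ln(n) / n."""
    return -2.0 * log_likelihood / n + k * math.log(n) / n


def best_by(criterion, candidates):
    """Name of the candidate with the lowest value of the given criterion."""
    return min(candidates, key=lambda name: candidates[name][criterion])


# Criterion values as reported in the text.
reported = {
    "S&P BSE Sensex": {
        "AR": {"AIC": -7.019098, "SC": -7.008247},
        "MA": {"AIC": -7.304545, "SC": -7.293694},
    },
    "S&P BSE IT": {
        "AR": {"AIC": -6.720785, "SC": -6.709934},
        "MA": {"AIC": -7.038715, "SC": -7.027864},
    },
}

for index_name, candidates in reported.items():
    print(index_name, best_by("AIC", candidates), best_by("SC", candidates))
```

Both criteria point to the MA specification for each index, which matches the choice reported in the text. The two criteria differ only in the penalty per parameter, so they tend to agree when the competing models have the same number of terms.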

Table 4 AIC and SC values for S&P BSE Sensex and S&P BSE IT (different significance combinations)

In general, the maximum likelihood estimation made through the outer product of the gradients/Berndt–Hall–Hall–Hausman method for least squares follows the AR term. For ARIMA models, it is complex to express the likelihood as an explicit function of the parameters, but it can be expressed conveniently in terms of the innovations, or prediction errors. The combination of (1, 6) for S&P BSE Sensex gave the best-fit ARMA model, as shown in Fig. 3. Figure 3 also shows the best-fit ARMA model for the IT sector, for which the terms are 1 and 2.

Fig. 3 Chosen ARMA models for S&P BSE Sensex and S&P BSE IT sectors

The residuals from both best-fit models were tested with the ADF test, which revealed that the residuals are stationary.

ARIMA model estimation through auto ARIMA

The Auto ARIMA model estimation was carried out using AIC comparisons, which determine the best fit of the time series data for future forecasting. In this model, 25 observations of ARMA terms were estimated. The estimated ARMA terms and respective AIC values are presented in Table 5.

Table 5 Auto ARIMA estimated terms and AIC values

Forecasting ARIMA

Once the ARMA model is fitted, it can be used for forecasting future returns. This is possible through two types of forecasting methods: static and dynamic. Static forecasting uses the actual present and lagged values, whereas dynamic forecasting uses the previously forecasted values. Using the model in Fig. 3, the static and dynamic forecasting values are shown in Table 6. Root mean square error (RMSE) and mean absolute error (MAE) were the measures used to select the more appropriate forecasting model.

Table 6 Forecasting evaluation results

Table 6 provides the RMSE and MAE values of S&P BSE Sensex and S&P BSE IT returns. MAE and RMSE were calculated according to the errors between the forecasted and the actual data. The selected ARMA models provide more accurate results for the holdback period.
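The difference between static and dynamic forecasting, and the two error measures, can be sketched as follows. For brevity the example assumes a zero-mean AR(1) model with a hypothetical coefficient, whereas the study's chosen models contain MA terms; the holdback returns are invented for illustration.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between actual and forecasted values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error between actual and forecasted values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def static_forecast(actual, last_in_sample, phi):
    """One-step-ahead forecasts, each using the actual previous value."""
    previous = [last_in_sample] + list(actual[:-1])
    return [phi * p for p in previous]


def dynamic_forecast(steps, last_in_sample, phi):
    """Multi-step forecasts, each feeding the previous forecast back in."""
    forecasts, prev = [], last_in_sample
    for _ in range(steps):
        prev = phi * prev
        forecasts.append(prev)
    return forecasts


# Hypothetical holdback returns and AR(1) coefficient (illustrative only).
holdback = [0.010, -0.004, 0.006, -0.002]
phi = 0.5
static = static_forecast(holdback, last_in_sample=0.008, phi=phi)
dynamic = dynamic_forecast(len(holdback), last_in_sample=0.008, phi=phi)
print(rmse(holdback, static), mae(holdback, static))
print(rmse(holdback, dynamic), mae(holdback, dynamic))
```

Dynamic forecasts decay toward the series mean as the horizon grows because forecast errors are never corrected with new observations, while static forecasts are re-anchored to the actual value at every step. RMSE is never smaller than MAE for the same errors, since squaring gives large deviations more weight.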

Validation for actual and forecast values

The validation phase is important to determine the accuracy of the predicted values. This could be achieved by using a static forecasting instrument in the ARIMA process. In other words, after the completion of the estimation phase, the authors attempted to forecast the future returns by comparing these forecasted returns with the actual ones. In this study, the holdback period was from January 1, 2015 to December 31, 2017. The actual and forecasted values are depicted in Fig. 4.

Fig. 4 a Actual and forecasted values for S&P BSE Sensex returns. b Actual and forecasted values for S&P BSE IT returns

In Fig. 4(a), SENSEX_RETF refers to the forecasted values, shown as a blue line. DSEN refers to the first-difference values of S&P BSE Sensex returns, shown as a red dashed line. The two lines move together, which means that the forecasted and actual values are almost the same. However, a few variations were identified in May 2015, August 2015, and February 2016. These variations mark the periods where prediction errors occurred, consistent with the RMSE (0.005) and MAE (0.004) shown in Table 6. Figure 4(b) provides the comparative graph for the S&P BSE IT sector, with IT_RETURNSF (forecasted IT returns) shown as a blue line and DIT (first difference of IT returns) as a red dashed line. The forecasted and actual values are almost the same, but a few variations were observed in July 2015, August 2015, July 2016, June 2017, and August 2017. These indicate prediction errors, consistent with the RMSE (0.006) and MAE (0.005) in Table 6.

Findings of the study

The descriptive statistics of S&P BSE Sensex and S&P BSE IT revealed that the mean returns were positive but nearly zero, which indicates a regressive tendency in the long term. The asymmetric tail indicates a high probability of earning returns accompanied by high risk, as the skewness value is greater than the mean return. The Jarque-Bera statistic for S&P BSE Sensex is much higher than the value expected under a normal distribution. Therefore, the null hypothesis of normal distribution was rejected at the 5% level for both S&P BSE Sensex and S&P BSE IT. The standard VR test, non-parametric VR test, multiple VR test, and modified multiple VR test all rejected the null hypothesis of a random walk or martingale for both index returns. Therefore, the returns of S&P BSE Sensex and S&P BSE IT could be strongly predicted based on historical prices. Thus, the results did not provide any evidence in favor of the EMH for either S&P BSE Sensex or S&P BSE IT in the long run. The findings suggest that past information priced the stocks instantly, as these indices indicate a semi-strong form of EMH.
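The logic behind the variance ratio tests mentioned above can be sketched in Python. Under a random walk, the variance of q-period returns is about q times the variance of one-period returns, so the ratio is close to 1. The code below is a simple sample version without the finite-sample corrections or test statistics of the formal VR tests; the function name and the simulated series are assumptions for illustration, not the study's data.

```python
import random


def variance_ratio(returns, q):
    """Variance of overlapping q-period returns over q times the 1-period variance."""
    n = len(returns)
    mean = sum(returns) / n
    var_1 = sum((r - mean) ** 2 for r in returns) / (n - 1)
    sums = [sum(returns[i:i + q]) for i in range(n - q + 1)]
    var_q = sum((s - q * mean) ** 2 for s in sums) / (len(sums) - 1)
    return var_q / (q * var_1)


# Simulated series for illustration only.
random.seed(1)
noise = [random.gauss(0.0, 0.01) for _ in range(5000)]   # independent returns
alternating = [0.01, -0.01] * 50                         # perfectly reversing returns

print(variance_ratio(noise, 2))        # near 1 is expected for independent returns
print(variance_ratio(alternating, 2))
```

A ratio well away from 1 signals serial dependence in returns, which is the kind of evidence the VR tests use against the random walk hypothesis.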

Conclusion

The ARIMA methodology, also referred to as the Box-Jenkins (BJ) method, is one of the most widely used forecasting methods for the stock market. It is useful for analyzing a time series in terms of its historical values and a moving average of random error terms. In this analysis, ARMA (1, 6) for Sensex and ARMA (1, 2) for IT yielded a highly accurate forecast over the three-year holdback period (January 1, 2015 to December 31, 2017). Uncertainty was greater when the forecast period was long and smaller when it was short. The study reveals the efficiency of the process in predicting the complex and volatile series of stock data. By applying ARIMA, fast and accurate prediction was confirmed using time series data.

The results showed that the mean returns of both indices are positive but near zero, which indicates a regressive tendency in the long term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to the actual values, with few deviations. These findings have significant implications. Investors can choose their investments according to the forecasted returns analyzed in the present study and can invest in profitable stocks to build a good portfolio. The study could help researchers, companies, investors, and policymakers make appropriate decisions in the stock market. Researchers can further investigate time series prediction by applying various models, such as genetic models, nanotechnology models, and non-linear regression models. Companies may frame appropriate strategies to earn lucrative returns on their investments. Individual investors may build optimum portfolios, and policymakers can take relevant decisions for the smooth functioning of the stock market.

Nonetheless, this study has some limitations. It was confined to S&P BSE Sensex and S&P BSE IT, which comprise only a few companies of the Indian corporate sector. The BSE has many sectoral indices; including them could have made the study more holistic and given investors clues for deriving better returns on their investments. Furthermore, the study could have compared the accuracy of the estimated returns across various time horizons.

Future research can consider the prediction and comparison of stock prices in developed and emerging stock markets. Moreover, long-term forecasting by applying novel technologies will provide assurance of good returns. Comparative analysis of various sectoral indices between India and other countries will be a thrust area for gaining more insight into portfolio construction, risk and return, performance, and trading efficiency.

Availability of data and materials

The data sets are available from http://www.bseindia.com and http://finance.yahoo.com. The analyzed data are uploaded as supplementary material files.

Notes

  1. www.bseindia.com

Abbreviations

ARIMA: Autoregressive Integrated Moving Average

AIC: Akaike Information Criterion

MAE: Mean Absolute Error

RMSE: Root Mean Square Error

SC: Schwarz Criterion

DW: Durbin-Watson

ADF: Augmented Dickey-Fuller

S.E of Reg: Standard Error of Regression

BSE: Bombay Stock Exchange

IT: Information Technology

ACF: Autocorrelation Function

PACF: Partial Autocorrelation Function

ARMA: Autoregressive Moving Average

AR: Autoregressive

MA: Moving Average

VR test: Variance ratio test

PP test: Phillips-Perron test

KPSS test: Kwiatkowski Phillips Schmidt and Shin test

S&P: Standard and Poor

OPG: Outer product of the gradients

BHHH: Berndt–Hall–Hall–Hausman

References

  • Aye GC, Gil-Alana LA, Gupta R, Wohar ME (2017) The efficiency of the art market: evidence from variance ratio tests, linear and nonlinear fractional integration approaches. Int Rev Econ Finance 51(C):283–294

  • Barnes ML, Ma S (2001) Market Efficiency or Not? The Behaviour of China’s Stock Prices in Response to the Announcement of Bonus Issues 2001. https://ro.uow.edu.au/commpapers/475

  • Box GEP, Jenkins GM (1970) Time series analysis: forecasting and control. Holden-Day, San Francisco

  • Belaire-Franch J, Contreras D (2004) Ranks and Signs-Based Multiple Variance Ratio Tests. Spanish-Italian Meeting on Financial Mathematics, Cuenca, November 2003, Vol. 7, pp 40–79

  • Challa ML, Malepati V, Kolusu SNR (2018) Forecasting risk using autoregressive integrated moving average approach: evidence from S&P BSE Sensex. Financial Innovation 4:24. https://doi.org/10.1186/s40854-018-0107-z

  • Chow KV, Denning KC (1993) A simple multiple variance ratio test. J Econ 58:385–401

  • Darrat AF, Zhong M (2000) On testing the random-walk hypothesis: a model comparison approach. Financ Rev 35:105–124

  • Diebold FX, Inoue A (2001) Long memory and regime switching. J Econ 105:131–159

  • Deo R, Richardson M (2003) On the asymptotic power of the variance ratio test. Econometric Theory 19(02):231–239. https://doi.org/10.1017/S0266466603192018

  • Eigner P, Umlauft TS (2015) The great depression(s) of 1929–1933 and 2007–2009? Parallels, differences and policy lessons. Hungarian Academy of Science MTA-ELTE Crisis History Working Paper No. 2. Available at SSRN: https://ssrn.com/abstract=2612243 or https://doi.org/10.2139/ssrn.2612243

  • Fama E (1991) Efficient capital markets: II. J Financ 46:1575–1617

  • Fama EF (1965) Random walks in stock market prices. Financ Anal J 21(5):55–59. https://doi.org/10.2469/faj.v51.n1.1861

  • Fama EF (1970) Efficient capital markets: a review of theory and empirical work. J Financ 25(2):383–417

  • Fama EF, French KR (1988) Dividend yields and expected stock returns. J Financ Econ 91:389–406

  • Fifield GM, Jetty J (2008) Further evidence on the efficiency of the Chinese stock markets: a note. Res Int Bus Financ 22:351–361

  • Gerra MJ (1959) An econometric model of the egg industry: a correction. Am J Agric Econ 41(4):803–804

  • Groenewold N, Tang SHK, Wu Y (2003) The efficiency of the Chinese stock market and the role of banks. J Asian Econ 14:593–609

  • Guptha SK, Rao RP (2018) The causal relationship between financial development and economic growth experience with BRICS economies. J Soc Econ Dev 20(2):308–326

  • Javier C, Rosario E, Francisco JN, Antonio JC (2003) ARIMA models to predict next electricity Price. IEEE Trans Power Syst 18(3):1014–1020

  • Jiahan L, Ilias T (2017) Equity premium prediction: the role of economic and statistical constraints. J Financial Markets 36(C):56–75

  • Kapetanios G, Shin Y (2011) Testing the null hypothesis of non-stationary long memory against the alternative hypothesis of a nonlinear Ergodic model. Econometrics Rev 30(6):620–645

  • Khashei M, Bijari M, Ardali GAR (2009) Improvement of auto-regressive integrated moving average models using fuzzy logic and artificial neural network. Neurocomputing 72(4–6):956–967

  • Khashei M, Bijari M, Ardali GAR (2012) Hybridization of autoregressive integrated moving average (ARIMA) with probabilistic neural networks. Comput Ind Eng 63(1):37–45

  • Kim JH (2006) Wild bootstrapping variance ratio tests. Econ Lett 92:38–43

  • Kim JH, Lim KP, Shamsuddin A (2011) Stock return predictability and the adaptive markets hypothesis: evidence from century long U.S data. J Empir Financ 18:868–879

  • Kou G, Peng Y, Wang G (2014) Evaluation of clustering algorithms for financial risk analysis using MCDM methods. Inform Sci 275:1–12. https://doi.org/10.1016/j.ins.2014.02.137

  • Kyungjoo LC, Sehwan Y, John J (2007) Neural network model vs. SARIMA model in forecasting Korean stock Price index (KOSPI). Issues Information Syst 8(2):372–378

  • Laurence M, Cai F, Qian S (1997) Weak-form efficiency and causality tests in Chinese stock markets. Multinational Finance J 1:291–307

  • Lee C, Ho C (2011) Short-term load forecasting using lifting scheme and ARIMA model. Expert System Appl 38(5):5902–5911

  • Lee CF, Rui OM (2001) Does trading volume contain information to predict stock returns? Evidence from China’s stock markets. Rev Quant Finan Acc 14:341–360

  • Lima EJA, Tabak BM (2004) Tests of the random walk hypothesis for equity markets: evidence from China, Hong Kong and Singapore. Appl Econ Lett 11:255–258

  • Liu X, Song H, Romilly P (1997) Are Chinese stock markets efficient? A cointegration and causality analysis. Appl Econ Lett 4:511–515

  • Lo AW, MacKinlay AC (1988) Stock market prices do not follow random walk: evidence from a simple specification test. Rev Financ Stud 1:41–66

  • Lo AW, MacKinlay AC (1989) The size and power of the variance ratio test in finite samples: a Monte Carlo investigation. J Econ 40:203–238

  • Long DM, Payne JD, Feng C (1999) Information transmission in the Shanghai equity market. J Financ Res 22:29–45

  • Mallikarjuna M, Rao RP (2019) Evaluation of forecasting methods from selected stock market returns. Financial Innovation 5:40. https://doi.org/10.1186/s40854-019-0157-x

  • Merh N, Saxena VP, Pardasani KR (2010) A comparison between hybrid approaches of ANN and ARIMA for Indian stock trend forecasting. J Business Intelligence 3(2):23–43

  • Miswan NH, Ngatiman NA, Hamzah K, Zamzamin ZZ (2014) Comparative performance of ARIMA and GARCH models in Modelling and forecasting volatility of Malaysia market properties and shares. Appl Math Sci 8(140):7001–7012. https://doi.org/10.12988/ams.2014.47548

  • Mookerjee R, Yu Q (1999) An empirical analysis of the equity markets in China. Rev Financ Econ 8:41–60

  • Munteanu A, Pece A (2015) Investigating art market efficiency. Procedia Soc Behav Sci 188:82–88

  • Naylor T II, Seaks TG, Wichern DW (1972) Box-Jenkins methods: an alternative to econometric models. Int Stat Rev 40:123–137

  • Neely CJ, Rapach DE, Tu J, Zhou G (2014) Forecasting the equity risk premium: the role of technical indicators. Manag Sci 60:1772–1791. http://dx.doi.org/10.1287/mnsc.2013.183

  • Newbold P, Granger CWJ (1974) Experience with forecasting univariate time series and the combination of forecasts. J R Statist Soc A 137:131–165

  • Pahlavani M, Roshan R (2015) The comparison among ARIMA and hybrid ARIMA-GARCH models in forecasting the exchange rate of Iran. Int J Business Dev Studies 7(1):31–50

  • Pankratz A (2009) Forecasting with univariate Box-Jenkins models: Concepts and cases. Wiley Series in Probability and Statistics. ISBN: 978-0-470-31727-3

  • Pettenuzzo D, Timmermann A, Valkanov R (2014) Forecasting stock returns under economic constraints. J Financ Econ 114(3):517–553

  • Phan DHB, Sharma SS, Narayan PK (2015) Stock return forecasting: some new evidence. Int Rev Financ Anal 40:38–51

  • Phillips P, Perron P (1988) Testing for a unit root in time series regression. Biometrika 75:335–346

  • Richardson M, Smith T (1991) Tests of Financial Models in the Presence of Overlapping Observations. The Review Financial Studies 4:227–254

  • Rangan N, Titida N (2006) ARIMA Model for Forecasting Oil Palm Price. In: Proceedings of the 2nd IMT-GT Regional Conference on Mathematics, Statistics and Applications, University Sains Malaysia, 2006

  • Rapach DE, Matthew RC, Zhou G (2016) Short interest and aggregate stock returns. J Financ Econ 121:46–65. https://doi.org/10.1016/j.jfineco.2016.03.004

  • Rapach DE, Strauss JK, Zhou G (2010) Out-of-sample equity premium prediction: combination forecasts and links to the real economy. Rev Financ Stud 23:821–862

  • Rapach DE, Strauss JK, Zhou G (2013) International stock return predictability: What is the role of the United States? J Finance 68(4):1633–1662

  • Reid GA (1971) On the Calkin representations. Proceedings of the London Mathematical Society s3–23(3):547–564. https://doi.org/10.1112/plms/s3-23.3.547

  • Sabur SA, Zahidul Hague M (1992) Resource-use efficiency and returns from some selected winter crops in Bangladesh. Econ Aff 37(3):158–168

  • Schmitz A, Watts DG (1970) Forecasting wheat yields: an application of parametric time series modeling. Am J Agric Econ 52(2):109

  • Seddighi HR, Nian W (2004) The Chinese stock exchange market: operations and efficiency. Appl Financ Econ 14:785–797

  • Sterba J, Hilovska (2010) The implementation of hybrid ARIMA neural network prediction model for aggregate water consumption prediction. Aplimat - J Applied Mathematics 3(3):123–131

  • Suits DB (1962) Forecasting and analysis with an econometric model. Am Econ Rev 52(1):104–132

  • Tabak BM (2003) The random walk hypothesis and the behavior of foreign capital portfolio flows: the Brazilian stock market case. Appl Finance Econ 13:369–378

  • Thushara SC (2018) What the purpose of ARIMA-Garch? Retrieved from: https://www.researchgate.net/post/What_the_purpose_of_ARIMA-Garch

  • Turner JA (2015) Casting doubt on the predictability of stock returns in real time: Bayesian model averaging using realistic priors. Rev Finance 19:785–821

  • Wang Z, Qian Y, Wang S (2018) Dynamic trading volume and stock return relation: does it hold out of sample? Int Rev Financ Anal 58:195–210

  • Welch I, Goyal A (2008) A comprehensive look at the empirical performance of equity premium prediction. Rev Financ Stud 21:1455–1508

  • Wen F, Xu L, Ouyang G, Kou G (2019) Retail investor attention and stock price crash risk: evidence from China. Int Rev Financ Anal 101376. https://doi.org/10.1016/j.irfa.2019.101376

  • Whang Y-J, Kim J (2003) A multiple variance ratio test using subsampling. Econ Lett 79:225–230

  • Wright JH (2000) Alternative variance-ratio tests using ranks and signs. J Bus Econ Stat 18:1–9

  • Wu CFJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14:1261–1295

  • Wu SN (1996) The analysis of the efficient of securities market in our country. Econ Res 6:1–39 (in Chinese)

  • Zhang Y, Ma F, Shi B, Huang D (2018) Forecasting the prices of crude oil: an iterated combination approach. Energy Econ 70:472–483

  • Zhang Y, Ma F, Zhu B (2019a) Intraday momentum and stock return predictability: evidence from China. Econ Model 76:319–329

  • Zhang Y, Zeng Q, Ma F, Shi B (2019b) Forecasting stock returns: do less powerful predictors help? Econ Model (forthcoming). https://doi.org/10.1016/j.econmod.2018.09.014 https://www.sciencedirect.com/science/article/pii/S0264999318301901

  • Zhu X, Zhu J (2013) Predicting stock returns: a regime-switching combination approach and economic links. J Bank Financ 37:4120–4133

  • Zotteri G, Kalchschmidt M, Caniato F (2005) The impact of aggregation level on forecasting performance. Int J Prod Econ 93–94:479–491. https://doi.org/10.1016/j.ijpe.2004.06.044

Acknowledgements

Not Applicable.

Funding

Not Applicable

Author information

Contributions

Study conception and design: CML, MVR, KSNR. Acquisition of data: CML. Analysis and interpretation of data: CML. Supervision: MVR, KSNR. Drafting of manuscript: CML. Critical revision: MVR, KSNR. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Venkataramanaiah Malepati.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Appendices

Appendix 1

Table 7 AR and MA terms Estimation for S&P BSE Sensex and IT sectors

Appendix 2

Table 8 Adjusted ARMA terms in S&P BSE Sensex

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Challa, M.L., Malepati, V. & Kolusu, S.N.R. S&P BSE Sensex and S&P BSE IT return forecasting using ARIMA. Financ Innov 6, 47 (2020). https://doi.org/10.1186/s40854-020-00201-5

  • DOI: https://doi.org/10.1186/s40854-020-00201-5
