S&P BSE Sensex and S&P BSE IT return forecasting using ARIMA
Financial Innovation volume 6, Article number: 47 (2020)
Abstract
This study forecasts the return and volatility dynamics of the S&P BSE Sensex and S&P BSE IT indices of the Bombay Stock Exchange. To this end, it uses descriptive statistics; the variance ratio, Augmented Dickey-Fuller, Phillips-Perron, and Kwiatkowski-Phillips-Schmidt-Shin tests; and the Autoregressive Integrated Moving Average (ARIMA) model. The analysis forecasts daily stock returns for the S&P BSE Sensex and S&P BSE IT time series using the ARIMA model. The results reveal that the mean returns of both indices are positive but near zero, which is indicative of a regressive tendency in the long term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to their actual values, with few deviations. Hence, the ARIMA model is capable of predicting medium- or long-term horizons using historical values of S&P BSE Sensex and S&P BSE IT.
Introduction
Theoretical and empirical studies have revealed that the relation between stock markets and economic growth is positive (Kim et al. 2011; Guptha and Rao 2018; Mallikarjuna and Rao 2019). Investment decisions informed by stock market forecasts play a significant role in attaining the desired returns. However, stock markets are dynamic, complex, and volatile, so forecasting stock prices and returns is a challenging task. Stock or investment returns depend on many factors, primarily the prediction of stock movements. The prediction and estimation of stock returns on a given stock exchange take place as often as hourly. Given the importance of forecasting stock prices and their returns, researchers have paid significant attention to improving the accuracy of models that predict stock price movements and returns. The fundamental reason is that investors, policymakers, and financial institutions must be dynamic and excel in their decision making in order to optimize the returns on their investments. When stock markets are efficient, capital assets are allocated in the best possible way (Fama 1970). The efficient market hypothesis (EMH) (Fama 1965) asserts that a market is efficient when prices fully reflect the available information. Market efficiency has three forms: weak, semi-strong, and strong. In the weak form, prices already reflect historical prices, so future values cannot be forecast from them. In the semi-strong form, prices reflect all publicly available information. In the strong form, prices reflect all information, both public and private. This study examines predictability from historical, publicly available prices; it does not test the strong form, because no private information is used.
If a prediction model can provide a good estimate of the movement of stock prices, then the uncertainty and risk involved in the investment process can be reduced. Such a model would help investors and policymakers make appropriate investment decisions and take the measures required to improve the flow of investments into stock markets. Several techniques have been used to forecast the stock market. The main purpose of forecasting is to assist investment decisions, improve investors’ accuracy, and enhance performance. However, generally uncertain conditions may change or disrupt the consistency of the stock market. These conditions can be addressed by applying appropriate stock market strategies based on accurate forecasting tools (Zhang et al. 2019a, 2019b). Accurate and fast forecasting of the stock market is the main challenge. Many researchers have focused on finding the best forecasting tools and methods to obtain fast and accurate predictions of stock prices (Javier and Rosario 2003). In time series analysis, the autoregressive integrated moving average (ARIMA) model is regarded as one of the best statistical forecasting methods for giving investors fast and accurate information on stock predictions. Moreover, ARIMA modeling shows whether a series is already stationary or must be differenced to achieve stationarity (Merh et al. 2010).
The Bombay Stock Exchange (BSE) is considered one of the premier stock markets in the world. The S&P BSE Sensex is the bellwether index of the BSE. It measures the performance of 30 companies listed on BSE Ltd., which are popularly known as blue-chip companies. Among the sectors of the BSE, S&P BSE Information Technology (IT) is a leading one: it is the second most capitalized sector, with a capitalization share of 12.19%, against 100% for the S&P BSE Sensex.^{Footnote 1} The S&P BSE IT index is intended to provide investors with a benchmark reflecting the companies in the S&P BSE All Cap that are classified as members of the IT sector.
The primary objective of this study is to fit the ARIMA model in a way that best estimates the movements of the stock market. Further, it looks into how volatility acts on different time horizons of investment. Furthermore, it examines whether forecasted values are aligned with the actual values.
There are many techniques to forecast the movement of the stock market. The main aim of any stock market forecasting technique is to predict the movement of stock market prices more accurately. However, information asymmetry, insider trading, and other anomalies may change the direction of the market or lead to inconsistency in market performance. In addition, personal biases of investors, such as overconfidence and illusion of control, the narrative fallacy, anchoring bias, loss aversion, and herding mentality, cause wrong predictions of stock price movements. These are some of the causes of sudden losses in invested funds, which arise from investors' wrong estimations about their investments or portfolios (Neely et al. 2014; Wang et al. 2018; Challa et al. 2018). Hence, the underlying problem is how to obtain more accurate and faster predictions of stock prices. There are few studies that forecast stock prices using GARCH and ARIMA models in developed stock markets and very few in developing stock markets. Further, most studies restricted themselves to estimating the movement of stock prices and did not compare the estimated values with the actual values to verify the accuracy of estimation (Zhang et al. 2019a, 2019b). Furthermore, no study has compared S&P BSE Sensex and S&P BSE IT. Hence, it is necessary to carry out a detailed investigation to bridge this gap.
The S&P BSE Sensex is the oldest and most popular index of the BSE. It provides the most accurate measurement of the financial position of the stock market. Indeed, it is considered a barometer of the Indian stock market. The IT sector has seen tremendous growth after the liberalization of the Indian economy, and IT and IT-enabled services occupy a lion’s share of the service sector. Hence, small changes in these indices may have a great impact on the overall performance of the Indian stock market. The direction as well as the relationship of causation holds good for the IT segment of the BSE.
For this analysis, the authors used statistical and econometric tools: descriptive statistics, the variance ratio (VR) test, the Augmented Dickey-Fuller (ADF) test, the Phillips-Perron (PP) test (Phillips and Perron 1988), the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, and ARIMA. First, the authors analyzed the performance of the S&P BSE Sensex and IT indices, reviewed the literature, and examined market efficiency empirically. An empirical analysis using the aforementioned models followed to calculate future returns. Moreover, ARIMA models were used to forecast the data series of S&P BSE Sensex and S&P BSE IT; these models can determine whether the actual stock prices are aligned with the estimated values.
The results can be summarized as follows. The descriptive statistics show that the mean and variance of the S&P BSE Sensex and S&P BSE IT returns exhibit linearity. In addition, the VR test revealed that the S&P BSE Sensex and S&P BSE IT returns could be strongly predicted based on historical prices. The parameters of the ARIMA model were identified using the autocorrelation (AC) and partial autocorrelation (PAC) coefficients; the ADF, PP, and KPSS tests were used to test the stationarity of the data. The results showed that the time series are stationary. This study estimates the ARIMA model both through the identified values and through auto ARIMA. The results revealed that the mean returns of both indices are positive, but near zero. This may be an indication of a regressive tendency in the long term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to the actual values, with few deviations. To verify the accuracy of the estimations, predictions were made for the holdback period and then compared with the actual values.
According to the EMH, share prices reflect all information, and it is impossible to generate a consistent alpha. Hence, it can be inferred that stocks may not outperform the overall market through either expert stock selection or market timing. The results indicated a regressive tendency in which the returns are estimated with high accuracy in the long run. This is evident in the case of S&P BSE Sensex and S&P BSE IT, where the estimated and actual values are almost equal. This reveals that neither index followed the random walk theory. In other words, the movements of the indices are predictable. Hence, both the BSE indices under study exhibited a semi-strong form of EMH, as their stock prices are forecasted based on past data. The strong form of EMH has no relevance in this study, as the researchers used only public information and ignored private information.
Literature review
Stock returns forecasting mechanisms are important to the development of investment policies. However, according to the EMH, consistent risk-adjusted returns (Kou et al. 2014) above the profitability of the market as a whole are not possible. Computational advancements have led to various econometric models, which have been used consistently to anticipate stock market movements and thus forecast future stock prices and stock returns (Suits 1962; Zotteri et al. 2005; Wen et al. 2019). ARIMA models are efficient at forecasting short-term financial time series data (Schmitz and Watts 1970; Rangan and Titida 2006; Kyungjoo et al. 2007; Merh et al. 2010; Sterba and Hilovska 2010). Various studies have used ARIMA forecasting models to predict stock returns (Khasei et al. 2009; Lee and Ho 2011; Khashei et al. 2012). Gerra (1959) examined the stock price movements for the egg industry by using least squares methods. The Box-Jenkins ARIMA approach is more efficient and accurate than other economic models such as regression and exponential smoothing (Reid 1971; Naylor II et al. 1972; Newbold and Granger 1974). The ARIMA approach is more accurate in forecasting short-term stock returns than long-term returns (Sabur and Zahidul Hague 1992).
Neely et al. (2014) used technical indicators to forecast stock returns and found that technical indicators are economically and statistically significant. Several studies have relied on the predictability of stock returns (Rapach et al. 2010; Zhu and Zhu 2013; Pettenuzzo et al. 2014; Jiahan and Ilias 2017). Rapach et al. (2010) forecasted the equity premium (Welch and Goyal 2008; Turner 2015) by using compound returns on the S&P 500 index, including dividends, and the rate on treasury bills, and established a link between the forecasted values and the real economy. Phan et al. (2015) discussed evidence-based forecasting for stock returns. Rapach et al. (2016) showed the vector autoregression decomposition from a cash flow channel, which in turn showed the source of predictive power. Furthermore, there is evidence of a relationship between short-sellers and traders. Wang et al. (2018) showed the dynamic relationship between returns and volume based on US stock returns. They found that investors do not gain much profit by following the volume curve.
Zhang et al. (2018) examined oil price forecasting by using 18 macroeconomic and 18 technical indicators. The results showed accurate forecasts and generated certainty equivalent return gains for a mean-variance investor. Zhang et al. (2019a, 2019b) explained not only the trading behavior of intraday stock movement, but also the evidence of a U-shaped investment curve. They found that morning returns significantly predict afternoon stock returns.
This study analyzed the efficiency of BSE. In the past decades, many researchers discussed the efficiency of stock market predictability (Fama 1970, 1991; Lo and MacKinlay 1988; Fama and French 1988). Stock markets are considered efficient if stock prices fully reflect, at any point in time, relevant or available information. EMH (Fama 1965) is one of the most widely accepted financial theories. Various approaches have been used to test the EMH for stock markets, for instance, serial correlation tests, unit root tests, and VR tests (Wu 1986, 1996; Laurence et al. 1997; Mookerjee and Yu 1999; Liu et al. 1997; Groenewold et al. 2003; Seddighi and Nian 2004). Lo and MacKinlay (1989) proved that VR tests are more powerful than unit root and serial correlation tests (Munteanu and Pece 2015), particularly in the existence of heteroscedasticity.
Individual VR tests in the literature have not provided consensus on the weak form of the EMH, so multiple VR tests are preferable (Long et al. 1999; Darrat and Zhong 2000; Ma and Barnes 2001; Lee and Rui 2001; Lima and Tabak 2004; Fifield and Jetty 2008). Chow and Denning (1993) suggested that multiple VR tests are useful to avoid misleading statistical inferences based on asymptotic normal probabilities. Whang and Kim (2003) and Kim (2006) proposed powerful alternatives that do not depend on asymptotic probabilities: subsampling and the wild bootstrap, respectively.
Following this logic, this study adopted multiple VR tests, as suggested by Whang and Kim (2003) and Kim (2006), and the conventional Chow-Denning test to study the random walk hypothesis for the BSE (Diebold and Inoue 2001; Kapetanious and Shin 2011; Aye et al. 2017).
Problem statement
As mentioned earlier, several studies have been carried out on the prediction of stock market returns using ARIMA and other models, especially in developed markets. However, very few have focused on developing and less developed markets. Among the existing models, ARIMA has proved to be more efficient and accurate (Box & Jenkins 1970). Furthermore, the ARIMA model is better suited to estimating short-term returns than long-term returns, though many previous studies have used it to estimate long-term returns. There are very few studies on the prediction of returns on the Indian stock market in general, and S&P BSE Sensex in particular. It is evident from the literature that no study has predicted the returns of the S&P BSE Sensex and its subcomponent, the S&P BSE IT, which is a sectoral index. This study fills this gap in the literature. Based on the observations of the literature and its objectives, this study hypothesizes that there is no significant difference between the actual and predicted values of S&P BSE Sensex and S&P BSE IT stocks.
Data and methodology
Data were collected for two indices, S&P BSE Sensex and S&P BSE IT. Empirical analysis was carried out on the daily returns of the S&P BSE Sensex and S&P BSE IT indices for the period January 1, 2007 to December 31, 2017. It was observed that both indices experienced high volatility in performance. The data experienced the largest shock during 2008–2009, for both indices. The reason was the worldwide financial crisis, which also affected the Indian stock market (Eigner and Umlauft 2015).
In this context, there is a need to determine whether the abovementioned crisis caused steep swings in the prices of stocks listed on the S&P BSE Sensex and S&P BSE IT. It is also necessary to apply the ARIMA model with validation and testing, which was not done in most previous studies. Therefore, an attempt is made to test and forecast the stock prices using ARIMA models. The data were collected from www.bseindia.com, and the daily returns were calculated using the following formula:
\[ R_{it} = \ln\left( \frac{P_{t}}{P_{t-1}} \right) \]
where
R_{it} is the return of the index;
P_{t} is the closing price of the index at time t;
P_{t − 1} is the closing price of the index at time t − 1; and
ln is the natural logarithm.
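The return definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `log_returns` and the closing prices are hypothetical, and the study itself worked in EViews on data from the BSE website.

```python
import math


def log_returns(closes):
    """Return ln(P_t / P_{t-1}) for each consecutive pair of closing prices."""
    if any(p <= 0 for p in closes):
        raise ValueError("closing prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]


# Hypothetical closing prices, for illustration only (not BSE data).
closes = [100.0, 101.0, 99.5, 100.2]
returns = log_returns(closes)
print(returns)
```

One convenient property of log returns is that they add up over time: the sum of the daily values equals the log return over the whole period.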
The ARIMA model is used to forecast future returns, and it is a combination of autoregressive and moving average models (Pankratz 2009). The mathematical formula of the model is as follows.
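A common textbook way of writing the model is sketched here. The symbols are assumptions introduced for illustration (c for the constant, \( \phi_i \) and \( \theta_j \) for the autoregressive and moving average coefficients, \( \varepsilon_t \) for the white-noise error, B for the backshift operator), and other sources use the opposite sign on the moving average terms:

\[ Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \qquad Y_t = (1 - B)^{d} X_t \]

Here \( X_t \) is the observed series and \( Y_t \) is the series after differencing d times; with d = 0 the two coincide and the model is an ARMA (p, q).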
The Box-Jenkins method assumes that the time series is stationary; if it is not, stationarity is obtained by differencing the series, usually with a first-degree difference. The resulting model is called ARIMA (p, d, q), where p and q are the orders of the autoregressive and moving average terms and d is the degree of differencing. If the time series is already stationary (d = 0), the ARIMA (p, d, q) model reduces to an ARMA (p, q) model.
Many researchers believe that GARCH and EGARCH models do not perform as well as ARIMA models, and that ARIMA is the best model for forecasting and modeling stock prices (Miswan et al. 2014; Pahlavani and Roshan 2015). Hence, the ARIMA model is appropriate for predicting stock returns accurately and for guiding the market strategies that investors follow. Furthermore, mixed models such as ARIMA-GARCH, TGARCH, EGARCH, or GJR may be used to measure the volatility of stock prices or returns by assuming symmetric or asymmetric effects. However, according to Thushara (2018), ARIMA and ARIMA-GARCH models produce the same results over time, and volatility does not change. Hence, the ARIMA model, along with the mean and variance equations, is used to predict future returns.
In practice, the appropriate model is determined in four steps. The first step is identification, in which the correlogram and partial correlogram are used to determine the appropriate values of p, d, and q. Moreover, the ADF test is used to test the stationarity of the data. The second step is estimation, in which the parameters of the chosen model are estimated using the least squares method. The third step is a diagnostic check to examine whether the residuals from the fitted model are white noise. If they are, the chosen model is accepted; otherwise, the process starts afresh. The procedure is therefore iterative. In the fourth step, forecasting, the successful ARIMA model from step three is used within and outside the sample period to forecast future returns of stock prices.
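The estimation and diagnostic steps can be illustrated with a deliberately simplified sketch. It assumes a plain AR(1) model fitted by ordinary least squares to simulated data; the function names, the simulated series, and the coefficient 0.5 are all hypothetical, and the study's own models contain more AR and MA terms and were estimated in EViews.

```python
import random


def fit_ar1(y):
    """Least squares fit of y_t = c + phi * y_{t-1} + e_t.

    Returns (c, phi, residuals).
    """
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    residuals = [b - (c + phi * a) for a, b in zip(x, z)]
    return c, phi, residuals


def lag1_autocorr(e):
    """Lag-1 sample autocorrelation, a crude white-noise check on residuals."""
    n = len(e)
    m = sum(e) / n
    g0 = sum((v - m) ** 2 for v in e)
    g1 = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, n))
    return g1 / g0


# Simulated AR(1) series with a known coefficient (illustration only).
rng = random.Random(0)
y = [0.0]
for _ in range(2000):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

c, phi, residuals = fit_ar1(y)   # step 2: estimation
r1 = lag1_autocorr(residuals)    # step 3: diagnostic check
print(round(phi, 3), round(r1, 3))
```

If the residual autocorrelation were clearly different from zero, the model would be rejected and a different set of terms tried, which is the iterative loop described above.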
Empirical analysis
Descriptive statistics
An overview of the basic statistical features of the time series data is necessary before data analysis. Figure 1 shows the daily returns of the S&P BSE Sensex and S&P BSE IT. The authors used the statistical software EViews 9.5 to analyze the data and apply each step of the ARIMA process. Figure 1 depicts the returns on the ‘y’ axis and years on the ‘x’ axis; years 2007 to 2017 are termed 1 to 18.
The descriptive statistics of S&P BSE Sensex and S&P BSE IT are summarized in Table 1. Table 1 reveals that the mean returns are positive but nearly zero, which indicates a regressive tendency in the long term. The differences between the minimum and maximum values are 0.1198 (S&P BSE Sensex returns) and 0.0979 (S&P BSE IT returns). The standard deviation is 0.6% for S&P BSE Sensex and 0.7% for S&P BSE IT returns. These values indicate high volatility in the BSE over the sample period. S&P BSE Sensex displays positive skewness (0.159), which means a longer right tail, whereas S&P BSE IT displays negative skewness (−0.145181), which means a longer left tail; in both cases the distribution is asymmetric. An asymmetric tail indicates a high probability of earnings from returns with high risk, as the value of skewness is greater than the mean value of returns. The kurtosis values of S&P BSE Sensex and S&P BSE IT are 13.23763 and 8.493113, respectively. Both exceed 3, the kurtosis of a normal distribution, which explains the sharp peak and fat tails of the BSE return distributions. This implies that the time series data do not follow the normal distribution. The Jarque-Bera value is 11,907.32 for S&P BSE Sensex and 3434.352 for S&P BSE IT; both are much higher than the reference value for a normal distribution (5.8825). Therefore, the null hypothesis of normal distribution was rejected at the 5% level for both indices.
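The skewness, kurtosis, and Jarque-Bera figures discussed above follow from standard moment formulas, sketched below. The function names and the small sample are hypothetical; the statistic is computed as JB = n/6 × (S² + (K − 3)²/4), the usual definition, which is assumed rather than stated in the text.

```python
def jb_from_moments(n, skew, kurt):
    """Jarque-Bera statistic from sample size, skewness S, and kurtosis K."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def jarque_bera(sample):
    """Return (skewness, kurtosis, JB) using population (divide-by-n) moments."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt, jb_from_moments(n, skew, kurt)


# Tiny hypothetical sample, symmetric by construction.
skew, kurt, jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])

# Moments as reported in the text, with n = 2724 taken from the
# sample size quoted alongside the correlogram.
jb_sensex = jb_from_moments(2724, 0.159, 13.23763)
jb_it = jb_from_moments(2724, -0.145181, 8.493113)
print(round(jb_sensex, 2), round(jb_it, 2))
```

With n = 2724, the reported skewness and kurtosis reproduce the reported Jarque-Bera values to within rounding, which suggests the figures in Table 1 are internally consistent.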
Variance ratio test
A popular approach to assessing the predictability of asset prices is the Lo and MacKinlay (1988, 1989) VR test, which examines the predictability of time series data by comparing the variances of returns over various intervals. If the data follow a random walk, the variance of the q-period difference must be q times the variance of the one-period difference (Tabak 2003). Hence, the VR test examines whether or not the data follow a random walk. The present analysis follows the rank, rank-score, and sign-based forms of Lo & MacKinlay and Kim to determine statistical significance. Lo and MacKinlay’s (1988, 1989) VR test can be performed under homoscedastic and heteroscedastic random walks, using asymptotic normal or wild bootstrap (Kim 2006) probabilities. In addition, the rank, rank-score, and sign-based forms of the test (Wright 2000) have been evaluated with the bootstrap for statistical significance. Furthermore, Wald and multiple comparison VR tests (Richardson & Smith 1991; Chow and Denning 1993) have been performed for several intervals. In this analysis, the random walk series was assumed to test the data.
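The idea behind the variance ratio can be sketched as follows. This is a bare-bones version using overlapping q-period returns and no bias correction; it omits the test statistics, bootstrap probabilities, and rank and sign variants used in the study, and the function name and the two example series are hypothetical.

```python
import random


def variance_ratio(returns, q):
    """VR(q): variance of overlapping q-period returns, divided by q times
    the variance of one-period returns. Close to 1 under a random walk."""
    n = len(returns)
    mu = sum(returns) / n
    var1 = sum((r - mu) ** 2 for r in returns) / n
    sums = [sum(returns[t:t + q]) for t in range(n - q + 1)]
    varq = sum((s - q * mu) ** 2 for s in sums) / (len(sums) * q)
    return varq / var1


# Independent returns (a random walk in prices): VR should be near 1.
rng = random.Random(1)
iid = [rng.gauss(0.0, 0.01) for _ in range(5000)]
vr_iid = variance_ratio(iid, 2)

# Perfectly mean-reverting returns: every two-period return is zero.
alternating = [1.0, -1.0] * 50
vr_alt = variance_ratio(alternating, 2)
print(round(vr_iid, 3), vr_alt)
```

Values well below 1 point to negative serial correlation (mean reversion) and values well above 1 to positive serial correlation; either is evidence against a random walk.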
S indicates the series from 1 to 7;
S1 indicates the VR test for Lo and MacKinlay (1988), homoskedasticity, no bias correction, and random walk series; S2 is the VR test for Lo and MacKinlay (1988), heteroskedasticity, and martingale series; S3 is the VR test for the Wright (2000) rank and random walk series; S4 is the VR test for the rank score and random walk series; S5 is the VR test for the sign-based test and martingale series; S6 is the VR test for Kim (2006), homoskedasticity, and random walk series using 1000 replications; S7 is the VR test for Kim (2006), heteroskedasticity, and random walk series using 1000 replications.
Curly brackets indicate VR values;
Square brackets indicate P values;
Parentheses indicate Wald (Chi-Square) values;
The short holding periods 2, 4, 8, 16 are considered for the VR tests (Deo & Richardson 2003).
Table 2 shows the calculations of the standard (Lo and MacKinlay 1988, 1989), nonparametric (Wright 2000), and multiple VR tests (Chow and Denning 1993), and the modified version of the multiple VR test (Belaire-Franch & Contreras 2004). The multiple VR tests presented in column 3 show that all the tests reject the null hypothesis of a random walk or martingale for the returns of both indices. Columns 4, 5, and 6 in Table 2 present the Z-statistic, VR, and p-values for holding periods of 2, 4, 8, and 16 for the individual tests. These results reject the null hypothesis at the 1% significance level. Therefore, Table 2 shows that the returns of S&P BSE Sensex and S&P BSE IT could be strongly predicted based on historical prices. Hence, it may be concluded that these indices are not weak-form efficient. This finding is consistent with Rapach et al. (2013), who used the same methods and confirmed that the weak form was rejected.
Application of the ARIMA methodology
The ARIMA analysis proceeds in two stages: the first is developing the ARIMA model, and the second is validating the predicted results against the actual ones for a holdback period of three years (January 1, 2015 to December 31, 2017). The literature suggests that a holdback period of this length is appropriate for validating the accuracy of the predictions. The authors also tested whether the residuals are white noise through diagnostic and parameter significance tests.
Developing the ARIMA model
Correlogram to determine the appropriate values of p, d, and q
AC and PAC are the two types of correlation coefficients shown in correlograms. The autocorrelation function (ACF) represents the correlation of the current first-differenced S&P BSE Sensex and S&P BSE IT returns with their values at 12 lags. The partial autocorrelation function (PACF) indicates the correlation between observations at a given lag after accounting for the intermediate lags. ACF and PACF are applied using the Box-Jenkins methodology to identify the type of ARMA model and determine the appropriate values of p and q. The ACF is calculated by the following formula:
\[ {\hat{\rho}}_k = \frac{\gamma_k}{\gamma_0} \]
where
\( {\hat{\rho}}_k \) is the ACF of the given sample;
γ_{k} is the covariance at lag k; and
γ_{0} is the sample variance.
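The sample ACF defined above, the lag-k covariance divided by the sample variance, can be computed directly. The sketch below is illustrative only; the function name `sample_acf` and the five-point series are hypothetical, and both moments are divided by n, a common convention assumed here.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_k = gamma_k / gamma_0 for k = 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    gamma0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for k in range(1, max_lag + 1):
        gamma_k = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
        acf.append(gamma_k / gamma0)
    return acf


# Small hypothetical series, chosen so the result is easy to check by hand.
rho = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2)
print(rho)
```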
Figure 2 shows the AC, PAC, Q-stat, and probability statistics for 12 lags of the S&P BSE Sensex and S&P BSE IT returns. The standard error is used to test the significance of each AC coefficient. The dotted lines represent the error bounds on each side of the AC and PAC, which are computed from the standard error of the correlation coefficient, \( \sqrt{1/n} \), where n is the sample size.
Figure 2 shows that a few correlations are statistically significant according to the standard error of the correlation coefficient, which is \( \sqrt{1/n} \) = \( \sqrt{1/2724} \) = 0.01916, where n is the sample size. Therefore, the 95% confidence interval, according to the normal distribution for \( {\hat{\rho}}_k \), is 0 ± 1.98084 (0.01916), or (−0.037953 to 0.037953). Correlation coefficients outside these bounds are statistically significant at the 5% level. Hence, both ACF and PACF correlations at lags 1, 2, 6, and 8 seem to be statistically significant for S&P BSE Sensex. Therefore, the p and q values for the ARMA model are 1, 2, 6, and 8 for S&P BSE Sensex, which gives AR (1), AR (2), AR (6), and AR (8) as the autoregressive lags and MA (1), MA (2), MA (6), and MA (8) as the moving average lags. For S&P BSE IT, the significant lags are 1, 2, and 5, which gives AR (1), AR (2), AR (5), MA (1), MA (2), and MA (5).
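The bound arithmetic above is easy to reproduce. In this sketch, n = 2724 and the multiplier 1.98084 are taken from the text; the helper `significant_lags` and the three autocorrelation values passed to it are hypothetical and are not read from Figure 2.

```python
import math

n = 2724                      # sample size quoted in the text
std_error = math.sqrt(1 / n)  # standard error of a correlation coefficient
multiplier = 1.98084          # multiplier used in the text for the 95% interval
bound = multiplier * std_error


def significant_lags(acf_values, limit):
    """Return the 1-based lags whose autocorrelation lies outside +/- limit."""
    return [k for k, r in enumerate(acf_values, start=1) if abs(r) > limit]


# Hypothetical autocorrelations for lags 1 to 3, for illustration only.
flagged = significant_lags([0.05, 0.01, -0.04], bound)
print(round(std_error, 5), round(bound, 6), flagged)
```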
Unit root tests
Unit root tests are used to examine stationarity in the series. In the present analysis, three tests are conducted to check for the presence of unit roots: ADF, PP, and KPSS. For ADF and PP, the null hypothesis that the stock returns series has a unit root was rejected, as the p-values were below 5%. The KPSS test, whose null hypothesis is stationarity rather than a unit root, was read as supporting the same conclusion. Therefore, all three tests confirmed that the series are stationary and do not contain unit roots.
TE1: Test equation with intercept;
TE2: Test equation with trend & intercept;
TE3: Test equation without intercept;
Table 3 shows strong evidence of stationarity for S&P BSE Sensex and S&P BSE IT returns, with no long-term shocks in their returns. The three unit root tests give the same results in the cases without intercept, with intercept, and with trend and intercept for S&P BSE Sensex and S&P BSE IT.
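The logic of a unit root test can be illustrated with a simplified Dickey-Fuller regression. This sketch covers only the intercept case (TE1) and omits the lagged differences that make the test "augmented"; the function name, the simulated series, and the approximate 5% critical value of −2.86 (taken from standard Dickey-Fuller tables) are assumptions, not values from Table 3.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic on gamma in  dy_t = a + gamma * y_{t-1} + e_t.

    A strongly negative value is evidence against a unit root.
    """
    x = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((v - mx) * (d - md) for v, d in zip(x, dy)) / sxx
    a = md - gamma * mx
    ssr = sum((d - a - gamma * v) ** 2 for v, d in zip(x, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / se_gamma


CRITICAL_5PCT = -2.86  # approximate asymptotic value, intercept case (assumed)

# Simulated stationary "returns" (white noise), for illustration only.
rng = random.Random(42)
noise = [rng.gauss(0.0, 0.01) for _ in range(1000)]
t_noise = dickey_fuller_t(noise)
print(round(t_noise, 2), t_noise < CRITICAL_5PCT)
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t table, because its distribution is non-standard under the unit root null.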
ARIMA model estimation through identified p, d, q values
ARIMA is a combination of AR and MA terms. To estimate the best-fit values, a linear regression model was run. The S&P BSE Sensex best-fit ARMA model was estimated with the AR and MA terms at lags 1, 2, 6, and 8, and the results are shown in Table 7 in Appendix, which gives the estimation criteria for both the S&P BSE Sensex and S&P BSE IT. For S&P BSE IT, the AR and MA terms 1, 2, and 5 are significant, but for S&P BSE Sensex, MA (8) is not significant. MA (8) was therefore dropped, and the model was re-estimated with the AR (1), AR (2), AR (6), MA (1), MA (2), and MA (6) terms; this estimation is depicted in Table 8 in Appendix. The results are shown in Fig. 4, which reveals the randomly distributed residuals from the least squares regression method. The Akaike Information Criterion (AIC) and Schwarz Criterion (SC) are the preferred measures for choosing the best model. For S&P BSE Sensex, the AIC value is −7.019098 for the AR term and −7.304545 for the MA term, and the SC value is −7.008247 for the AR term and −7.293694 for the MA term. For S&P BSE IT, the AIC values for the AR and MA terms are −6.720785 and −7.038715, respectively, and the SC values are −6.709934 and −7.027864, respectively. The AIC and SC values differ little from each other; since the best model is the one with the lower value of the criterion, AIC was used for selection. The AIC and SC values of the MA model are lower than those of the AR model. Therefore, the MA model terms were chosen for the S&P BSE Sensex, with the terms 1 and 6. The evidence is shown in Table 4.
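The selection rule described above, preferring the specification with the lower criterion value, can be written out using the AIC and SC figures reported in the text. The dictionary layout and the helper `best_by` are hypothetical conveniences for illustration.

```python
# AIC and SC values as reported in the text for the AR and MA specifications.
reported = {
    "S&P BSE Sensex": {
        "AR": {"AIC": -7.019098, "SC": -7.008247},
        "MA": {"AIC": -7.304545, "SC": -7.293694},
    },
    "S&P BSE IT": {
        "AR": {"AIC": -6.720785, "SC": -6.709934},
        "MA": {"AIC": -7.038715, "SC": -7.027864},
    },
}


def best_by(criterion, candidates):
    """Return the name of the candidate with the lowest criterion value."""
    return min(candidates, key=lambda name: candidates[name][criterion])


choices = {
    index: {crit: best_by(crit, models) for crit in ("AIC", "SC")}
    for index, models in reported.items()
}
print(choices)
```

Both criteria point to the same specification for each index, so the choice between AIC and SC does not change the outcome here.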
In general, the maximum likelihood estimation made through the outer product of the gradients/Berndt–Hall–Hall–Hausman method for least squares follows the AR term. For ARIMA models, the likelihood is difficult to write as an explicit function of the parameters, but it is conveniently expressed in terms of the innovations, or prediction errors. The combination of (1, 6) for S&P BSE Sensex gave the best-fit ARMA model, as shown in Fig. 3. Figure 3 also shows the best-fit ARMA model for the IT sector, whose terms are 1 and 2.
The residuals from both best-fit models were tested with the ADF test, which revealed that the residuals are stationary.
ARIMA model estimation through auto ARIMA
The auto ARIMA model estimation was carried out using AIC comparisons, which determine the best fit of the time series data for future forecasting. In this procedure, 25 ARMA specifications were estimated. The estimated ARMA terms and their respective AIC values are presented in Table 5.
Forecasting ARIMA
Once the ARMA model is fitted, it can be used for forecasting future returns. Two types of forecasting methods are available: static and dynamic. Static forecasting uses the actual present and lagged values, whereas dynamic forecasting uses the previously forecasted values. Using the model in Fig. 3, the static and dynamic forecasting values are shown in Table 6. Root mean square error (RMSE) and mean absolute error (MAE) were the measures used to select the more appropriate forecasting model.
Table 6 provides the RMSE and MAE values of S&P BSE Sensex and S&P BSE IT returns. MAE and RMSE were calculated according to the errors between the forecasted and the actual data. The selected ARMA models provide more accurate results for the holdback period.
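The difference between static and dynamic forecasting, and the two error measures, can be sketched with a simple AR(1) model. Everything here is an assumption for illustration: the model form, the coefficient 0.5, the three-point series, and the function names are hypothetical and do not correspond to the fitted models or to Table 6.

```python
import math


def static_forecast(actual, c, phi):
    """One-step-ahead forecasts; each uses the actual previous value."""
    return [c + phi * prev for prev in actual[:-1]]


def dynamic_forecast(start, c, phi, steps):
    """Multi-step forecasts; each uses the previous forecast, not the actual."""
    forecasts, prev = [], start
    for _ in range(steps):
        prev = c + phi * prev
        forecasts.append(prev)
    return forecasts


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical series and AR(1) coefficients, for illustration only.
series = [2.0, 0.0, 1.0]
targets = series[1:]
static = static_forecast(series, c=0.0, phi=0.5)
dynamic = dynamic_forecast(series[0], c=0.0, phi=0.5, steps=len(targets))
print(rmse(targets, static), mae(targets, static))
print(rmse(targets, dynamic), mae(targets, dynamic))
```

MAE can never exceed RMSE for the same set of errors, so a reported MAE above the RMSE would signal a reporting or labeling problem.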
Validation for actual and forecast values
The validation phase is important to determine the accuracy of the predicted values. This could be achieved by using a static forecasting instrument in the ARIMA process. In other words, after the completion of the estimation phase, the authors attempted to forecast the future returns by comparing these forecasted returns with the actual ones. In this study, the holdback period was from January 1, 2015 to December 31, 2017. The actual and forecasted values are depicted in Fig. 4.
In Fig. 4(a), SENSEX_RETF refers to the forecasted values, which are shown with a blue line. DSEN refers to the first-difference values of S&P BSE Sensex returns, which are marked by a red dashed line. The two lines move together, which means that the forecasted values and the actual values are almost the same. However, a few deviations were identified in May 2015, August 2015, and February 2016. These deviations mark the error-prone areas of the prediction and are reflected in the RMSE (0.005) and MAE (0.004) shown in Table 6. Figure 4(b) provides the comparative graph for the S&P BSE IT sector, which shows IT_RETURNSF (forecasted IT returns) with a blue line and DIT (first difference of IT returns) with a red dashed line. The forecasted and actual values are almost the same, but a few deviations were observed in July 2015, August 2015, July 2016, June 2017, and August 2017; these prediction errors are reflected in the RMSE (0.006) and MAE (0.005) in Table 6.
Findings of the study
The descriptive statistics of S&P BSE Sensex and S&P BSE IT revealed that the mean returns were positive but nearly zero, which indicates a regressive tendency in the long-term values. An asymmetric tail indicates a high probability of earnings in returns with high risk, as the value of skewness is greater than the mean value of returns. The Jarque-Bera statistic for the S&P BSE Sensex is far higher than the value expected under a normal distribution. Therefore, the null hypothesis of normal distribution was rejected at the 5% level for both the S&P BSE Sensex and S&P BSE IT. The standard VR test, nonparametric VR test, multiple VR test, and modified version of the multiple VR test all rejected the null hypothesis of a random walk or martingale for both index returns. Therefore, the returns of the S&P BSE Sensex and S&P BSE IT are strongly predictable from historical prices. Thus, it may be concluded that the results did not provide any evidence in favor of the EMH for either S&P BSE Sensex or S&P BSE IT in the long run. The findings suggest that past information priced the stocks instantly, as these indices indicate a semi-strong form of EMH.
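The two diagnostics discussed above can be sketched as follows. The Jarque-Bera statistic is JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis; under normality it is asymptotically chi-square with 2 degrees of freedom, so values above about 5.99 reject normality at the 5% level. The variance ratio shown here is a simplified version (overlapping q-period sums, no bias correction, no test statistic) and should not be read as the exact VR tests used in the study. Function names and sample data are hypothetical.

```python
def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def variance_ratio(returns, q):
    """Simplified VR(q): variance of q-period returns over q times the
    variance of one-period returns. A random walk gives a value near 1."""
    n = len(returns)
    mean = sum(returns) / n
    var_1 = sum((r - mean) ** 2 for r in returns) / n
    sums = [sum(returns[t:t + q]) for t in range(n - q + 1)]
    var_q = sum((s - q * mean) ** 2 for s in sums) / len(sums)
    return var_q / (q * var_1)


# Hypothetical daily returns, for illustration only.
sample = [0.004, -0.002, 0.006, -0.005, 0.003, 0.001, -0.004, 0.002]
print(f"JB    = {jarque_bera(sample):.4f}")
print(f"VR(2) = {variance_ratio(sample, 2):.4f}")
```

A variance ratio well below 1 points to mean reversion and a ratio well above 1 to positive serial correlation; either departure is evidence against the random walk hypothesis.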
Conclusion
The ARIMA methodology, also referred to as the Box-Jenkins (BJ) method, is one of the most widely used forecasting methods for the stock market. It models a time series from its own historical values and a moving average of random error terms. In this analysis, ARIMA (1, 6) for Sensex and ARIMA (1, 2) for IT yielded a highly accurate forecast over the three-year holdback period (January 1, 2015 to December 31, 2017). Forecast uncertainty was greater when the period was long and smaller when the period was short. The study reveals the efficiency of the process in predicting the complex and volatile series of stock data. Applying ARIMA to the time series data gave fast and accurate predictions.
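The link between forecast horizon and uncertainty can be made concrete with a simplified case. For an AR(1) process with coefficient phi and innovation variance sigma^2, the h-step-ahead forecast error variance is sigma^2 * (1 + phi^2 + ... + phi^(2(h-1))), which grows with h and approaches sigma^2 / (1 - phi^2). The sketch below assumes this AR(1) case only; it is not the specification estimated in the study, and the parameter values are hypothetical.

```python
def ar1_forecast_error_variance(phi, sigma2, horizon):
    """h-step-ahead forecast error variance of a stationary AR(1) process."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return sigma2 * sum(phi ** (2 * j) for j in range(horizon))


# Hypothetical parameters, for illustration only.
phi, sigma2 = 0.5, 1.0
for h in (1, 2, 3, 10):
    print(f"h={h:2d}  variance={ar1_forecast_error_variance(phi, sigma2, h):.6f}")
```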
The results showed that the mean returns of both indices are positive but near zero. This indicates a regressive tendency in the long term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to the actual values, with few deviations. These findings have significant implications. Investors can choose their investments according to the forecasted returns analyzed in the present study. Furthermore, investors can invest in profitable stocks to ensure a good portfolio. This study could help researchers, companies, investors, and policymakers to make appropriate decisions in the stock market. Further, researchers can investigate time series prediction by applying various models, such as genetic models, nanotechnology models, and nonlinear regression models. Companies may frame appropriate strategies to fetch lucrative returns on their investments. Individual investors may build an optimum portfolio, and policymakers can take relevant decisions for the smooth functioning of the stock market.
Nonetheless, this study has some limitations. It was confined to S&P BSE Sensex and S&P BSE IT, which comprise only a few companies of the Indian corporate sector. The BSE has many sectoral indices; including them could have provided a more holistic study and given investors clues for deriving better returns on investments. Furthermore, the study could have compared the accuracy of the estimated returns across various time horizons.
Future research can consider the prediction and comparison of stock prices in developed and emerging stock markets. Moreover, long-term forecasting that applies novel technologies could give investors greater assurance of good returns. Comparative analysis of various sectoral indices between India and other countries will be a thrust area for exploring further insights into portfolio construction, risk and return, performance, and efficiency of trading.
Availability of data and materials
The data sets are available from http://www.bseindia.com and http://finance.yahoo.com. The analyzed data are uploaded as supplementary material files.
Abbreviations
ARIMA: Autoregressive Integrated Moving Average
AIC: Akaike Information Criterion
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
SC: Schwarz Criterion
DW: Durbin-Watson
ADF: Augmented Dickey-Fuller
S.E. of Reg: Standard Error of Regression
BSE: Bombay Stock Exchange
IT: Information Technology
ACF: Autocorrelation Function
PACF: Partial Autocorrelation Function
ARMA: Autoregressive Moving Average
AR: Autoregressive
MA: Moving Average
VR test: Variance ratio test
PP test: Phillips-Perron test
KPSS test: Kwiatkowski, Phillips, Schmidt, and Shin test
S&P: Standard & Poor's
OPG: Outer product of the gradients
BHHH: Berndt–Hall–Hall–Hausman
References
Aye GC, Gil-Alana LA, Gupta R, Wohar ME (2017) The efficiency of the art market: evidence from variance ratio tests, linear and nonlinear fractional integration approaches. Int Rev Econ Finance 51(C):283–294
Barnes ML, Ma S (2001) Market Efficiency or Not? The Behaviour of China’s Stock Prices in Response to the Announcement of Bonus Issues. https://ro.uow.edu.au/commpapers/475
Box GEP, Jenkins GM (1970) Time series analysis: forecasting and control. Holden-Day, San Francisco
Belaire-Franch J, Contreras D (2004) Ranks and Signs-Based Multiple Variance Ratio Tests. Spanish-Italian Meeting on Financial Mathematics, Cuenca, November 2003, Vol. 7, pp 40–79
Challa ML, Malepati V, Kolusu SNR (2018) Forecasting risk using autoregressive integrated moving average approach: evidence from S&P BSE Sensex. Financial Innovation 4:24. https://doi.org/10.1186/s40854-018-0107-z
Chow KV, Denning KC (1993) A simple multiple variance ratio test. J Econ 58:385–401
Darrat AF, Zhong M (2000) On testing the random-walk hypothesis: a model comparison approach. Financ Rev 35:105–124
Diebold FX, Inoue A (2001) Long memory and regime switching. J Econ 105:131–159
Deo R, Richardson M (2003) On the asymptotic power of the variance ratio test. Econometric Theory 19(2):231–239. https://doi.org/10.1017/S0266466603192018
Eigner P, Umlauft TS (2015) The great depression(s) of 1929–1933 and 2007–2009? Parallels, differences and policy lessons. Hungarian Academy of Science MTA-ELTE Crisis History Working Paper No. 2. Available at SSRN: https://ssrn.com/abstract=2612243 or https://doi.org/10.2139/ssrn.2612243
Fama E (1991) Efficient capital markets: II. J Financ 46:1575–1617
Fama EF (1965) Random walks in stock market prices. Financ Anal J 21(5):55–59. https://doi.org/10.2469/faj.v51.n1.1861
Fama EF (1970) Efficient capital markets: a review of theory and empirical work. J Financ 25(2):383–417
Fama EF, French KR (1988) Dividend yields and expected stock returns. J Financ Econ 91:389–406
Fifield GM, Jetty J (2008) Further evidence on the efficiency of the Chinese stock markets: a note. Res Int Bus Financ 22:351–361
Gerra MJ (1959) An econometric model of the egg industry: a correction. Am J Agric Econ 41(4):803–804
Groenewold N, Tang SHK, Wu Y (2003) The efficiency of the Chinese stock market and the role of banks. J Asian Econ 14:593–609
Guptha SK, Rao RP (2018) The causal relationship between financial development and economic growth experience with BRICS economies. J Soc Econ Dev 20(2):308–326
Javier C, Rosario E, Francisco JN, Antonio JC (2003) ARIMA models to predict next electricity Price. IEEE Trans Power Syst 18(3):1014–1020
Jiahan L, Ilias T (2017) Equity premium prediction: the role of economic and statistical constraints. J Financial Markets 36(C):56–75
Kapetanios G, Shin Y (2011) Testing the null hypothesis of nonstationary long memory against the alternative hypothesis of a nonlinear Ergodic model. Econometrics Rev 30(6):620–645
Khashei M, Bijari M, Ardali GAR (2009) Improvement of auto regressive integrated moving average models using fuzzy logic and artificial neural network. Neurocomputing 72(4–6):956–967
Khashei M, Bijari M, Ardali GAR (2012) Hybridization of autoregressive integrated moving average (ARIMA) with probabilistic neural networks. Comput Ind Eng 63(1):37–45
Kim JH (2006) Wild bootstrapping variance ratio tests. Econ Lett 92:38–43
Kim JH, Lim KP, Shamsuddin A (2011) Stock return predictability and the adaptive markets hypothesis: evidence from century-long U.S. data. J Empir Financ 18:868–879
Kou G, Peng Y, Wang G (2014) Evaluation of clustering algorithms for financial risk analysis using MCDM methods. Inform Sci 275:1–12. https://doi.org/10.1016/j.ins.2014.02.137
Kyungjoo LC, Sehwan Y, John J (2007) Neural network model vs. SARIMA model in forecasting Korean stock Price index (KOSPI). Issues Information Syst 8(2):372–378
Laurence M, Cai F, Qian S (1997) Weak-form efficiency and causality tests in Chinese stock markets. Multinational Finance J 1:291–307
Lee C, Ho C (2011) Short-term load forecasting using lifting scheme and ARIMA model. Expert System Appl 38(5):5902–5911
Lee CF, Rui OM (2001) Does trading volume contain information to predict stock returns? Evidence from China’s stock markets. Rev Quant Finan Acc 14:341–360
Lima EJA, Tabak BM (2004) Tests of the random walk hypothesis for equity markets: evidence from China, Hong Kong and Singapore. Appl Econ Lett 11:255–258
Liu X, Song H, Romilly P (1997) Are Chinese stock markets efficient? A cointegration and causality analysis. Appl Econ Lett 4:511–515
Lo AW, MacKinlay AC (1988) Stock market prices do not follow random walk: evidence from a simple specification test. Rev Financ Stud 1:41–66
Lo AW, MacKinlay AC (1989) The size and power of the variance ratio test in finite samples: a Monte Carlo investigation. J Econ 40:203–238
Long DM, Payne JD, Feng C (1999) Information transmission in the Shanghai equity market. J Financ Res 22:29–45
Mallikarjuna M, Rao RP (2019) Evaluation of forecasting methods from selected stock market returns. Financial Innovation 5:40. https://doi.org/10.1186/s40854-019-0157-x
Merh N, Saxena VP, Pardasani KR (2010) A comparison between hybrid approaches of ANN and ARIMA for Indian stock trend forecasting. J Business Intelligence 3(2):23–43
Miswan NH, Ngatiman NA, Hamzah K, Zamzamin ZZ (2014) Comparative performance of ARIMA and GARCH models in Modelling and forecasting volatility of Malaysia market properties and shares. Appl Math Sci 8(140):7001–7012. https://doi.org/10.12988/ams.2014.47548
Mookerjee R, Yu Q (1999) An empirical analysis of the equity markets in China. Rev Financ Econ 8:41–60
Munteanu A, Pece A (2015) Investigating art market efficiency. Procedia Soc Behav Sci 188:82–88
Naylor T II, Seaks TG, Wichern DW (1972) Box-Jenkins methods: an alternative to econometric models. Int Stat Rev 40:123–137
Neely CJ, Rapach DE, Tu J, Zhou G (2014) Forecasting the equity risk premium: the role of technical indicators. Manag Sci 60:1772–1791. http://dx.doi.org/10.1287/mnsc.2013.183
Newbold P, Granger CWJ (1974) Experience with forecasting univariate time series and the combination of forecasts. J R Statist Soc A 137:131–165
Pahlavani M, Roshan R (2015) The comparison among ARIMA and hybrid ARIMA-GARCH models in forecasting the exchange rate of Iran. Int J Business Dev Studies 7(1):31–50
Pankratz A (2009) Forecasting with univariate Box-Jenkins models: Concepts and cases. Wiley Series in Probability and Statistics, ISBN: 9780470317273
Pettenuzzo D, Timmermann A, Valkanov R (2014) Forecasting stock returns under economic constraints. J Financ Econ 114(3):517–553
Phan DHB, Sharma SS, Narayan PK (2015) Stock return forecasting: some new evidence. Int Rev Financ Anal 40:38–51
Phillips P, Perron P (1988) Testing for a unit root in time series regression. Biometrika 75:335–346
Richardson M, Smith T (1991) Tests of financial models in the presence of overlapping observations. Rev Financ Stud 4:227–254
Rangan N, Titida N (2006) ARIMA Model for Forecasting Oil Palm Price. In: Proceedings of the 2nd IMTGT Regional Conference on Mathematics, Statistics and Applications, University Sains Malaysia, 2006
Rapach DE, Matthew RC, Zhou G (2016) Short interest and aggregate stock returns. J Financ Econ 121:46–65. https://doi.org/10.1016/j.jfineco.2016.03.004
Rapach DE, Strauss JK, Zhou G (2010) Out-of-sample equity premium prediction: combination forecasts and links to the real economy. Rev Financ Stud 23:821–862
Rapach DE, Strauss JK, Zhou G (2013) International stock return predictability: What is the role of the United States? J Finance 68(4):1633–1662
Reid GA (1971) On the Calkin representations. Proceedings of the London Mathematical Society s3-23(3):547–564. https://doi.org/10.1112/plms/s3-23.3.547
Sabur SA, Zahidul Hague M (1992) Resource-use efficiency and returns from some selected winter crops in Bangladesh. Econ Aff 37(3):158–168
Schmitz A, Watts DG (1970) Forecasting wheat yields: an application of parametric time series modeling. Am J Agric Econ 52(2):109
Seddighi HR, Nian W (2004) The Chinese stock exchange market: operations and efficiency. Appl Financ Econ 14:785–797
Sterba J, Hilovska (2010) The implementation of hybrid ARIMA neural network prediction model for aggregate water consumption prediction. Aplimat J Applied Mathematics 3(3):123–131
Suits DB (1962) Forecasting and analysis with an econometric model. Am Econ Rev 52(1):104–132
Tabak BM (2003) The random walk hypothesis and the behavior of foreign capital portfolio flows: the Brazilian stock market case. Appl Finance Econ 13:369–378
Thushara SC (2018) What the purpose of ARIMA –Garch? Retrieved from: https://www.researchgate.net/post/What_the_purpose_of_ARIMAGarch
Turner JA (2015) Casting doubt on the predictability of stock returns in real time: Bayesian model averaging using realistic priors. Rev Finance 19:785–821
Wang Z, Qian Y, Wang S (2018) Dynamic trading volume and stock return relation: does it hold out of sample? Int Rev Financ Anal 58:195–210
Welch I, Goyal A (2008) A comprehensive look at the empirical performance of equity premium prediction. Rev Financ Stud 21:1455–1508
Wen F, Xu L, Ouyang G, Kou G (2019) Retail investor attention and stock price crash risk: evidence from China. Int Rev Financ Anal 101376. https://doi.org/10.1016/j.irfa.2019.101376
Whang YJ, Kim J (2003) A multiple variance ratio test using subsampling. Econ Lett 79:225–230
Wright JH (2000) Alternative variance-ratio tests using ranks and signs. J Bus Econ Stat 18:1–9
Wu CFJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14:1261–1295
Wu SN (1996) The analysis of the efficient of securities market in our country. Econ Res 6:1–39 (in Chinese)
Zhang Y, Ma F, Shi B, Huang D (2018) Forecasting the prices of crude oil: an iterated combination approach. Energy Econ 70:472–483
Zhang Y, Ma F, Zhu B (2019a) Intraday momentum and stock return predictability: evidence from China. Econ Model 76:319–329
Zhang Y, Zeng Q, Ma F, Shi B (2019b) Forecasting stock returns: do less powerful predictors help? Econ Model (forthcoming). https://doi.org/10.1016/j.econmod.2018.09.014
Zhu X, Zhu J (2013) Predicting stock returns: a regime-switching combination approach and economic links. J Bank Financ 37:4120–4133
Zotteri G, Kalchschmidt M, Caniato F (2005) The impact of aggregation level on forecasting performance. Int J Prod Econ 93–94:479–491. https://doi.org/10.1016/j.ijpe.2004.06.044
Acknowledgements
Not Applicable.
Funding
Not Applicable
Author information
Contributions
Study of conception and design: CML, MVR, KSNR. Acquisition of data: CML. Analysis and interpretation of data: CML. Supervision: MVR, KSNR. Drafting of manuscript: CML. Critical revision: MVR, KSNR. The authors read and approved the final manuscript.
Ethics declarations
Competing interests
Authors declare that they have no competing interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Challa, M.L., Malepati, V. & Kolusu, S.N.R. S&P BSE Sensex and S&P BSE IT return forecasting using ARIMA. Financ Innov 6, 47 (2020). https://doi.org/10.1186/s40854-020-00201-5
DOI: https://doi.org/10.1186/s40854-020-00201-5
Keywords
 Efficient market hypothesis
 Bombay stock exchange
 ARIMA
 KPSS
 S&P BSE Sensex
 Forecasting
 S&P BSE IT
JEL classifications
 G12
 G14
 G17