
A hybrid Bayesian-network proposition for forecasting the crude oil price


This paper proposes a hybrid Bayesian Network (BN) method for short-term forecasting of crude oil prices. The method is hybrid in two respects: the classification of influencing factors and the regression of the out-of-sample values. For the sake of performance comparison, several other hybrid methods have also been devised using Markov Chain Monte Carlo (MCMC), Random Forest (RF), Support Vector Machine (SVM), neural networks (NNET) and generalized autoregressive conditional heteroskedasticity (GARCH). The hybrid methodology relies primarily on constructing the crude oil price forecast from the summation of its Intrinsic Mode Functions (IMF) and its residue, extracted by an Empirical Mode Decomposition (EMD) of the original crude price signal. The Volatility Index (VIX) and the Implied Oil Volatility Index (OVX) have been considered among the influencing parameters of the crude price forecast. The final set of influencing parameters was selected as the whole set of significant contributors detected by the Bayesian Network, Quantile Regression with Lasso penalty (QRL), Bayesian Lasso (BLasso) and Bayesian Ridge Regression (BRR) methods. The performance of the proposed hybrid-BN method is reported for the three crude price benchmarks: West Texas Intermediate, Brent Crude and the OPEC Reference Basket.


The price of crude oil has a pivotal role in the global economy and remains at the core of energy markets. As such, its fluctuations have the potential to impact economic developments worldwide. The ability to forecast the price of crude oil is therefore a useful tool in the management of most industrial sectors (Shin et al. 2013). Nevertheless, crude oil price forecasting has been a challenging task, owing to the complex behavior that results from the confluent influence of several factors on the crude oil market. Specifically, the nonlinear features exhibited in the dynamics of oil price volatilities present a quandary for predictive techniques, leaving (long-term) crude price forecasting an open issue in finance research.

A wealth of literature exists on the topic of forecasting crude oil prices. These studies vary widely, both in the types of models and in the number of methods used concurrently. Some studies use a single method (non-hybrid) and some combine several methods (hybrid). In this regard, the generalized autoregressive conditional heteroskedasticity (GARCH) model was amongst the first methods used because of its ability to capture time-varying variance or volatility (Agnolucci 2009; Arouri et al. 2012; Cheong 2009; Fan et al. 2008a; Hou and Suardi 2012; Kang et al. 2009; Mohammadi and Su 2010; Narayan and Narayan 2007; Sadorsky 2006; Wei et al. 2010). We attempted to use the GARCH model in hybrid form by combining it with other models, such as the stochastic volatility (SV) model, the implied volatility (IV) model and the support vector machine (SVM) model.

The neural network (NNET) method has been another approach for crude price forecasting (Azadeh et al. 2012; Ghaffari and Zare 2009; Movagharnejad et al. 2011; Shin et al. 2013; Wang et al. 2012; Yu et al. 2008; Zhang et al. 2008). However, it reportedly suffers from over-fitting, local minima and weak generalization capability (Zhang et al. 2015). For this reason, its use within a hybrid framework has been recommended for crude price forecasting.

Some authors opted to use the SVM model for price prediction, taking advantage of its suitability for modeling small-sized data samples with nonlinear behavior (Guo et al. 2012). Others have reported on the merits of the wavelet technique for crude price forecasting (Yousefi et al. 2005), with one major shortcoming being its sensitivity to the sample size. However, the recent literature advocates the use of hybrid methods to improve the accuracy of price forecasting. Such hybrid frameworks draw on the strengths of each technique by combining soft-computing methods, econometric methods, or both (Fan et al. 2008b; Xiong et al. 2013). The reader is referred to the excellent review by Zhang et al. (2015) for a more complete assessment of past research on oil price forecasting.

The motivation behind the present work was to exploit the potential of Bayesian network (BN) theory, in the context of crude price prediction, by constructing a network over decomposed price components. As such, the present article contributes to the existing literature in this field by proposing a novel hybrid method within a Bayesian network framework. In addition, this article reports results on other devised hybrid methods, using Random Forest (RF), Markov Chain Monte Carlo (MCMC), NNET and SVM. The rest of the article is organized as follows. The next section will detail the methods being used. A description of the results is provided in the third section, which will be followed by some concluding remarks.


The hybrid methodology followed in this article takes advantage of several concepts, which are introduced in this section.

Bayesian network

A Bayesian network is a graphical model in which nodes represent (random) variables and arrows represent probabilistic dependencies between the nodes (Korb and Nicholson 2004). The BN’s graphical structure is a directed acyclic graph (DAG) that enables estimation of the joint probability distribution. The DAG defines a factorization of the joint probability distribution into a set of local probability distributions, one for each variable, where the form of the factorization is given by the BN’s Markov property, which assumes each variable to depend directly only on its parents (Scutari 2010). To this end, the BN methodology first seeks a DAG structure amongst the variables being considered. Structure-learning algorithms fall into two classes. Constraint-based algorithms analyze the probabilistic relationships entailed by the Markov property of Bayesian networks with conditional independence tests and subsequently construct a graph that satisfies the corresponding d-separation statements. Score-based algorithms assign a score to each BN candidate and maximize it with a heuristic algorithm (Scutari 2010).
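As an illustration of the factorization described above, the joint distribution of a BN over variables \( {X}_1,\dots, {X}_n \) can be written in terms of the parent sets, denoted here by \( {\Pi}_{X_i} \) (a notation assumed for this sketch and not taken from the source):

$$ P\left({X}_1,\dots, {X}_n\right)=\prod \limits_{i=1}^nP\left({X}_i\mid {\Pi}_{X_i}\right) $$

For a hypothetical three-node network in which OVX and VIX are both parents of a single IMF node and have no parents themselves, this reduces to

$$ P\left( OVX, VIX, IMF\right)=P(OVX)\,P(VIX)\,P\left( IMF\mid OVX, VIX\right) $$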

By taking advantage of the fundamental properties of Bayesian networks, approximate inference (on an unknown value) is attainable. This approach should be able to avoid the curse of dimensionality due to its use of local distributions (Nagarajan et al. 2013). Given the established BN structure, stochastic simulation can be applied to generate a large number of cases from the network distribution, from which the posterior probability of a target node is estimated. In this regard, the two prominent algorithms are logic sampling (LS) and likelihood weighting (LW). The LS algorithm generates a case by selecting a value for each node at random, weighted by the probability of that value occurring. The nodes are traversed from the parent (root) nodes down to the children (leaf) nodes. As a consequence, at each step, the weighting probability is either the prior or the Conditional Probability Table entry for the sampled parent values. An instantiation of all the nodes in the BN is thus obtained once the full structure has been visited. The collection of instantiation data enables estimation of the posterior probability for node X given evidence E (Appendix 1). The LW algorithm is similar to the former with a slight modification: the fractional likelihood of the evidence combination, instead of one, is added to the run count (Appendix 2).
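The difference between the two sampling schemes can be sketched on a toy example. The following Python snippet is a minimal illustration, not the procedure of Appendices 1 and 2: it assumes a hypothetical two-node discrete network A → B with made-up probabilities and estimates P(A = 1 | B = 1) with both algorithms.

```python
import random

# Hypothetical two-node network A -> B; all probabilities are made up.
P_A = 0.3                        # P(A = 1)
P_B_GIVEN_A = {1: 0.8, 0: 0.1}   # P(B = 1 | A = a)

def logic_sampling(n, rng):
    """Forward-sample every node and discard cases that contradict B = 1."""
    kept = hits = 0
    for _ in range(n):
        a = 1 if rng.random() < P_A else 0
        b = 1 if rng.random() < P_B_GIVEN_A[a] else 0
        if b == 1:               # case agrees with the evidence: count it as one
            kept += 1
            hits += a
    return hits / kept

def likelihood_weighting(n, rng):
    """Sample only the non-evidence node and weight each case by P(evidence)."""
    total = hit_weight = 0.0
    for _ in range(n):
        a = 1 if rng.random() < P_A else 0
        w = P_B_GIVEN_A[a]       # fractional likelihood of B = 1 for this case
        total += w
        hit_weight += w * a
    return hit_weight / total

# Exact posterior from Bayes' rule, for comparison.
exact = P_A * P_B_GIVEN_A[1] / (
    P_A * P_B_GIVEN_A[1] + (1 - P_A) * P_B_GIVEN_A[0]
)
rng = random.Random(0)
ls_estimate = logic_sampling(20000, rng)
lw_estimate = likelihood_weighting(20000, rng)
```

In this sketch LS wastes every case in which the sampled B disagrees with the evidence, whereas LW uses all cases, which is one reason LW is often preferred when the evidence is unlikely.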

Empirical mode decomposition (EMD)

As a signal develops over time, a time-series may possess several temporal features. As such, exploring its characteristic behavior at different time scales should be informative. The Empirical Mode Decomposition (EMD) (Huang et al. 1998) separates the original signal into two parts: a fast-varying symmetric oscillation and a slow-varying local mean. The former constitutes the Intrinsic Mode Functions (IMF) of the original signal while the latter captures its residue. A repetitive sifting process is then implemented in order to extract the residue and the IMFs (Appendix 3). The process terminates once no more oscillations can be separated from the last residue. The EMD method has a stronger local representative capability compared to the wavelet transform and is more effective in processing non-linear or non-stationary signals (Hong 2011). In this work, the EMD implementation was made using the EMD package (Kim and Oh 2009; Kim and Oh 2014).
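The sifting idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of Appendix 3 or of the EMD package: it assumes piecewise-linear envelopes (instead of the usual cubic splines), a fixed number of sifting passes, and a synthetic test signal. It only demonstrates that the signal is recovered exactly as the sum of the extracted components and the residue.

```python
import math
from bisect import bisect_right

def local_extrema(x):
    """Return the indices of interior local maxima and minima of x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at knots, flat beyond the end knots."""
    env = []
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            j = bisect_right(knots, t)          # knots[j-1] <= t < knots[j]
            lo, hi = knots[j - 1], knots[j]
            frac = (t - lo) / (hi - lo)
            env.append(x[lo] + frac * (x[hi] - x[lo]))
    return env

def sift(x, n_sift=10):
    """Repeatedly subtract the local mean (average of the two envelopes)."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2.0 for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=8):
    """Extract oscillatory components until the residue has too few extrema."""
    residue, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue

# Synthetic signal: fast oscillation + slow oscillation + linear trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50)
          + 0.01 * t for t in range(200)]
imfs, residue = emd(signal)
rebuilt = [sum(imf[t] for imf in imfs) + residue[t] for t in range(len(signal))]
```

The additive reconstruction in the last line is the property that the hybrid methodology relies on when it sums forecasts of the individual components.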

Quantile regression with lasso penalty (QRL)

Consider a problem of regression on a data set \( \left\{\left({x}_i,{y}_i\right);1\le i\le n\right\} \) of size n, with predictors \( {x}_i\in {\mathbb{R}}^p \) and response values \( {y}_i\in \mathbb{R} \). The conditional ξth quantile function, \( {f}_{\xi }(x) \), is defined such that \( P\left(Y\le {f}_{\xi }(X)\mid X=x\right)=\xi \), for 0 < ξ < 1. Additionally, the absolute loss function, \( {\rho}_{\xi }(r) \), can be defined as (Koenker and Bassett 1978; Wu and Liu 2009):

$$ {\rho}_{\xi }(r)=\begin{cases}\xi r & r>0\\ {}-\left(1-\xi \right)r & \text{otherwise}\end{cases} $$

The ξth conditional quantile function can be obtained by minimizing the following (Koenker 2004):

$$ \underset{f_{\xi }}{\min}\sum \limits_{i=1}^n{\rho}_{\xi}\left({y}_i-{f}_{\xi}\left({x}_i\right)\right)+\lambda \Xi \left({f}_{\xi}\right) $$

where λ ≥ 0 is a regularization parameter and \( \Xi \left({f}_{\xi}\right) \) is a roughness penalty of the function \( {f}_{\xi } \). Assuming the conditional quantile function to be a linear function of the regressor x, as is the case in linear quantile regression, the function can be written as \( {f}_{\xi }(x)={x}^T{\beta}_{\xi } \) with \( {\beta}_{\xi }={\left({\beta}_{\xi, 1},{\beta}_{\xi, 2},\dots, {\beta}_{\xi, p}\right)}^T \). There are several recommendations for the penalty function in eq. 2. The Least Absolute Shrinkage and Selection Operator (Lasso) (Tibshirani 1996; Tibshirani 2011) treats this minimization case by constraining the L1-norm of the coefficients. In other words, the (classical) Lasso formulation considers a form of \( \sum \limits_{i=1}^p\left|{\beta}_{\xi, i}\right| \) as the penalty function. Alternatively, the penalty function may impose an L2-norm constraint, which gives the Ridge Regression method. The solution should yield a set of {β} regression parameters for the problem of interest. The rqPen package (Sherwood and Maidman 2016) was used for the QRL implementation in this work.
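The loss function and the penalized objective can be illustrated with a short Python sketch. The data values and function names below are made up for illustration, and the snippet does not reproduce the rqPen implementation. It shows that, for a constant predictor and ξ = 0.5, the minimizer of the summed loss is the sample median, and it evaluates the Lasso-penalized linear objective for a given coefficient vector.

```python
def check_loss(r, xi):
    """Quantile loss: xi * r for r > 0, and -(1 - xi) * r otherwise."""
    return xi * r if r > 0 else -(1 - xi) * r

def empirical_risk(q, ys, xi):
    """Summed loss of predicting the constant q for every response."""
    return sum(check_loss(y - q, xi) for y in ys)

def penalized_objective(beta, X, y, xi, lam):
    """Linear quantile loss plus an L1 (Lasso) penalty on the coefficients."""
    fit = sum(
        check_loss(yi - sum(b * v for b, v in zip(beta, row)), xi)
        for row, yi in zip(X, y)
    )
    return fit + lam * sum(abs(b) for b in beta)

# Made-up responses; with xi = 0.5 the best constant is the median (3.0).
ys = [1.0, 2.0, 3.0, 4.0, 10.0]
best = min(ys, key=lambda q: empirical_risk(q, ys, 0.5))

# Made-up one-regressor design: residuals are 0 and 1, penalty is 2 * |1|.
objective = penalized_objective([1.0], [[1.0], [2.0]], [1.0, 3.0], 0.5, 2.0)
```

Increasing `lam` in this sketch makes nonzero coefficients more expensive, which is the mechanism by which the Lasso penalty drives some coefficients to zero.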

Bayesian lasso (BLasso) and Bayesian ridge regression (BRR)

Recall the original problem of finding the regression parameters β for the model in eq. 3:

$$ y=\mu {1}_n+ X\beta +\varepsilon $$

where y is the n × 1 vector of responses, μ is the overall mean, X is the n × p matrix of standardized regressors and ε is the n × 1 vector of independent and identically distributed normal errors with mean 0 and unknown variance \( {\sigma}^2 \). Lasso achieves a solution for β by minimizing eq. 4 through L1-penalized least squares. As such, the method has a Bayesian interpretation, in which the lasso estimate is viewed as the mode of the posterior distribution of β (Hans 2009).

$$ {\displaystyle \begin{array}{c}\underset{\beta }{\min }{\left(\tilde{y}- X\beta \right)}^T\left(\tilde{y}- X\beta \right)+\lambda \sum \limits_{i=1}^p\left|{\beta}_i\right|\\ {}\tilde{y}=y-\overline{y}{1}_n\end{array}} $$

Assuming a conditional Laplace prior on β, \( \pi \left(\beta \mid {\sigma}^2\right)=\prod \limits_{i=1}^p\frac{\lambda }{2\sqrt{\sigma^2}}{e}^{-\lambda \left|{\beta}_i\right|/\sqrt{\sigma^2}} \), and a scale-invariant prior on \( {\sigma}^2 \), \( \pi \left({\sigma}^2\right)=\frac{1}{\sigma^2} \), a hierarchical representation of the full model is suggested as (Park and Casella 2008):

$$ {\displaystyle \begin{array}{c}y\left|\mu, X,\beta, {\sigma}^2\sim {N}_n\left(\mu {1}_n+ X\beta, {\sigma}^2{I}_n\right)\right.\\ {}\beta \left|{\sigma}^2,{\tau}_1^2,{\tau}_2^2,\dots, {\tau}_p^2\sim {N}_p\left({0}_p,{\sigma}^2{D}_{\tau}\right)\right.\\ {}{D}_{\tau }=\mathit{\operatorname{diag}}\left({\tau}_1^2,\dots, {\tau}_p^2\right)\\ {}{\sigma}^2,{\tau}_1^2,\dots {\tau}_p^2\sim \pi \left({\sigma}^2\right)d{\sigma}^2\prod \limits_{i=1}^p\frac{\lambda^2}{2}{e}^{-{\lambda}^2{\tau}_i^2/2}d{\tau}_i^2\end{array}} $$

Consequently, a basis is formed for an efficient Gibbs sampler from the Bayesian posterior distribution, updating each parameter one at a time conditioned on all other parameters, with the block updating of the regression parameters. The Bayesian concept can be also extended to the ridge regression—the Bayesian Ridge Regression (BRR) method—with an altered formulation for the priors. The reader is, however, referred to the seminal work of Park and Casella (2008) for comprehensive details of the techniques. The monomvn package (Gramacy 2016) was used to implement the BLasso/BRR methods in this work.
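The Bayesian reading of eq. 4 can be made explicit with a short derivation. This is a sketch of the standard argument for fixed \( {\sigma}^2 \), with the responses centered as in eq. 4 and all terms that do not depend on β collected in a constant C; it is not reproduced from the source. Combining the Gaussian likelihood with the conditional Laplace prior gives the negative log-posterior

$$ -\log \pi \left(\beta \mid \tilde{y},{\sigma}^2\right)=\frac{1}{2{\sigma}^2}{\left(\tilde{y}- X\beta \right)}^T\left(\tilde{y}- X\beta \right)+\frac{\lambda }{\sqrt{\sigma^2}}\sum \limits_{i=1}^p\left|{\beta}_i\right|+C $$

Multiplying by \( 2{\sigma}^2 \) shows that the posterior mode minimizes

$$ {\left(\tilde{y}- X\beta \right)}^T\left(\tilde{y}- X\beta \right)+2\lambda \sqrt{\sigma^2}\sum \limits_{i=1}^p\left|{\beta}_i\right| $$

which has the form of eq. 4, with \( 2\lambda \sqrt{\sigma^2} \) playing the role of the penalty parameter there.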

Markov chain Monte Carlo (MCMC)

Based on the assumption of a multivariate Gaussian (\( {N}_K \)) prior on β and an inverse Gamma (IG) prior on the conditional error variance \( {\sigma}^2 \) (eq. 6), the MCMC method uses Gibbs sampling to evaluate the posterior distribution of a linear regression model, enabling Bayesian inference on the regression parameters:

$$ {\displaystyle \begin{array}{c}y= X\beta +\varepsilon \\ {}\varepsilon \sim N\left(0,{\sigma}^2\right)\\ {}\beta \sim {N}_K\left({b}_0,{B}_0^{-1}\right)\\ {}{\sigma}^2\sim IG\left(\frac{c_0}{2},\frac{d_0}{2}\right)\end{array}} $$

In this work, the parameters used in the MCMC implementation were c0 = d0 = 0.001 for the shape/scale parameters of the inverse Gamma prior on \( {\sigma}^2 \), and b0 = 0, B0 = 0 for the mean/precision of the prior on β, respectively. The latter choice corresponds to putting an improper uniform prior on β. A comprehensive treatment of the MCMC method can be found in Robert and Casella (2004). The MCMC implementation was made using the R package MCMCpack (Martin et al. 2011).
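A Gibbs sampler for the model in eq. 6 can be sketched as follows. The snippet is a minimal Python illustration and not the MCMCpack implementation: it assumes a single regressor without intercept, the improper uniform prior on β (B0 = 0), synthetic data, and the standard full conditionals β | σ², y ~ N(β̂, σ²/Σx²) and σ² | β, y ~ IG((c0 + n)/2, (d0 + SSR)/2).

```python
import random

def gibbs_regression(x, y, n_burn=1000, n_keep=10000,
                     c0=0.001, d0=0.001, seed=1):
    """Gibbs sampler for y = beta * x + eps with a flat prior on beta."""
    rng = random.Random(seed)
    n = len(x)
    sxx = sum(v * v for v in x)
    beta_hat = sum(a * b for a, b in zip(x, y)) / sxx   # least-squares estimate
    sigma2 = 1.0                                        # arbitrary starting value
    draws = []
    for it in range(n_burn + n_keep):
        # beta | sigma2, y ~ N(beta_hat, sigma2 / Sxx)
        beta = rng.gauss(beta_hat, (sigma2 / sxx) ** 0.5)
        # sigma2 | beta, y ~ IG(shape, scale), drawn as 1 / Gamma(shape, rate)
        ssr = sum((b - beta * a) ** 2 for a, b in zip(x, y))
        shape = (c0 + n) / 2.0
        scale = (d0 + ssr) / 2.0
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / scale)
        if it >= n_burn:                                # discard burn-in draws
            draws.append((beta, sigma2))
    return beta_hat, draws

# Synthetic data with true slope 2 and noise standard deviation 0.5.
data_rng = random.Random(0)
x = [i / 10.0 for i in range(1, 51)]
y = [2.0 * xi + data_rng.gauss(0.0, 0.5) for xi in x]

beta_hat, draws = gibbs_regression(x, y)
post_mean_beta = sum(b for b, _ in draws) / len(draws)
```

Under the flat prior the posterior mean of β in this sketch coincides, up to Monte Carlo error, with the least-squares estimate; the sampler additionally yields the full posterior spread of β and σ².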

Random Forest (RF)

Building on the seminal work of Breiman (2001), the Random Forest (RF) method is an extension of classification and regression trees with a modified learning algorithm that selects a random subset of the features at each candidate split during the learning process. The algorithm grows each tree on a subset of the observations drawn by bootstrapping. For each tree grown on a bootstrap sample, the error rate for the observations left out of that sample is monitored as the out-of-bag (OOB) error rate, which serves as an indication of the accuracy of the RF predictor. The Random Forest algorithm seeks to improve on bagging by de-correlating the trees, implementing random feature selection at each node for the set of splitting variables (Meyer et al. 2003). As such, the RF algorithm works with two main input parameters: the number of trees and the number of variables randomly sampled as candidates at each split. An in-depth description of the method can be found in Breiman (2001). For the results presented herein, the number of trees to grow was set to 500, in accordance with the large-value recommendation in the literature (Breiman 2001; Micheletti et al. 2014). This choice was made after conducting a series of runs over a grid in the range of [100–1100] (for the number of trees) and [10–32] (for the number of variables randomly sampled) to select the values with the minimum mean squared residuals. As a result, the number of variables randomly sampled as candidates at each split was set to 32, corresponding to the maximum number of variables available. The RF implementation was accomplished using the randomForest package (Liaw and Wiener 2002).

The proposed hybrid forecasting methodology

The hybrid methodology proposed herein builds primarily on the characteristics of the IMFs. In other words, the predicted crude oil price at any future point in time is obtained as the summation of the corresponding IMFs and the residue. Hence, forecasts of the IMFs/residue are required for the time step(s) ahead. The regression forecast is attempted based on two types of regressors, namely internal and external, for each IMF/residue. The internal regressors are those previous values of an IMF/residue on which the current value depends, as determined through the partial autocorrelation of the decomposed signal, whereas the external regressors are the technical indicators, the volatility index (VIX), and the implied oil volatility index (OVX). In this regard, the technical indicators taken into account are the Aroon indicator (aroon), the Commodity Channel Index (CCI), the Double Exponential Moving Average (DEMA), the Exponential Moving Average (EMA), the Moving Average Convergence/Divergence (MACD), the Relative Strength Index (RSI), the Simple Moving Average (SMA), the Traders Dynamic Index (TDI) and the Triple Exponentially Smoothed Moving Average (TRIX).

The proposed methodology is hybrid in two respects: classification and regression. The initial classification step determines the set of significant regressors from the pool of regressors described above, using BN (constrained/scored), QRL, BLasso and BRR as its methods, where the criterion of significance differs for each method. In the BLasso/BRR classification, for instance, a regressor is considered significant when the estimated posterior probability of the individual component’s regression coefficient is returned as nonzero. In the BN scenario, on the other hand, significance is recognized in the strength of the corresponding arc: an arc strength coefficient equal to or less than 0.05 when a conditional independence test is applied (constrained BN), or a strength coefficient below the zero threshold when network scores are applied (scored BN) (Scutari 2010). The final set of significant regressors for each IMF/residue is selected as the whole set of significant parameters determined by the BN-QRL-BLasso-BRR methods.
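Taking "the whole set" of significant parameters is read here as the set union of the regressors flagged by the individual methods; this reading is an assumption of the sketch. The Python snippet below illustrates it for a single IMF with made-up detection results.

```python
# Made-up detection results for one IMF; not taken from Table 5.
detected = {
    "BN": {"OVX", "CCI"},
    "QRL": {"OVX", "EMA"},
    "BLasso": {"VIX"},
    "BRR": {"OVX", "VIX", "SMA"},
}

# A regressor is retained if at least one method flags it as significant.
final_regressors = sorted(set().union(*detected.values()))
```

A union keeps any regressor that at least one criterion considers informative, at the cost of a larger regressor set than an intersection would give.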

The second regression step, being hybrid in essence, involves predicting the IMFs/residue from their corresponding regressors, in the time steps ahead. For this reason, several strategies were devised and later tested for efficiency (Table 1). The first five methods take advantage of the presence of time-varying feature in the decomposed IMF signals. Under such circumstances, the GARCH model is used to forecast the time-varying IMF components, while the rest of the IMFs as well as the residue are predicted by the conjugate model listed. In the second five strategies, the IMFs as well as the residue are computed by a candidate model, in the time steps ahead, regardless of the time-varying aspect of some of the IMFs. The procedures of the proposed hybrid method can be summarized as follows:

  1. Apply the EMD method to the original crude price series. Decompose the series into its IMFs and the residue.

  2. Extract the significant regressors for each IMF and residue by taking the whole set of significant regressors determined by the BN-QRL-BLasso-BRR methods.

  3. Under Schemes 1–5 in Table 1, if the corresponding IMF presents the feature of time variation, use the GARCH model to predict its future value. Use the conjugate model in the scheme to predict the future values of the other IMFs and the residue, based on the regressors determined in step 2.

  4. Under Schemes 6–10 in Table 1, use the candidate model prescribed to forecast the future values of the IMFs and the residue, based on the regressors determined in step 2.

  5. Construct the final forecasted crude oil price by summing the future values of the IMFs and the residue.
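The model dispatch and the final summation in the steps above can be sketched as follows. The snippet is a schematic Python illustration only: the two forecasters are hypothetical placeholders standing in for the GARCH model and the conjugate model of a scheme, and the component values are made up.

```python
def persistence(series):
    """Placeholder forecaster (hypothetical): repeat the last observed value."""
    return series[-1]

def drift(series):
    """Placeholder forecaster (hypothetical): extrapolate the last increment."""
    return series[-1] + (series[-1] - series[-2])

def forecast_price(components, time_varying, garch_like, conjugate):
    """Forecast each component with its assigned model and sum the results."""
    total = 0.0
    for name, series in components.items():
        # Time-varying components go to the GARCH-like model, the rest to
        # the conjugate model, mirroring the split used in Schemes 1-5.
        model = garch_like if name in time_varying else conjugate
        total += model(series)
    return total

# Made-up decomposed series: two IMFs and the residue.
components = {
    "IMF.1": [0.2, -0.1, 0.3],
    "IMF.2": [1.0, 1.5, 2.0],
    "RES": [50.0, 50.5, 51.0],
}
price = forecast_price(components, {"IMF.1"}, persistence, drift)
```

Passing the same callable for both model arguments corresponds to the situation in Schemes 6–10, where a single candidate model handles every component.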

Table 1 Different predictive strategies tested

The general specification of the mean/variance in the GARCH model considered in Schemes 1–5 of Table 1 is as follows, respectively:

$$ {\displaystyle \begin{array}{c}{x}_{oil,t}={\eta}_1+{\eta}_2{x}_{oil,t-1}+{\eta}_3{x}_{oil,t-2}+\dots +{\eta}_{n+1}{x}_{oil,t-n}+{\eta}_{n+2}{x}_{OVX,t-1}+{\eta}_{n+3}{x}_{VIX,t-1}+{\eta}_{n+4}{x}_{aroon,t-1}+{\eta}_{n+5}{x}_{CCI,t-1}\\ {}+{\eta}_{n+6}{x}_{DEMA,t-1}+{\eta}_{n+7}{x}_{EMA,t-1}+{\eta}_{n+8}{x}_{MACD,t-1}+{\eta}_{n+9}{x}_{RSI,t-1}+{\eta}_{n+10}{x}_{SMA,t-1}+{\eta}_{n+11}{x}_{TDI,t-1}+{\eta}_{n+12}{x}_{TRIX,t-1}+{\varepsilon}_t\end{array}} $$
$$ {\sigma}_{oil,t}^2={\eta}_{n+13}+{\eta}_{n+14}{\sigma}_{oil,t-1}^2+{\eta}_{n+15}{\varepsilon}_{t-1}^2 $$

where \( {x}_{i,t} \) refers to the price/value of i at time t, and n refers to the number of previous time steps to which the current oil price data is correlated. The final form of the GARCH specification for the time-varying IMFs may, however, adopt a truncated version of the mean equation, since the IMFs differ in the type/number of significant regressors to be incorporated into it. The GARCH predictions were made with the rugarch package (Ghalanos 2015). The accuracy of the hybrid forecasting methods was evaluated by several statistical criteria, namely, the mean absolute error (MAE), the root mean square error (RMSE) and the mean absolute percentage error (MAPE).
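The variance equation above is a simple recursion, which can be sketched in Python as follows. The argument names `omega`, `beta` and `alpha` are labels assumed here for the coefficients \( {\eta}_{n+13} \), \( {\eta}_{n+14} \) and \( {\eta}_{n+15} \), and the numerical values are made up; parameter estimation, which rugarch performs, is not shown.

```python
def garch11_variance(residuals, omega, beta, alpha, sigma2_0):
    """Iterate sigma2_t = omega + beta * sigma2_{t-1} + alpha * eps_{t-1}**2.

    Returns the starting variance followed by one value per residual, so the
    last element is the one-step-ahead variance forecast.
    """
    sigma2 = [sigma2_0]
    for eps in residuals:
        sigma2.append(omega + beta * sigma2[-1] + alpha * eps ** 2)
    return sigma2

# Made-up residuals and coefficients.
path = garch11_variance([1.0, -2.0], omega=0.1, beta=0.8, alpha=0.1,
                        sigma2_0=1.0)
```

A large residual in this recursion raises the next conditional variance, which then decays at a rate governed by `beta`; this is the time-varying behavior that motivates using GARCH for the leptokurtic IMFs.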

Data and results

The price data of oil/OVX/VIX were acquired through the Quandl package (McTaggart et al. 2016). As for the crude price, the data related to three benchmarks were collected: West Texas Intermediate (WTI), Brent crude (BRENT), and the OPEC Reference Basket (ORB).

Seven distinct IMFs (IMF.1, IMF.2, …, IMF.7) and one residue (RES) were detected for WTI (Fig. 1), while for BRENT/ORB, eight IMFs were detected. In this process, the S stoppage rule (Huang and Wu 2008) was used as the stopping rule of the sifting process. Tables 2, 3 and 4 report the statistics measured on each IMF/residue in the time span between July 6, 2007 and February 25, 2019. The subscripts in parentheses indicate the corresponding p-values. The IMFs with a leptokurtic characteristic (kurtosis > 3) were later used in Methods 1–5 (Table 1) as input into the GARCH model. The statistics were measured using the tseries package (Trapletti and Hornik 2015).
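The leptokurtosis criterion can be illustrated with a short Python sketch. It assumes the moment-based (non-excess) sample kurtosis m4/m2², for which a normal distribution gives 3; the exact estimator behind Tables 2, 3 and 4 is not stated in the text, and the two samples are made up.

```python
def kurtosis(x):
    """Moment-based sample kurtosis m4 / m2**2 (equals 3 for a normal law)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2

def is_leptokurtic(x):
    """Flag a series as heavy-tailed, i.e. a candidate for the GARCH model."""
    return kurtosis(x) > 3.0

flat_sample = [1.0, 2.0, 3.0, 4.0, 5.0]          # evenly spread values
spiky_sample = [0.0] * 18 + [10.0, -10.0]        # mostly calm, two large moves
```

In this sketch the evenly spread sample has a kurtosis of 1.7, while the sample with two isolated large moves has a kurtosis of 10 and would be routed to the GARCH model.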

Fig. 1

The Intrinsic Mode Functions- IMF.1 (a), IMF.2 (b), IMF.3 (c), IMF.4 (d), IMF.5 (e), IMF.6 (f), IMF.7 (g) - and the residue (h), decomposed from WTI price data by the EMD method

Table 2 Descriptive statistics of the IMFs/residue of WTI, in the period between 06-July-2007 and 25-February-2019
Table 3 Descriptive statistics of the IMFs/residue of BRENT, in the period between 06-July-2007 and 25-February-2019
Table 4 Descriptive statistics of the IMFs/residue of ORB, in the period between 06-July-2007 and 25-February-2019

The technical indicators were computed using the TTR package (Ulrich 2016). However, it should be noted that because the oil price data used contained closing prices only, whereas the CCI is defined on high, low and closing prices, the CCI values obtained herein depart in meaning from the original definition. Table 5 lists the set of significant external regressors, separately detected by the BN-QRL-BLasso-BRR methods, for each IMF/residue of WTI. Figure 2 provides a graphical representation of the extracted Bayesian network of significant/insignificant previous-time external regressors of the IMFs/residue of WTI, obtained through the constraint-based BN concept. The Rgraphviz package (Hansen et al. 2008) was used to plot the graph. The final set of previous-time external regressors considered for each IMF/residue was, however, taken as the whole of the significant detected parameters (Tables 6, 7 and 8). Detection of the significant internal regressors was made by following the standard procedure of considering the partial autocorrelation of the decomposed signal (Fig. 3). The DAGs (Fig. 2) are important in the sense that they reveal the parameters that influence each intrinsic mode function. The CCI, SMA and OVX parameters are not directly connected to one another. A DAG does not allow a given vertex to be revisited by following a consistently directed sequence of edges that starts from that same vertex. A connection between these nodes (vertices) would violate this rule for the data involved, which is why they appear unconnected in the reported DAGs. Moreover, the CCI, SMA and OVX can be considered to be the parents of most of the intrinsic mode functions as well as the residue.

Table 5 The set of significant regressors detected for each IMF/residue of WTI, with the BN-QRL-BLasso-BRR techniques
Fig. 2

The Bayesian Network of significant (solid lines) and insignificant (dashed lines) external regressors for the Intrinsic Mode Functions - IMF.1 (a), IMF.2 (b), IMF.3 (c), IMF.4 (d), IMF.5 (e), IMF.6 (f), IMF.7 (g) - and the residue (h) of WTI, extracted through the constraint-based BN method

Table 6 The final set of external regressors for each IMF/residue of WTI
Table 7 The final set of external regressors for each IMF/residue of BRENT
Table 8 The final set of external regressors for each IMF/residue of ORB
Fig. 3

The partial autocorrelation of the Intrinsic Mode Functions- IMF.1 (a), IMF.2 (b), IMF.3 (c), IMF.4 (d), IMF.5 (e), IMF.6 (f), IMF.7 (g) - and the residue (h) of WTI
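The selection of internal regressors from the partial autocorrelation can be sketched in Python. The snippet below is an illustration under stated assumptions and not the procedure used for Fig. 3: it computes the partial autocorrelation with the Durbin-Levinson recursion, uses the common ±1.96/√n band as the significance threshold, and is applied to a synthetic AR(1) series rather than to an IMF.

```python
import math
import random

def autocorr(x, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations at lags 1 .. max_lag (Durbin-Levinson)."""
    rho = autocorr(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new one.
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi_prev = phi
    return out

# Synthetic AR(1) series with coefficient 0.7.
rng = random.Random(42)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

values = pacf(series, 5)
band = 1.96 / math.sqrt(len(series))
significant_lags = [k + 1 for k, v in enumerate(values) if abs(v) > band]
```

For the synthetic series, the partial autocorrelation is close to 0.7 at lag 1 and close to zero at lag 2, so lag 1 would be retained as an internal regressor.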

The hybrid strategy optimization (Ghalanos 2015) was used within the GARCH implementation in Methods 1–5. This ensures that a number of non-linear solvers are called in sequence in the case that the initial optimization fails. For the SVM implementation (Methods 2 and 7 in Table 1), a grid search was initially conducted over the parameter ranges so as to calibrate the SVM model (Meyer et al. 2015). This was followed by a kernel-based SVM regression, where the hyperparameters of the kernel were taken as those obtained from the calibration stage. The Laplacian kernel was used within the SVM regression with bound constraint (Karatzoglou et al. 2004). For the NNET predictions (Methods 4 and 9), the k-nearest neighbor method was used without any preprocessing of the predictor data (Kuhn et al. 2016). As for the MCMC (Methods 3 and 8), 1,000 burn-in iterations were run, followed by 10,000 Metropolis iterations for the sampler. Also, 1,000 MCMC samples were collected for the BLasso/BRR outputs. The initial lasso penalty parameter was taken as one and the Rao-Blackwellized samples were used for \( {\sigma}^2 \) (Gramacy 2016). The selection of the model for the columns of the design matrix for regression parameters in BLasso/BRR was made using Reversible-Jump MCMC (Gramacy 2016).

The graphical structure-learning of the Bayesian networks was implemented using the bnlearn package (Scutari 2017). Both the constraint-based and the score-based types of algorithm were tested in this work. For the constraint-based type, the Monte Carlo permutation test was used as the conditional independence test, while in the score-based case, a score-equivalent Gaussian posterior density criterion was applied. For BN inference, predictions were obtained by applying the LW algorithm and extracting the expected value of the conditional distribution from 500 simulation results. In that situation, all the available nodes in the structure were taken as evidence except the node related to the variable being predicted.

Crude price forecasts were made for a horizon of ten days ahead. To test the performance of the methods, three random training sets were used with a common beginning date of July 6, 2007 and ending dates of January 3, 2017, March 27, 2017 and February 8, 2019, respectively. In each case, the hybrid methods were used to predict the crude prices for the ten out-of-sample days immediately ahead. The average of the errors incurred is reported in Tables 9, 10 and 11. According to the results, Method-10 shows conspicuously better performance than the other methods in all three statistics measured (MAE, RMSE and MAPE). The superior performance of Method-10 is not confined to a single market; rather, it is demonstrated over the three price types (WTI, BRENT and ORB). Furthermore, the predictive accuracy of Method-10 is compared against the other techniques using a Diebold-Mariano test. Table 12 lists the Diebold-Mariano statistics for the 10-day forecast of WTI/BRENT/ORB, assessing the alternative hypothesis that Method-10 is more accurate than the method of choice, and demonstrates the superior accuracy of the proposed hybrid technique. A close inspection of the results also indicates that, overall, Methods 6–10 achieve better predictive success than Methods 1–5, which incorporate the GARCH model. However, this should not rule out the application of GARCH in hybrid price forecasting models.
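For reference, the three error statistics can be computed as in the following Python sketch. The standard definitions are assumed, with MAPE expressed in percent, and the price values are made up rather than taken from Tables 9, 10 and 11.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Made-up prices for two out-of-sample days.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
errors = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are expressed in price units, whereas MAPE is scale-free, which makes it the more convenient of the three for comparing accuracy across the WTI, BRENT and ORB benchmarks.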

Table 9 The average errors of the 10-days-ahead forecasts of WTI
Table 10 The average errors of the 10-days-ahead forecasts of BRENT
Table 11 The average errors of the 10-days-ahead forecasts of ORB
Table 12 The Diebold-Mariano statistics for 10-day forecast of WTI/BRENT/ORB, for the alternative hypothesis that Method-10 is better in terms of accuracy versus the method of choice

The extracted Bayesian networks indicate that both OVX and VIX are influential on different layers of IMF or the residue, which accounts for their impact on the value of future crude prices. In addition, the established BN structure shows the previous-time technical indicators to affect different layers of IMF and the residue of the decomposed price signal.

The proposed hybrid methodology was chosen for short-term prediction of crude prices, owing to the short-term viability of the regressors employed. The method, however, deserves further investigation and merits being tested for its long-term forecasting capability, for example incorporating regressors with longer life spans.


The performance of the hybrid Bayesian network proposition was outstanding compared to the other devised hybrid models, for all three crude price types (WTI, BRENT and ORB) and against all three statistical benchmarks (MAE, RMSE and MAPE). The BN demonstrated that the volatility indices (OVX, VIX) are influential on different decomposed signals of the crude price, affecting the level-ahead price values. The predictive power of the hybrid methods adopting GARCH was shown to be inferior to that of the other methods, which apply regressions to all of the layers of the decomposed signal for crude price forecasting. Since the proposed hybrid method makes use of regressors with short-term life spans (i.e., technical indicators, OVX, VIX and past price values), the method remains a valid option for short-term forecasting. The question of its capability in handling long-term price forecasts is yet to be answered by future research using parameters with longer-term viability.

Availability of data and materials

Not applicable.



aroon: Aroon indicator
BLasso: Bayesian Lasso
BN: Bayesian Network
BRENT: Brent crude
BRR: Bayesian Ridge Regression
CCI: Commodity Channel Index
DAG: Directed acyclic graph
DEMA: Double Exponential Moving Average
EMA: Exponential Moving Average
EMD: Empirical Mode Decomposition
GARCH: Generalized autoregressive conditional heteroskedasticity
IMF: Intrinsic Mode Functions
IV: Implied volatility
Kernel Support Vector Machine
LS: Logic sampling
LW: Likelihood weighting
MACD: Moving Average Convergence/Divergence
MAE: Mean absolute error
MAPE: Mean absolute percentage error
MCMC: Markov Chain Monte Carlo
NNET: Neural Networks
ORB: OPEC Reference Basket
OVX: Implied Oil Volatility Index
QRL: Quantile Regression with Lasso penalty
RF: Random Forest
RMSE: Root Mean Square Error
RSI: Relative Strength Index
SMA: Simple Moving Average
SV: Stochastic volatility
SVM: Support Vector Machine
TDI: Traders Dynamic Index
TRIX: Triple Exponentially Smoothed Moving Average
VIX: Volatility Index
WTI: West Texas Intermediate


  1. Agnolucci P (2009) Volatility in crude oil futures: a comparison of the predictive ability of GARCH and implied volatility models. Energy Econ 31:316–321.

    Article  Google Scholar 

  2. Arouri MEH, Lahiani A, Lévy A, Nguyen DK (2012) Forecasting the conditional volatility of oil spot and futures prices with structural breaks and long memory models. Energy Econ 34:283–293.

  3. Azadeh A, Moghaddam M, Khakzad M, Ebrahimipour V (2012) A flexible neural network-fuzzy mathematical programming algorithm for improvement of oil price estimation and forecasting. Comput Ind Eng 62:421–430.

  4. Breiman L (2001) Random forests. Mach Learn 45(1):5–32.

  5. Cheong CW (2009) Modeling and forecasting crude oil markets using ARCH-type models. Energy Policy 37:2346–2355.

  6. Fan Y, Liang Q, Wei YM (2008b) A generalized pattern matching approach for multistep prediction of crude oil price. Energy Econ 30(3):889–904.

  7. Fan Y, Zhang YJ, Tsai H-T, Wei YM (2008a) Estimating ‘value at risk’ of crude oil price and its spillover effect using the GED approach. Energy Econ 30(6):3156–3171.

  8. Ghaffari A, Zare S (2009) A novel algorithm for prediction of crude oil price variation based on soft computing. Energy Econ 31:531–536.

  9. Ghalanos A (2015) rugarch: Univariate GARCH models. R package version 1.3-6

  10. Gramacy RB (2016) monomvn: Estimation for Multivariate Normal and Student-t Data with Monotone Missingness. R package version 1.9-6. Accessed 25 Feb 2019

  11. Guo XP, Li DC, Zhang AH (2012) Improved support vector machine oil price forecast model based on genetic algorithm optimization parameters. AASRI Procedia 1:525–530.

  12. Hans C (2009) Bayesian lasso regression. Biometrika 96(4):835–845.

  13. Hansen KD, Gentry J, Long L, Gentleman R, Falcon S, Hahne F, Sarkar D (2008) Rgraphviz: provides plotting capabilities for R graph objects. R package version 2.14.0

  14. Hong L (2011) Decomposition and forecast for financial time series with high-frequency based on empirical mode decomposition. Energy Procedia 5:1333–1340.

  15. Hou A, Suardi S (2012) A nonparametric GARCH model of crude oil price return volatility. Energy Econ 34:618–626.

  16. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zhang Q, Yen N, Tung CC, Liu HH (1998) The empirical mode decomposition and Hilbert Spectrum for nonlinear and nonstationary time series analysis. Proc R Soc London Ser A 454:903–995.

  17. Huang NE, Wu Z (2008) A review on Hilbert-Huang transform: method and its applications to geophysical studies. Rev Geophys 46:RG2006.

  18. Kang SH, Kang SM, Yoon SM (2009) Forecasting volatility of crude oil markets. Energy Econ 31:119–125.

  19. Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) Kernlab – an S4 package for kernel methods in R. J Stat Softw 11(9):1–20

  20. Kim D, Oh H (2009) A package for empirical mode decomposition and Hilbert Spectrum. The R Journal 1:30–46

  21. Kim D, Oh H (2014) EMD: Empirical Mode Decomposition and Hilbert Spectral Analysis. R package version 1.5.7

  22. Koenker R (2004) Quantile regression for longitudinal data. J Multivar Anal 91:74–89.

  23. Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50.

  24. Korb K, Nicholson A (2004) Bayesian artificial intelligence. CRC Press, London

  25. Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, Benesty M, Lescarbeau R, Ziem A, Scrucca L, Tang Y, Candan C (2016) Caret: Classification and Regression Training. R package version 6.0-68. Accessed 25 Feb 2019

  26. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  27. Martin AD, Quinn KM, Park J (2011) MCMCpack: Markov chain Monte Carlo in R. J Stat Softw 42(9):1–21

  28. McTaggart R, Daroczi G, Leung C (2016) Quandl: API Wrapper for Quandl.com. R package version 2.8.0. http://CRAN.R-project.org/package=Quandl. Accessed 25 Feb 2019

  29. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2015) e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.6-7. Accessed 25 Feb 2019

  30. Meyer D, Leisch F, Hornik K (2003) The support vector machine under test. Neurocomputing 55(1–2):169–186.

  31. Micheletti N, Foresti L, Robert S, Leuenberger M, Pedrazzini A, Jaboyedoff M, Kanevski M (2014) Machine learning feature selection methods for landslide susceptibility mapping. Math Geosci 46:33–57.

  32. Mohammadi H, Su L (2010) International evidence on crude oil price dynamics: applications of ARIMA-GARCH models. Energy Econ 32:1001–1008.

  33. Movagharnejad K, Mehdizadeh B, Banihashemi M, Kordkheili MS (2011) Forecasting the differences between various commercial oil prices in the Persian Gulf region by neural network. Energy 36:3979–3984.

  34. Nagarajan R, Scutari M, Lèbre S (2013) Bayesian Networks in R with Applications in Systems Biology. Springer-Verlag, New York

  35. Narayan PK, Narayan S (2007) Modelling oil price volatility. Energy Policy 35:6549–6553.

  36. Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681–686.

  37. Robert C, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer-Verlag, New York

  38. Sadorsky P (2006) Modeling and forecasting petroleum futures volatility. Energy Econ 28:467–488.

  39. Scutari M (2010) Learning Bayesian networks with the bnlearn R package. J Stat Softw 35(3):1–22

  40. Scutari M (2017) Bayesian network constraint-based structure learning algorithms: parallel and optimized implementations in the bnlearn R package. J Stat Softw 77(2):1–20.

  41. Sherwood B, Maidman A (2016) rqPen: Penalized Quantile Regression. R package version 1.4. Accessed 25 Feb 2019

  42. Shin H, Hou T, Park K, Park CK, Choi S (2013) Prediction of movement direction in crude oil prices based on semi-supervised learning. Decis Support Syst 55:348–358.

  43. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288.

  44. Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B 73(3):273–282.

  45. Trapletti A, Hornik K (2015) tseries: Time Series Analysis and Computational Finance. R package version 0.10-34

  46. Ulrich J (2016) TTR: Technical Trading Rules. R package version 0.23-1. Accessed 25 Feb 2019

  47. Wang J, Pan HP, Liu FJ (2012) Forecasting crude oil price and stock price by jump stochastic time effective neural network model. J Appl Math 2012:1–15.

  48. Wei Y, Wang YD, Huang DS (2010) Forecasting crude oil market volatility: further evidence using GARCH-class models. Energy Econ 32:1477–1484.

  49. Wu Y, Liu Y (2009) Variable selection in quantile regression. Stat Sin 19:801–817

  50. Xiong T, Bao YK, Hu ZY (2013) Beyond one-step-ahead forecasting: evaluation of alternative multi-step-ahead forecasting models for crude oil prices. Energy Econ 40:405–415.

  51. Yousefi S, Weinreich I, Reinarz D (2005) Wavelet-based prediction of oil prices. Chaos, Solitons Fractals 25(2):265–275.

  52. Yu L, Wang S, Lai KK (2008) Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Econ 30:2623–2635.

  53. Zhang J, Zhang Y, Zhang L (2015) A novel hybrid method for crude oil price forecasting. Energy Econ 49:649–659.

  54. Zhang X, Lai KK, Wang SY (2008) A new approach for crude oil price analysis based on empirical mode decomposition. Energy Econ 30:905–918.




Author information

The work was done solely by the corresponding author, Babak Fazelabdolabadi, who read and approved the final manuscript.

Corresponding author

Correspondence to Babak Fazelabdolabadi.

Ethics declarations

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

The Logic sampling algorithm

Consider an established Bayesian Network. Assume X to be a node in this BN structure, and E = e to be the given evidence. The Logic sampling algorithm estimates the posterior probability of node X given the evidence E = e, following the procedure described by Korb and Nicholson (2004).
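As a rough illustration only, and not the author's implementation, logic sampling can be sketched in Python as forward sampling with rejection: every node is sampled in topological order from its conditional probability table, samples that disagree with the evidence are discarded, and the posterior of X is estimated from the samples that remain. The three-node network `NETWORK`, its node names and its probabilities are hypothetical and chosen purely for demonstration.

```python
import random

# Hypothetical binary network A -> B -> C, listed in topological order.
# Each entry is (node, parents, CPT); the CPT maps a tuple of parent
# values to P(node = True). All numbers are illustrative assumptions.
NETWORK = [
    ("A", (), {(): 0.3}),
    ("B", ("A",), {(True,): 0.8, (False,): 0.1}),
    ("C", ("B",), {(True,): 0.9, (False,): 0.2}),
]


def forward_sample(network, rng):
    """Draw one full sample, visiting nodes parents-first."""
    sample = {}
    for node, parents, cpt in network:
        p_true = cpt[tuple(sample[p] for p in parents)]
        sample[node] = rng.random() < p_true
    return sample


def logic_sampling(network, query, evidence, n_samples=100_000, seed=0):
    """Estimate P(query = True | evidence) by rejection."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        sample = forward_sample(network, rng)
        if any(sample[k] != v for k, v in evidence.items()):
            continue  # sample contradicts the evidence: discard it
        kept += 1
        hits += sample[query]
    if kept == 0:
        raise ValueError("no sample was consistent with the evidence")
    return hits / kept


estimate = logic_sampling(NETWORK, "A", {"C": True})
print(round(estimate, 3))
```

For this toy network the exact posterior is 0.228 / 0.417 (about 0.547), so the estimate should land close to that value. The weakness of the approach is visible in the loop: when the evidence is unlikely, most samples are rejected and the estimate becomes noisy.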


Appendix 2

The Likelihood weighting algorithm

Consider an established Bayesian Network. Assume X to be a node in this BN structure, and E = e to be the given evidence. The Likelihood weighting algorithm estimates the posterior probability of node X given the evidence E = e, following the procedure described by Korb and Nicholson (2004).
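A minimal Python sketch of likelihood weighting, under the same caveats (it is not the author's implementation, and the network `NETWORK` with its probabilities is a hypothetical example): evidence nodes are clamped to their observed values instead of being sampled, and each sample is weighted by the probability of that evidence given the sampled parents. No sample is discarded.

```python
import random

# Hypothetical binary network A -> B -> C, listed in topological order.
# Each entry is (node, parents, CPT); the CPT maps a tuple of parent
# values to P(node = True). All numbers are illustrative assumptions.
NETWORK = [
    ("A", (), {(): 0.3}),
    ("B", ("A",), {(True,): 0.8, (False,): 0.1}),
    ("C", ("B",), {(True,): 0.9, (False,): 0.2}),
]


def likelihood_weighting(network, query, evidence, n_samples=100_000, seed=0):
    """Estimate P(query = True | evidence) from weighted samples."""
    rng = random.Random(seed)
    weighted_hits = total_weight = 0.0
    for _ in range(n_samples):
        sample, weight = {}, 1.0
        for node, parents, cpt in network:
            p_true = cpt[tuple(sample[p] for p in parents)]
            if node in evidence:
                # Clamp the evidence node and fold its likelihood
                # into the weight of this sample.
                sample[node] = evidence[node]
                weight *= p_true if evidence[node] else 1.0 - p_true
            else:
                sample[node] = rng.random() < p_true
        total_weight += weight
        if sample[query]:
            weighted_hits += weight
    return weighted_hits / total_weight


estimate = likelihood_weighting(NETWORK, "A", {"C": True})
print(round(estimate, 3))
```

The exact posterior for this toy query is again 0.228 / 0.417 (about 0.547). Because every sample contributes, the estimator typically needs fewer draws than rejection-based logic sampling when the evidence is improbable.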


Appendix 3

The Empirical Mode Decomposition algorithm

Consider a time series, f(t), with t referring to time. The EMD decomposes the original data into its IMFs and a residue, following the sifting procedure of Huang et al. (1998).
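The sifting idea can be illustrated with a simplified, dependency-free Python sketch. It departs from Huang et al. (1998) in two labeled ways: the envelopes use piecewise-linear interpolation in place of cubic splines, and a fixed number of sifting passes (`n_sift`, a hypothetical parameter) stands in for a proper stopping criterion. The function names and the toy signal are assumptions made for demonstration, not the author's code.

```python
import math


def local_extrema(x):
    """Indices of interior local maxima and minima of the sequence x."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in inner if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knots.

    Simplification: both end points are pinned to the signal itself.
    """
    pts = sorted(set([0, len(x) - 1] + knots))
    env = []
    for a, b in zip(pts, pts[1:]):
        for i in range(a, b):
            env.append(x[a] + (x[b] - x[a]) * (i - a) / (b - a))
    env.append(x[-1])
    return env


def sift(x, n_sift=10):
    """Extract one IMF candidate by repeatedly removing the envelope mean."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) + len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=8):
    """Split x into IMFs and a residue with x = sum(IMFs) + residue."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 2:
            break  # fewer than two extrema left: treat as the final residue
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Toy signal: two oscillations plus a slow linear trend.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(2.0 * t) + 0.01 * t for t in range(200)]
imfs, residue = emd(signal)
reconstruction = [sum(parts) + r for parts, r in zip(zip(*imfs), residue)]
max_error = max(abs(a - b) for a, b in zip(signal, reconstruction))
print(len(imfs), max_error)
```

By construction, adding the extracted IMFs to the residue returns the original series up to floating-point error, which is the property the hybrid method relies on when it rebuilds the price forecast from the summed component forecasts.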


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Fazelabdolabadi, B. A hybrid Bayesian-network proposition for forecasting the crude oil price. Financ Innov 5, 30 (2019).


  • Bayesian networks
  • Random Forest
  • Markov chain Monte Carlo
  • Support vector machine