Deep learning systems for forecasting the prices of crude oil and precious metals
Financial Innovation volume 10, Article number: 111 (2024)
Abstract
Commodity markets, such as crude oil and precious metals, play a strategic role in the economic development of nations, with crude oil prices influencing geopolitical relations and the global economy. Moreover, gold and silver are argued to hedge the stock and cryptocurrency markets during market downturns. Therefore, accurate forecasting of crude oil and precious metals prices is critical. Nevertheless, due to the nonlinear nature, substantial fluctuations, and irregular cycles of crude oil and precious metals prices, predicting them is a challenging task. Our study contributes to the commodity market price forecasting literature by implementing and comparing advanced deep-learning models. Because most studies on precious metals focus only on gold, we include silver alongside gold in our analysis, offering a more comprehensive understanding of the precious metal markets. This research expands existing knowledge and provides valuable insights into predicting commodity prices. In this study, we implemented 16 deep-learning and machine-learning models to forecast the daily price of the West Texas Intermediate (WTI), Brent, gold, and silver markets. The employed deep-learning models are long short-term memory (LSTM), BiLSTM, gated recurrent unit (GRU), bidirectional gated recurrent units (BiGRU), T2V-BiLSTM, T2V-BiGRU, convolutional neural networks (CNN), CNN-BiLSTM, CNN-BiGRU, temporal convolutional network (TCN), TCN-BiLSTM, and TCN-BiGRU. We compared the forecasting performance of the deep-learning models with the baseline random forest, LightGBM, support vector regression, and k-nearest neighbors models using mean absolute error (MAE), mean absolute percentage error, and root mean squared error as evaluation criteria. We examine the forecasting performance of our models across different sliding window lengths. Our results reveal that the TCN model outperforms the others for WTI, Brent, and silver, achieving the lowest MAE values of 1.444, 1.295, and 0.346, respectively. The BiGRU model performs best for gold, with an MAE of 15.188 using a 30-day input sequence. Furthermore, LightGBM exhibits comparable performance to TCN and is the best-performing machine-learning model overall. These findings are critical for investors, policymakers, mining companies, and governmental agencies to effectively anticipate market trends, mitigate risk, manage uncertainty, and make timely decisions and strategies regarding crude oil, gold, and silver markets.
Introduction
Nonrenewable commodities, usually mined in certain countries, can strongly impact those countries’ economies, policies, currencies, and international or political issues. Energy and precious metals markets, among other commodities, are well-known alternatives to stock markets (Pullen et al. 2014; Hussain Shahzad et al. 2017; Akbar et al. 2019; Adekoya et al. 2022; Phan et al. 2016; Sarwar et al. 2019). Their prices are critical indicators of economic health and crucial determinants for financial planning and decision making. In this regard, understanding the dynamics of such markets and forecasting their evolution is crucial for portfolio optimization and management. Crude oil, a crucial energy commodity, is pivotal in global macroeconomics and influences the decisions made by policymakers such as governments and central banks. Fluctuations in crude oil prices have profound implications for a country’s political and economic security; therefore, accurate crude oil price forecasting is imperative. Crude oil market shocks in April 2020 and their impacts have increased interest in understanding oil price dynamics (Wang et al. 2021; Murshed and Tanha 2021; Balcilar et al. 2021; Zhang et al. 2022a, b; Enwereuzoh et al. 2021). Gold, for its part, is important for investment portfolio diversification and hedging (ben Khelifa et al. 2021; Reboredo 2013; Baek 2019). Gold makes up a large portion of the commodity reserves of major economies. As of September 2022, the official United States (US) gold reserve was 8133.47 tons, approximately 66.6% of total US reserves.^{Footnote 1}
Given these markets’ multifaceted nature, forecasting the trajectories of these commodities is crucial in financial markets, serving as an essential tool for investors, policymakers, and analysts. For investors, anticipating price movements in crude oil and precious metals provides a strategic advantage in optimizing portfolio performance and risk management. A comprehensive understanding of potential price fluctuations allows investors to make informed decisions, allocate resources optimally, and ultimately enhance their overall financial returns (Bhowmik and Wang 2020). Policymakers, for their part, rely on accurate market forecasts to develop effective economic policies and mitigate the potential impact of market volatility on national economies. Fluctuations in crude oil prices, for instance, can have cascading effects on inflation, trade balances, and overall economic stability (Uzo-Peters et al. 2018; Xiuzhen et al. 2022; Periwal 2023). Similarly, precious metal prices often indicate broader economic sentiments and can influence monetary policies and international trade relationships.
In this context, the science of forecasting plays a pivotal role in providing foresight into future trends in crude oil and precious metal prices. Advanced analytical models (Kou et al. 2021, 2022; Li et al. 2022a, b; Lahmiri 2023a), statistical methods (Lahmiri et al. 2022; Lahmiri 2023b), machine learning (Lahmiri et al. 2023), and deep-learning algorithms (Amirifar et al. 2023; Amirshahi and Lahmiri 2023a, b; Lahmiri and Bekiros 2019, 2020, 2021) enable analysts to search through vast datasets, identify patterns, and make predictions that are invaluable for both short-term traders and long-term investors (Abdullah Ahmed and Bin Shabri 2014; Zhao et al. 2015; Das et al. 2022; Jiang et al. 2022; Liang et al. 2023). Driven by this motivation, this study investigates forecasting methodologies within the domains of crude oil and precious metals markets to enhance the precision of price predictions.
Recent innovations in deep-learning models seem promising for time-series forecasting; however, the crude oil and precious metals forecasting literature has made only limited use of these models for price prediction. This study attempts to fill this gap in the forecasting literature by applying several deep-learning and machine-learning models to predict the daily closing prices of crude oil, gold, and silver. First, the time-series data of daily spot prices of two prominent crude oils, West Texas Intermediate (WTI) and Brent, and two precious metal markets, gold and silver, are gathered and normalized. Then, several input sequences are prepared using the sliding window method with four different window lengths. Next, the dataset is split into training, validation, and test sets using a time-based splitting approach. Finally, a comprehensive set of 16 forecasting models, consisting of 12 deep-learning models, 2 baseline ensemble models, and 2 baseline machine-learning models, is implemented to predict the next-day market price. The deep-learning models used in the current study include long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent units (GRU), bidirectional GRU (BiGRU), Time2Vector BiLSTM (T2V-BiLSTM), Time2Vector BiGRU (T2V-BiGRU), convolutional neural networks (CNN), hybrid CNN-BiLSTM, hybrid CNN-BiGRU, temporal convolutional networks (TCN), hybrid TCN-BiLSTM, and hybrid TCN-BiGRU models. The two baseline ensemble models are the random forest and LightGBM gradient-boosting models, and the two baseline machine-learning models are the support vector regression (SVR) and k-nearest neighbors (KNN) models.
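The sliding window and time-based splitting steps described above can be illustrated with a minimal sketch, assuming NumPy. The function names `make_windows` and `time_split`, the window length of 5, the synthetic price series, and the 70/15/15 split fractions are hypothetical choices for illustration and are not values reported in this study; normalization is omitted for brevity.

```python
import numpy as np


def make_windows(prices, window):
    """Build (input window, next-day target) pairs from a 1-D price series."""
    prices = np.asarray(prices, dtype=float)
    n_samples = len(prices) - window
    if n_samples <= 0:
        raise ValueError("the series must be longer than the window")
    # Row i holds prices[i : i + window]; its target is the price that follows.
    X = np.stack([prices[i:i + window] for i in range(n_samples)])
    y = prices[window:]
    return X, y


def time_split(X, y, train_frac=0.7, val_frac=0.15):
    """Chronological split without shuffling (fractions are assumptions)."""
    n = len(X)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    train = (X[:i_train], y[:i_train])
    val = (X[i_train:i_val], y[i_train:i_val])
    test = (X[i_val:], y[i_val:])  # most recent observations
    return train, val, test


prices = np.arange(100.0, 120.0)        # 20 synthetic "daily prices"
X, y = make_windows(prices, window=5)   # 15 samples, each of length 5
train, val, test = time_split(X, y)
```

Because the split is chronological rather than random, every test target is later in time than every training target, which avoids leaking future information into training.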
Each of the employed models has its strengths and limitations. LSTM models are a type of recurrent neural network (RNN) that are popular for their ability to capture long-term dependencies, mitigate the vanishing gradient problem, and handle variable-length sequences; however, LSTMs can be computationally expensive and prone to overfitting, requiring regularization techniques (Yu et al. 2019). GRU models, another type of RNN, have a simpler architecture, resulting in faster training and inference times; however, they may have limitations in capturing complex patterns compared with LSTM models. Bidirectional models, such as BiLSTM or BiGRU, consider both forward and backward information, making them more robust to variations in the input sequence order; however, they are computationally complex and require more memory resources (Khan et al. 2021). CNNs are effective at capturing local patterns and features within time-series data. CNNs learn filters to detect specific temporal patterns and are translation invariant, meaning they can detect patterns regardless of their position in the input sequence; however, CNNs have limitations, such as the requirement for fixed-length inputs, limited consideration of temporal ordering, and a limited ability to capture long-term dependencies. Hybrid CNN–LSTM models combine the strengths of both CNNs and LSTMs, capturing spatial and temporal features. They are suitable for tasks that require capturing complex patterns in time-series data; however, they can be less interpretable than standalone models (Gharghory 2021). TCNs are designed to capture long-term dependencies efficiently. They use dilated convolutions to capture information from several past time steps. TCNs are adaptable to different time-series lengths without padding or truncation; however, they can be complex to design and tune and are sensitive to input scaling (Gopali et al. 2021).
Ensemble machine-learning models such as random forest and LightGBM are also used in time-series analysis. Random forest combines multiple decision trees and offers high prediction accuracy and robustness against outliers. LightGBM is an efficient gradient-boosting framework that effectively handles large datasets. Both models are strong in accuracy and generalization but cannot explicitly capture temporal dependencies (Ke et al. 2017). SVR is a flexible model that can capture linear and nonlinear relationships; it focuses on support vectors, which greatly influence the model’s decision boundary. SVR can handle high-dimensional datasets and complex relationships between variables; however, the performance of SVR depends on selecting appropriate hyperparameters, and it does not explicitly model temporal dependencies. KNN is an instance-based algorithm that makes predictions based on the similarity of training instances; it requires no training phase but suffers from the curse of dimensionality and cannot capture temporal dependencies.
Our paper compares the forecasting performance of these models using the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) error functions. This paper primarily aims to answer the following questions through empirical experiments. (1) What is the best deep-learning model that can predict crude oil, gold, and silver spot prices reliably and precisely? (2) Following from the first question, does a particular model outperform the other models for both crude oil and precious metals prices? (3) Which input sequence length is more informative for each market’s price prediction? (4) Are hybrid models effective in forecasting crude oil, gold, and silver spot prices? (5) What conclusions about the properties of each deep-learning model can be drawn in the context of crude oil and precious metals time-series forecasting?
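For reference, the three error functions can be sketched as follows, assuming NumPy and the usual textbook definitions. The function names are hypothetical, MAPE is expressed in percent (an assumption about the reporting convention), and the example prices are made up for illustration.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the price."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; requires nonzero y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE does."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


actual = [100.0, 200.0]      # made-up prices
forecast = [110.0, 190.0]    # made-up forecasts
scores = (mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
```

MAE and RMSE depend on the price scale, so they are not directly comparable between, for example, gold and silver, whereas MAPE is scale-free.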
The arrangement of the rest of this manuscript is as follows. “Literature review” section provides an overview of the relevant prior research and summarizes our contributions to the existing literature. “Methodology” section explains the methods and performance evaluation criteria used in this study. “Empirical analysis and results” section describes the datasets, demonstrates the results, and discusses our findings. Finally, “Conclusion” section summarizes the paper and presents some managerial implications and policy suggestions.
Literature review
Accurately forecasting financial markets is a critical guide for determining economic policies. Consequently, researchers have dedicated their efforts to developing and improving models that capture the intrinsic behavior and dynamics of financial market time series. The prediction methods used in these studies generally comprise statistical or econometric, machine-learning, and deep-learning methods. Several forecasting modeling approaches have recently been applied to crude oil and precious metals. For instance, Zhao et al. (2018) proposed a numerical vector trend forecasting method for predicting the daily spot price of Brent crude oil, outperforming traditional models such as autoregressive integrated moving average (ARIMA), SVR, and wavelet analysis models. Similarly, Szarek et al. (2020) proposed a new stochastic distribution, skewed Student’s t-distribution, for silver, copper, and gold time-series estimation, which accounts for the time-dependent parameters and non-Gaussian behavior of time-series data. Drachal (2022) employed the Bayesian symbolic regression method to address variable uncertainty in monthly crude oil price forecasting.
Due to the nonlinearity, nonstationarity, and heteroscedasticity of crude oil and precious metal markets, classical statistical forecasting models such as vector autoregressive (VAR), ARIMA, and autoregressive distributed lag (ARDL) struggle to perform well in forecasting tasks. These models make assumptions about the normality and stationarity of price data, which often do not hold for time-series data from commodity markets. As a result, recent studies have used machine-learning and deep-learning models, which excel in handling nonlinear data and do not rely on the normality assumption for accurate price predictions. In the literature, three main types of deep neural networks are used for sequence modeling, and they can be applied for time-series forecasting (Lim and Zohren 2021). These networks include (i) RNNs and their variants, such as LSTM (Hochreiter and Schmidhuber 1997) and GRUs (Cho et al. 2014), (ii) CNNs (Lecun et al. 1998) and their recent variant, TCN (Lea et al. 2016), and (iii) the transformer (Vaswani et al. 2017) and its variants (Devlin et al. 2018; He et al. 2020; Liu et al. 2019).
Reflecting the importance of gold price forecasting, several studies have used statistical, machine-learning, and deep-learning models. Alameer et al. (2019) used a multilayer perceptron model with a whale optimization algorithm for next-month gold price forecasting. This model demonstrates a lower forecasting error than ARIMA model forecasts. Madziwa et al. (2022) employed an ARDL model to forecast annual gold prices using lagged gold prices, gold demand, and treasury bill rates as predictors. In another study, Zhang and Ci (2020) used the US Consumer Price Index, crude oil price, exchange rate, and Dow Jones Industrial Price Index in a deep belief network to predict monthly gold prices. Risse (2019) predicted gold excess returns over the risk-free rate of return using an SVR model. SVR captures nonlinear relationships in the data by mapping the inputs into a high-dimensional feature space, in which a linear function is fitted. Tree-based ensemble models have demonstrated promising performance in forecasting gold prices. Yuan (2023) leveraged the XGBoost (Chen and Guestrin 2016) and LightGBM (Ke et al. 2017) models for gold and bitcoin price forecasting. Furthermore, deep-learning methods have been increasingly used for gold price prediction. For instance, using association rules and the LSTM model, Boongasame et al. (2022) predicted the price of gold. Vidal and Kristjanpoller (2020) developed a hybrid of convolutional neural networks and long short-term memory models (CNN–LSTM), which incorporates historical log-return series and time-series data in an image format to predict the volatility of gold spot prices. Likewise, various studies have used deep-learning models for crude oil price forecasting. Orojo et al. (2019) employed a multi-recurrent network to forecast the one-month-ahead WTI crude oil price. Lin et al. (2022) forecasted crude oil futures prices using a BiLSTM-Attention-CNN model with wavelet transform.
Swamy and Lagesh (2023) explored the effectiveness of investor sentiments from Twitter in predicting the daily gold price using a wavelet analysis method and unveiled a strong correlation between Twitter sentiments and the gold price. Fang et al. (2023a, b) forecasted Brent crude oil prices using an improved slope-based method based on empirical mode decomposition (EMD) and feedforward neural network (FNN) methods.
Conversely, the literature on forecasting other precious metal markets is relatively limited. Sroka (2022) utilized block bootstrap methods to forecast daily silver prices, while Salisu et al. (2020) tested the impact of Google Trends on forecasting the prices of four precious metal markets using an ARDL model. Zhang et al. (2022a, b) introduced a new objective function to forecast commodity markets, including silver prices. To our knowledge, no prior study has forecasted the silver price using machine-learning and deep-learning models. We attempt to fill this void in the literature.
Given the ongoing improvements in natural language processing (NLP) tasks, recent studies have incorporated news text and Google Trends features into their forecasting models. These approaches leverage the valuable information in the textual data to enhance the accuracy of predictions. For example, Li et al. (2019) extracted text data from online news media and created sentiment features that were grouped by their topics using a latent Dirichlet allocation method. Their topic-sentiment forecasting model shows that text features complement financial features for crude oil price forecasting. Similarly, Bai et al. (2022) constructed features from news headlines for WTI crude oil forecasting. Fang et al. (2023a, b) employed a FinBERT approach to extract sentiment information from crude oil-related news, which was then integrated into a hybrid attention-based BiGRU model for WTI price forecasting. Kertlly de Medeiros et al. (2022) demonstrated performance enhancement using a mixed data sampling model incorporating mixed-frequency data and a textual sentiment indicator for oil price forecasting. Salisu et al. (2020) utilized an econometric ARDL model to show that search engine data from Google Trends have a significant positive effect on precious metal returns. Similarly, Tang et al. (2020) considered Google Trends a useful predictor in a multivariate empirical mode decomposition method for forecasting Brent crude oil spot prices. Other EMD methods have been used by Wang et al. (2018), Qin et al. (2019), Yang et al. (2020), G. Li et al. (2022a, b), and Guo et al. (2022) in their proposed crude oil forecasting models. Liang et al. (2023) also used historical crude oil prices in a deep reinforcement learning algorithm to forecast multistep-ahead WTI, Brent, and Oman prices.
A recent review paper (Mohamed and Messaadia 2023) highlights that artificial neural networks and support vector machines (SVMs) are the most popular artificial intelligence techniques used to forecast crude oil prices. Collectively, these studies showcase the growing significance of advanced forecasting methods to enhance the accuracy and reliability of predictions in the crude oil and precious metal markets.
Some studies have achieved improved forecasting performance by developing ensemble models. Zhao et al. (2017) combined the advantages of stacked denoising autoencoders (SDAE) and bootstrap aggregation (bagging) techniques to model the nonlinear and complex relationships of oil price factors and to generate multiple data sets for training a set of base learners. Wang et al. (2020) proposed an ensemble of five linear and nonlinear submodels to produce the prediction intervals of crude oil spot prices while optimizing the weights of the submodels using the gray wolf optimizer. Zhang et al. (2021) developed an ensemble deep-learning model for electricity price series prediction. Jiang et al. (2022) combined a decomposition-ensemble approach optimized by the seagull algorithm with sentiment analysis to forecast future crude oil prices. Su et al. (2022) proposed a hybrid forecasting model using SVM, extreme learning machines, XGBoost, and LSTM models to predict crude oil futures series. Sun et al. (2022) proposed a secondary decomposition–reconstruction–ensemble approach for crude oil price forecasting.
Temporal convolutional networks (TCNs) (Lea et al. 2016) are variants of CNN models that employ causal convolutions and dilations to predict sequential data with temporality and large receptive fields. A simple convolution can only look back over a fixed time window, whereas a TCN uses dilated convolutions to achieve a large receptive field with fewer convolutional layers. TCNs capture long-term patterns using a hierarchy of temporal convolutional filters, and in that manner, they tend to outperform bidirectional LSTM models and are an order of magnitude faster to train. The TCN was first developed for action detection in video data settings to account for spatial and temporal input features (Lea et al. 2016). More recently, however, TCNs have drawn more attention from scholars and have been applied to various time-series data. For instance, Lara-Benítez et al. (2020) utilized a TCN model to forecast electricity demand and prices in Spain. In the environmental milieu, Yan et al. (2020) predicted the El Niño-Southern Oscillation, an index measuring the earth’s climate variability, by applying an ensemble empirical mode decomposition–TCN model. This model shows improved prediction performance compared with the LSTM model.
Considering temporal patterns in predicting time-series data is a significant challenge for many models. Some recent studies have introduced learnable time representations to account for temporal patterns in sequential data (Xu et al. 2019, 2021; Li et al. 2017). Among these studies, Kazemi et al. (2019) introduced the Time2Vector method to represent sequential data as periodic and nonperiodic vectors that can capture complex temporal patterns in data. Yang et al. (2021) improved the performance of an attention neural network for non-intrusive load monitoring by applying the Time2Vector method. The current study applies Time2Vector embedding to input series and incorporates the resulting periodic and nonperiodic features into several deep-learning models to forecast crude oil, gold, and silver prices. Table 1 summarizes the literature on crude oil and precious metal forecasting.
Gradient-boosting methods are powerful predictive models for many tasks. Borisov et al. (2021) compared the performance of tree-based ensembles, such as XGBoost, LightGBM, and CatBoost (Prokhorenkova et al. 2018), with some deep-learning models, including but not limited to multilayer perceptron, regularization learning networks, neural oblivious decision ensembles, and transformers. They assert that tree-based machine-learning models outperform deep-learning models in several prediction tasks with tabular data; however, their study does not include deep-learning models for sequential data and is silent about forecasting financial market prices. To address this shortfall, the current study compares tree-based ensemble models (random forest and LightGBM) with 12 deep-learning models and two other machine-learning models (KNN and SVR) in forecasting daily crude oil and precious metals market prices.
This study makes significant contributions to the literature on forecasting commodity market prices.

Considering that there is limited literature on using deep-learning models to forecast the price of commodity markets, this study implements and compares various types of state-of-the-art deep-learning models for crude oil and precious metal spot price forecasting. Hence, our study encompasses several forecasting results that provide comprehensive insights for crude oil, gold, and silver market players and investors.

Most studies on precious metals focus only on gold price predictions; however, this study forecasts the price of both gold and silver to maintain a more general understanding of the precious metal markets.

To the best of our knowledge, this study is the first in the forecasting literature that applies the TCN model, the Time2Vector embedding module, and hybrid TCN-BiLSTM and TCN-BiGRU models to forecast the spot price of WTI, Brent, gold, and silver time series.

The forecasting period in the test dataset of this study, from 2020-01-03 to 2022-03-25, covers two critical global events that significantly affected financial markets. First, the financial crisis during the COVID-19 pandemic significantly impacted all financial markets; in particular, crude oil prices plunged in April 2020. Second, the Russia–Ukraine conflict in February 2022 was associated with a sharp rise in crude oil, gold, and silver prices. Therefore, the results of this study and the proposed models can be used during financial crises and extreme global situations. Figure 5 shows the line chart of the WTI, Brent, gold, and silver prices for reference.
Methodology
LSTM and BiLSTM
LSTM and BiLSTM are structural variants of RNN models that can remember important information from time-series sequences (Lin et al. 2022). In particular, BiLSTM concatenates two LSTM layers in opposite directions. The interior structure of a common LSTM cell is shown in Fig. 1a. An LSTM unit consists of an input gate, a forget gate, and an output gate. These gates facilitate information flow and help the cell forget unnecessary information. First, the forget gate decides what information from the inputs and previous hidden states to discard. Second, the input gate decides what information from the inputs and previous cell states to keep and updates the cell state. Finally, the output gate obtains the output \(h_{t}\) by multiplying the output gate vector \(o_{t}\), which is the input information processed by the sigmoid activation function, by the cell state vector transformed by the tanh activation function. The equations of a forward pass in an LSTM unit are as follows:
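In the standard formulation of the LSTM forward pass, written with the symbols defined in this section, the six equations can be sketched as below. The gate-specific subscripts on \(W\), \(U\), and \(b\), the use of \(\odot\) for the elementwise product, and \(\sigma\) for the sigmoid function are notational assumptions.

```latex
\[
\begin{aligned}
f_{t} &= \sigma\left( W_{f} x_{t} + U_{f} h_{t-1} + b_{f} \right) \\
i_{t} &= \sigma\left( W_{i} x_{t} + U_{i} h_{t-1} + b_{i} \right) \\
o_{t} &= \sigma\left( W_{o} x_{t} + U_{o} h_{t-1} + b_{o} \right) \\
c^{\prime}_{t} &= \tanh\left( W_{c} x_{t} + U_{c} h_{t-1} + b_{c} \right) \\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot c^{\prime}_{t} \\
h_{t} &= o_{t} \odot \tanh\left( c_{t} \right)
\end{aligned}
\]
```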
where \(x_{t} \in {\mathbb{R}}^{d}\) is the input vector, and \(h_{t} \in {\mathbb{R}}^{h}\) is the hidden state vector. Furthermore, \(f_{t}\) is the forget gate vector, \(i_{t}\) is the input gate vector, \(o_{t}\) is the output gate vector, \(c^{\prime}_{t}\) is the temporary cell state vector, \(c_{t} \in {\mathbb{R}}^{h}\) is the cell state vector, and \(W \in {\mathbb{R}}^{h \times d}\), \(U \in {\mathbb{R}}^{h \times h}\), and \(b \in {\mathbb{R}}^{h}\) represent the parameter matrices and vectors.
In a BiLSTM model, the hidden states \(h_{t}\) from the two opposite directions are concatenated to construct the bidirectional hidden state. The formulas for the bidirectional \(h_{t}\) are as follows:
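A common way to write this concatenation is sketched below. The arrow notation for the forward and backward hidden states, and the shorthand \(\mathrm{LSTM}(\cdot)\) for one LSTM step, are notational assumptions.

```latex
\[
\begin{aligned}
\overrightarrow{h}_{t} &= \mathrm{LSTM}\left( x_{t}, \overrightarrow{h}_{t-1} \right) \\
\overleftarrow{h}_{t} &= \mathrm{LSTM}\left( x_{t}, \overleftarrow{h}_{t+1} \right) \\
h_{t} &= \left[ \overrightarrow{h}_{t} ; \overleftarrow{h}_{t} \right]
\end{aligned}
\]
```

The forward state depends on the previous time step and the backward state on the next one, so the concatenated \(h_{t}\) summarizes both the past and the future of the input window around time \(t\).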
GRU and BiGRU
Like the LSTM, the GRU is a variant of RNN cells that can forget insignificant information and help the model use longer data sequences. GRU has fewer parameters than LSTM because it eliminates the output gate.
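The GRU forward pass can be sketched in a standard form using the symbols defined in this section. The gate-specific subscripts and the elementwise product \(\odot\) are notational assumptions, and the last equation follows one common convention; some references swap the roles of \(z_{t}\) and \(1 - z_{t}\).

```latex
\[
\begin{aligned}
z_{t} &= \sigma\left( W_{z} x_{t} + U_{z} h_{t-1} + b_{z} \right) \\
r_{t} &= \sigma\left( W_{r} x_{t} + U_{r} h_{t-1} + b_{r} \right) \\
\hat{h}_{t} &= \tanh\left( W_{h} x_{t} + U_{h} \left( r_{t} \odot h_{t-1} \right) + b_{h} \right) \\
h_{t} &= \left( 1 - z_{t} \right) \odot h_{t-1} + z_{t} \odot \hat{h}_{t}
\end{aligned}
\]
```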
where \(x_{t} \in {\mathbb{R}}^{d}\) is the input vector, and \(h_{t} \in {\mathbb{R}}^{h}\) is the hidden state vector. Additionally, \(z_{t}\) is the update gate vector, \(r_{t}\) is the reset gate vector, \(\hat{h}_{t}\) is the candidate activation vector, \(W \in {\mathbb{R}}^{h \times d}\), \(U \in {\mathbb{R}}^{h \times h}\), and \(b \in {\mathbb{R}}^{h}\) represent the parameter matrices and vectors, and \(\sigma\) is the sigmoid activation function. For certain sequential datasets, GRUs outperform LSTM models (Chung et al. 2014; Gruber and Jockisch 2020). The internal structure of the GRU cell is depicted in Fig. 1b.
For a bidirectional GRU model, hidden state vectors from two opposite directions are concatenated as follows:
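One common way to write the bidirectional GRU hidden state is sketched below. The arrow notation and the shorthand \(\mathrm{GRU}(\cdot)\) for one GRU step are notational assumptions.

```latex
\[
\begin{aligned}
\overrightarrow{h}_{t} &= \mathrm{GRU}\left( x_{t}, \overrightarrow{h}_{t-1} \right) \\
\overleftarrow{h}_{t} &= \mathrm{GRU}\left( x_{t}, \overleftarrow{h}_{t+1} \right) \\
h_{t} &= \left[ \overrightarrow{h}_{t} ; \overleftarrow{h}_{t} \right]
\end{aligned}
\]
```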
Figure 1c shows the architecture of a single-layer bidirectional LSTM (BiLSTM) or bidirectional GRU (BiGRU) model.
CNN
A CNN is a feedforward neural network (FNN) model proposed by Lecun et al. (1998). CNNs are very popular in computer vision applications, such as facial recognition systems, object localization, object detection, and semantic segmentation. CNNs are also effective at capturing local patterns and features within a time series. The convolutional layers learn filters to detect specific temporal patterns, making CNNs well suited for capturing local dependencies and short-term patterns in time-series data. CNNs are inherently translation invariant, meaning they can detect patterns regardless of their position in the input sequence. This property is helpful for time-series analysis because the same patterns may occur at different time steps. The local perception and weight sharing of CNNs can significantly reduce the number of parameters, thus improving the efficiency of model learning (Lu et al. 2020); however, CNNs suffer from limitations such as the requirement for fixed-length inputs, lack of consideration of temporal ordering, and limited ability to detect long-term temporal dependencies.
The architecture of this model is generally constructed from two layers: the convolution layer and the pooling layer. The convolution layer extracts useful features from the input series by applying several convolution kernels to the inputs, as indicated in Eq. 17, which downsamples the input for final forecasting. Then, a pooling layer is applied to the output of the convolution layer to reduce the dimensionality of the model.
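With the symbols defined in this section, the convolution layer can be sketched as below, where \(*\) denotes the convolution operation. The equation is numbered to match the reference to Eq. 17 in the text; its exact form is an assumption based on the symbol definitions.

```latex
\[
l_{t} = \sigma\left( x_{t} * k_{t} + b_{t} \right) \tag{17}
\]
```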
where \(l_{t}\) is the output of the convolution layer, \(\sigma\) is the activation function, \(x_{t} \in {\mathbb{R}}^{d}\) is the input vector, \(k_{t} \in {\mathbb{R}}^{d}\) is the parameter vector of the convolution kernel, and \(b_{t}\) is the bias term.
TCN
The intrinsic weaknesses of the CNN, including fixed-size inputs and mismatched input and output dimensions, restrict its application in time-series forecasting. The TCN (Lea et al. 2016) is a variant of the CNN that employs causal and dilated convolutions appropriate for sequential data with temporality and large receptive fields. Causal means that no information leaks from the future into the past, and the receptive field is the set of elements of the original input that affect a specific element of the output. A TCN model can cover the full input history by setting a proper dilation factor and kernel size. Furthermore, the TCN has a simple network structure and outperforms standard recurrent networks, such as the RNN and LSTM networks, regarding the effectiveness and efficiency of time-series predictions (Yan et al. 2020). Figure 2 shows a general representation of our TCN model with dilated causal convolutions. This model’s architecture consists of the following.
Dilated convolution layer: The dilated convolution architecture modifies Kronecker-factored convolutional filters, enabling a larger receptive field with fewer parameters and layers (Zhou et al. 2015). For a sequence of \(x_{t} \in {\mathbb{R}}^{d}\) and a filter \(f:\left\{ {0, \ldots , k - 1} \right\} \to {\mathbb{R}}\), the dilated convolution operation \(*_{D}\) on entries \(s\) of the sequence is defined as follows:
\[ \left( x *_{D} f \right)\left( s \right) = \sum_{i = 0}^{k - 1} f\left( i \right) \cdot x_{s - D \cdot i} \]
where \(D\) is the dilation factor, \(k\) is the filter size, and the index \(s - D \cdot i\) ensures that only past data are convolved. A tanh function transforms the output of the dilated causal convolution layer.
Dropout layer: A dropout layer with a probability of 0.2 is applied after each dilated convolution layer to regularize the model and mitigate overfitting.
Residual block: We used a stack of two dilated causal convolution layers, and the results from the final convolution were added back to the inputs to obtain the outputs of the block. The residual connection mitigates the vanishing and/or exploding gradient problem in deep-learning models.
Fully connected layer: The output of the residual block is then fed into a fully connected layer to predict the next-day price.
In Fig. 2, the TCN model has a stack of two layers, a residual connection, and a fully connected layer. Each layer in the stack has a dilated causal convolution, a tanh activation function, and a dropout for regularization. The dilated convolution layers use dilation factors of \(D = 1, 2, 4\) and a filter size of \(k = 2\). When \(D = 1\), the dilated convolution reduces to a standard convolution.
In recurrent-type neural networks, operations apply sequentially. In contrast, in a TCN model, the whole sequence is convolved simultaneously in each dilated convolutional layer; hence, training a TCN is much faster than training LSTM or GRU models (Lea et al. 2016).
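As a rough illustration of the dilated causal convolution described above, the following pure-Python sketch applies one filter to a one-dimensional sequence and computes the receptive field of a stack of dilations. The function names `dilated_causal_conv` and `receptive_field` are hypothetical, positions before the start of the sequence are assumed to be zero padded, and the receptive-field formula assumes one convolution per dilation level; the study does not confirm these choices.

```python
def dilated_causal_conv(x, f, dilation):
    """Apply a dilated causal convolution to a 1-D sequence.

    x: list of floats (input sequence)
    f: list of floats (filter taps, f[0] weights the current step)
    dilation: spacing D between the taps
    Positions before the start of x are treated as zeros (assumed padding).
    """
    out = []
    for s in range(len(x)):
        total = 0.0
        for i in range(len(f)):
            idx = s - dilation * i  # only current and past positions
            if idx >= 0:
                total += f[i] * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions.

    Assumes one convolution per dilation level (an illustrative assumption).
    """
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0, 4.0]
    print(dilated_causal_conv(prices, [1.0, 1.0], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

Under the stated assumption, a filter size of \(k = 2\) with dilation factors \(D = 1, 2, 4\) gives a receptive field of 8 time steps.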
Time2Vector (T2VBiLSTM and T2VBiGRU)
Time-series input can be considered a sequence in which a dependency across time exists among the sample data rather than being independent and identically distributed (i.i.d.); therefore, it is essential to account for time features while developing a time-series forecasting model. Vector embedding has been successfully used in many NLP tasks (Pennington et al. 2014; Mikolov et al. 2013; Almeida and Xexéo 2019). Similarly, Time2Vector (Kazemi et al. 2019) is a learnable vector embedding for time that can be easily combined with many deep-learning models. Time2Vector is a decomposition technique that encodes a temporal signal into periodic and nonperiodic patterns, allowing the model to understand and learn from the time-dependent patterns. It eliminates the need for explicit feature engineering when dealing with time-related features. By incorporating temporal information meaningfully, Time2Vector can improve the performance of time-series models.
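For a given scalar notion of time \(\tau\), Time2Vec of \(\tau\), denoted \(T2V\left( \tau \right)\), is a vector of size \(k + 1\) defined as follows:
\[T2V\left( \tau \right)\left[ i \right] = \begin{cases} w_{i} \tau + b_{i} , & {\text{if}}\; i = 0 \\ {\mathcal{F}}\left( {w_{i} \tau + b_{i} } \right), & {\text{if}}\; 1 \le i \le k \end{cases}\]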
where \(T2V\left( \tau \right)\left[ i \right]\) is the \(i\)th element of \(T2V\left( \tau \right)\), \({\mathcal{F}}\) is a periodic activation function, and \(w\) and \(b\) are learnable weight and bias parameters, respectively. Following the activation function indicated in the original T2V paper (Kazemi et al. 2019), we use a sine function as \({\mathcal{F}}\). Time2Vector (T2V) ensures that the time scale will not affect the learned periodic and nonperiodic time features (Yang et al. 2021).
To construct the T2VBiLSTM and T2VBiGRU models, first, the input sequences are transformed by Time2Vector embeddings, then the embedded input vectors are entered into a singlelayer BiLSTM or BiGRU model, and finally, the output is predicted through a fully connected layer. Figure 3 presents a schematic of the T2VBiLSTM or T2VBiGRU model. Figure 4 summarizes the complete data preprocessing, model training, and prediction process for this study’s test set.
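The Time2Vec definition above can be sketched in a few lines of Python. This is only an illustration: the function name `time2vec` is hypothetical, and the weights and biases are fixed numbers here, whereas in the models described above they would be learned during training.

```python
import math


def time2vec(tau, weights, biases, activation=math.sin):
    """Return the Time2Vec embedding of a scalar time value tau.

    weights, biases: lists of length k + 1 (fixed here for illustration;
    in a trained model they are learnable parameters).
    Element 0 is the linear (nonperiodic) term; elements 1..k apply the
    periodic activation, a sine by default.
    """
    if len(weights) != len(biases) or not weights:
        raise ValueError("weights and biases must be non-empty and equal in length")
    linear = weights[0] * tau + biases[0]
    periodic = [activation(w * tau + b) for w, b in zip(weights[1:], biases[1:])]
    return [linear] + periodic


if __name__ == "__main__":
    # k = 2 periodic components, so the embedding has k + 1 = 3 elements.
    print(time2vec(2.0, weights=[0.5, 1.0, 2.0], biases=[0.1, 0.0, 0.0]))
```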
Hybrid models
To verify the applicability of hybrid models in forecasting daily crude oil, gold, and silver prices, we used CNNBiLSTM, CNNBiGRU, TCNBiLSTM, and TCNBiGRU models. CNNs in the initial layers of the hybrid model can learn lowlevel spatial features, such as local patterns, while the BiLSTM layers can learn highlevel temporal dependencies. This hierarchical representation learning allows the model to capture local and global dependencies in the timeseries data. CNNs and TCNs are well suited for feature extraction from raw data, including timeseries data. They can automatically learn relevant features and reduce the dimensionality of the input, which can be beneficial for downstream BiLSTM or BiGRU layers to learn more meaningful representations. The explanation of each model structure is as follows.
CNNBiLSTM and CNNBiGRU models: First, a onedimensional convolution layer is applied to input sequences in the CNN module. Then, a max pooling layer is applied to the output of the convolution layer to extract the essential features. Next, the output of the pooling layer is entered into a singlelayer BiLSTM or BiGRU module, and the final output is predicted through a fully connected layer.
TCNBiLSTM and TCNBiGRU models: First, a TCN module receives the input sequences. Next, the output of the TCN is introduced into a singlelayer BiLSTM or BiGRU module, and the final output is predicted through a fully connected layer.
Ensemble and machinelearning models
This study uses two ensemble machine-learning models: random forest and LightGBM, a gradient-boosting technique. Random forest generally provides high prediction accuracy because of the aggregation of multiple decision trees. It is less prone to overfitting than individual decision trees. By combining multiple trees and using techniques such as bagging and random feature selection, random forest reduces variance and improves the model’s generalization ability. It is also robust to outliers and missing values; however, it lacks autocorrelation modeling because random forest treats each data point independently and does not explicitly consider the temporal dependencies between consecutive observations in the time series. Random forest is not well suited for extrapolation, especially for long-term forecasts; thus, it may be difficult to capture and project future trends extending beyond the observed data range. While random forest is generally robust to overfitting, it can still be sensitive to noisy data; it may overfit the noise if the dataset contains a substantial amount of noise or irrelevant features, leading to degraded performance.
LightGBM is a powerful and efficient gradientboosting framework that performs excellently in various machinelearning tasks. LightGBM is highly efficient and can handle large datasets with millions of instances and features. It uses a histogrambased algorithm to achieve faster training and prediction times than traditional gradientboosting implementations. The main advantage of LightGBM is low memory usage due to the use of a compact data structure for representing the dataset during training. Like other gradientboosting algorithms, LightGBM can be prone to overfitting if not properly regularized or tuned. LightGBM may struggle to capture complex feature interactions compared with deeplearning models.
SVR is a machinelearning model that captures linear and nonlinear relationships between variables. It can handle highdimensional datasets and capture complex relationships between variables. The algorithm focuses on the support vectors, the data points that influence the model’s decision boundary most. Outliers have less impact on this model because of the use of a margin. SVR allows using different kernel functions, such as linear, polynomial, radial basis function, and sigmoid. This flexibility enables the modeling of various relationships between the input and target variables; however, SVR performance highly depends on selecting appropriate hyperparameters, such as kernel type, regularization parameter, and kernelspecific parameters. Training an SVR model can be computationally expensive, especially when dealing with large datasets or complex kernel functions. SVR does not account for the temporal dependencies among observations for timeseries datasets.
KNN is an instancebased, nonparametric algorithm that uses different distance metrics, such as Euclidean distance, Manhattan distance, or cosine similarity, to make predictions. The KNN does not explicitly learn a model from the training data. Instead, it stores the entire training dataset and uses it during prediction, eliminating the need for a timeconsuming training phase. As the number of training instances increases, the algorithm’s prediction time can be significant because it requires calculating distances to all training samples. Some limitations of KNN models are the curse of dimensionality, sensitivity to the scale of features, intensive memory requirement, timeconsuming predictions with large datasets, and lack of capturing temporal dependencies.
Evaluation criteria
This study adopts the following three metrics to calculate the forecasting error and evaluate the prediction performance: MAE, MAPE, and RMSE. MAE is the mean of the absolute differences between the predicted and observed values. MAPE is a scale-free value that measures the relative forecasting error. RMSE represents the standard deviation of the residual error between the predicted and observed values. Lower error values indicate better prediction performance. The formulas for these evaluation criteria are as follows:
\[MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {y_{i} - \hat{y}_{i} } \right|}\]
\[MAPE = \frac{100\% }{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|}\]
\[RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }\]
where \(n\) is the sample size, and \(y_{i}\) and \(\hat{y}_{i}\) are the true and predicted values for sample \(i\), respectively.
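For readers who want to reproduce the three error measures, a minimal pure-Python sketch follows. The function names are hypothetical, and MAPE is returned as a percentage, which is an assumption about the scaling used in the reported tables.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed scaling).

    Undefined when any true value is zero.
    """
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    actual = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
```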
Empirical analysis and results
Data description and preprocessing
The daily closing prices of WTI and Brent crude oil, gold, and silver were collected from 2000-01-04 to 2022-03-25 (Fig. 5). The original spot price data for WTI and Brent crude oil are derived from the US Energy Information Administration (https://www.eia.gov), while the spot prices of gold and silver are from KITCO (https://www.kitco.com). We used data from the same trading days across all four markets to obtain an identical sample size for all time series.
To find the best hyperparameters and evaluate the models’ real-world performances, evaluating them on a separate validation set and a test set representing future unseen data is essential. Splitting the time-series datasets is challenging because of temporal dependencies, seasonality, and trends. If we split the data randomly, it breaks the temporal order, and the model may be trained on future data, leading to data leakage and overfitting. Moreover, if the training set does not capture the full range of seasonality or fails to include representative trend patterns, the model’s ability to generalize to unseen data may be compromised. Ensuring the training set contains consecutive past observations to predict future observations, includes multiple seasonal cycles, and adequately captures the underlying trends is crucial. Time-based splitting and rolling window approaches can address these challenges in time-series analysis. In time-based splitting, we split the data based on a specific date or time, ensuring that the training set only contains past observations and the test set contains future observations. In the rolling window approach, a sliding window is used to create samples in the training, validation, and test sets, where each sample includes past observations and the corresponding future target observation. Thus, for each market, the entire dataset is split into three parts: 65% training data (from 2000-01-04 to 2014-06-15), 25% validation data (from 2014-06-16 to 2020-01-02), and 10% test data (from 2020-01-03 to 2022-03-25). The test data period includes the financial crisis due to the COVID-19 pandemic and the sharp decline in crude oil prices in April 2020. Therefore, test data include highly volatile price data, making forecasting even more challenging.
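The time-based split described above can be sketched as follows. The study splits by calendar date; this sketch instead cuts a chronologically ordered list by position using the stated 65/25/10 proportions, which is an approximation, and the function name `chronological_split` is hypothetical.

```python
def chronological_split(series, train_pct=65, val_pct=25):
    """Split an ordered series into train, validation, and test parts.

    The series must already be sorted from oldest to newest, so that the
    validation and test parts contain only later observations.
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]


if __name__ == "__main__":
    prices = list(range(100))  # stand-in for an ordered price series
    train, val, test = chronological_split(prices)
    print(len(train), len(val), len(test))
```

Because no shuffling takes place, every validation observation is later than every training observation, and every test observation is later still.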
Since deep-learning models are sensitive to the scale of data, we normalized each dataset into the [0, 1] interval to limit the effect of noise, speed up the updating of neural network parameters, and enhance the training performance of the model. The formula to normalize the data is as follows:
\[x_{t}{\prime} = \frac{{x_{t} - \min \left( x \right)}}{{\max \left( x \right) - \min \left( x \right)}}\]
where \(x_{t}\) and \(x_{t}{\prime}\) denote the data before and after normalization, respectively. Table 2 summarizes the sample’s descriptive statistics and statistical tests for WTI and Brent crude oil, gold, and silver. The total sample size for all markets is 5426. All four market spot prices show significant skewness, while WTI, Brent, and gold also exhibit significant leptokurtic properties at a 5% significance level. Furthermore, the significant Jarque–Bera test statistics at a 1% significance level show that the WTI, Brent, gold, and silver price time series do not comply with the normal distribution; hence, these markets can be treated as nonstationary signals.
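The [0, 1] scaling mentioned above is the familiar min-max transform, sketched below in pure Python. The text does not state whether the minimum and maximum are taken from the full series or from the training portion only, so the sketch takes the bounds as explicit arguments; the function names are hypothetical.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) bounds of a list of values."""
    return min(values), max(values)


def min_max_scale(values, lower, upper):
    """Map values into [0, 1] given previously fitted bounds."""
    if upper == lower:
        raise ValueError("bounds must differ to avoid division by zero")
    return [(v - lower) / (upper - lower) for v in values]


def min_max_inverse(scaled, lower, upper):
    """Map scaled values back to the original price scale."""
    return [s * (upper - lower) + lower for s in scaled]


if __name__ == "__main__":
    prices = [10.0, 20.0, 30.0]
    lo, hi = fit_min_max(prices)
    scaled = min_max_scale(prices, lo, hi)
    print(scaled)
    print(min_max_inverse(scaled, lo, hi))
```

The inverse transform is included because errors such as MAE are usually reported on the original price scale.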
For these forecasting tasks, \(x_{t} = \left\{ {x_{1} , x_{2} , \ldots , x_{s} } \right\}\) is the input vector, where \(x_{i}\) is the price data at day \(i\) and \(s\) is the sequence length (sliding window length), and \(y_{t} = \left\{ {x_{s + 1} } \right\}\) is the target. We created inputs for different sequence lengths before sending a series into the model. In this study, we train 16 deep and machine-learning models with four different sliding window lengths of 5, 30, 60, and 90 days to predict the next-day WTI, Brent, gold, and silver prices. We have considered 5 as a relatively short sliding window length and 30, 60, and 90 as relatively long to capture any seasonality or trend in the data. We will compare deep and machine-learning models to determine how they forecast commodity price time series with longer input sequences.
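The construction of input windows and next-day targets described above can be sketched as follows. The function name `make_windows` is hypothetical, and the sketch works on a plain list of prices.

```python
def make_windows(series, window):
    """Build (input window, next-day target) pairs from an ordered series.

    Each input holds `window` consecutive prices, and the target is the
    price on the day immediately after the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


if __name__ == "__main__":
    prices = [1, 2, 3, 4, 5, 6]
    x, y = make_windows(prices, window=3)
    print(x)
    print(y)
```

A series of length \(N\) yields \(N - s\) samples for window length \(s\), so longer windows leave slightly fewer training samples.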
Empirical results
Crude oil and precious metals are essential commodities in financial markets. This study aims to forecast the daily price of WTI and Brent crude oil, gold, and silver through deep-learning models and to compare their prediction performance with that of random forest, LightGBM, SVR, and KNN models as baseline machine-learning models. Hence, our results indicate the best deep-learning model for forecasting crude oil, gold, and silver daily prices. We evaluate all models across four sliding window lengths of 5, 30, 60, and 90 days to identify the input length that yields the best performance for each model. The deep-learning models used in this study are LSTM, BiLSTM, GRU, BiGRU, T2VBiLSTM, T2VBiGRU, CNN, CNNBiLSTM, CNNBiGRU, TCN, TCNBiLSTM, and TCNBiGRU models.
We used grid search on the validation dataset to tune and select the optimal hyperparameters of each model. The common hyperparameters among all models are the number of epochs, batch size, dropout rate, and learning rate, equal to 50, 32, 0.2, and 0.001, respectively. Table 3 presents the selected hyperparameters of the four best-performing models in this study. Due to the large scale of the study and space limitations, we present only the selected hyperparameters of the BiGRU, T2VBiGRU, TCN, and TCNBiGRU models for each market. The hyperparameters of the other models are available upon request from the corresponding author.
After each training step, the weights of the models are updated by the Adam optimizer with a scheduled learning rate (lr) as follows:
The initial learning rate (\(lr_{0}\)) is 0.001, applied from epoch one through epoch five, and then decreases exponentially with each subsequent epoch. In this study, the models were trained to minimize the MSE loss function. The objective function of the training process is as follows:
\[MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} }\]
where \(\hat{y}_{i}\) is the predicted price, and \(y_{i}\) is the true target price for sample \(i\).
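A learning-rate schedule of the kind described above, constant for the first five epochs and exponentially decaying afterwards, might look like the sketch below. The decay constant `decay=0.1` is purely a placeholder, since the text does not state the rate used, and the function name `scheduled_lr` is hypothetical.

```python
import math


def scheduled_lr(epoch, lr0=0.001, hold_epochs=5, decay=0.1):
    """Return the learning rate for a 1-based epoch number.

    The rate stays at lr0 through `hold_epochs`, then shrinks by a factor
    of exp(-decay) for every further epoch. `decay` is an assumed value.
    """
    if epoch <= hold_epochs:
        return lr0
    return lr0 * math.exp(-decay * (epoch - hold_epochs))


if __name__ == "__main__":
    for epoch in (1, 5, 6, 10, 50):
        print(epoch, scheduled_lr(epoch))
```

In Keras, a function of this kind could be adapted for `tf.keras.callbacks.LearningRateScheduler`, keeping in mind that the callback passes a 0-based epoch index.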
Overfitting in financial market price forecasting experiments can lead to misleading and unreliable results. Overfitting occurs when a model is too complex and captures the noise in the data rather than the underlying patterns. The consequences of overfitting in financial market price forecasting can be severe. Traders reliant on the overfitted model may make poor investment decisions, leading to significant losses. Furthermore, the overfitted model may be susceptible to market changes, making it difficult to use in real-world situations. Techniques such as cross-validation, dropout, early stopping, and pruning (for random forest and LightGBM) are employed to mitigate the risk of overfitting in crude oil and precious metals market price forecasting. Cross-validation involves partitioning the data into training and validation sets and evaluating the model on the validation set to assess its generalization performance. Model regularization in this study is achieved through a dropout layer in the models’ architectures and early stopping after 10 epochs during training. Early stopping will end the training process if the validation error does not improve. To further assure the robustness of the forecasting results, all reported errors and predicted values are the average outputs from 10 runs of each model.
All deep-learning models are implemented using TensorFlow Keras, and machine-learning models are created using scikit-learn. The experiments were conducted using Python 3.8 and run on a computing system with a 70 W NVIDIA Tesla T4 GPU, CUDA version 11.2, and 16 GB RAM.
WTI price forecasting
To show the computational performance of our deeplearning models for WTI nextday spot price forecasting, we draw the forecasting performance of LSTM, BiLSTM, GRU, BiGRU, T2VBiLSTM, T2VBiGRU, CNN, CNNBiLSTM, CNNBiGRU, TCN, TCNBiLSTM, and TCNBiGRU models, which we compare with the baseline models, i.e., random forest, LightGBM, KNN, and SVR models. Each model was executed 10 times to reduce randomness and improve the robustness of the results. Table 4 presents the MAE, MAPE, and RMSE values for the forecasted nextday WTI prices in the test dataset across all models. Among the evaluated models and considering two out of three performance criteria, the TCN model consistently achieves the lowest MAE and MAPE for WTI price forecasting across all input sliding window sizes. However, when considering the RMSE metric, the BiGRU model outperforms the other models for input sequences of lengths 5 and 30. Conversely, for input sequences of lengths 60 and 90, the TCNBiGRU and T2VBiGRU models demonstrate superior performance, respectively. In addition to the superior prediction performance, the forecasting error of the TCN model is not significantly affected by the input sequence length, as we obtain MAE values of 1.510, 1.455, 1.444, and 1.472 with sequence lengths of 5, 30, 60, and 90, respectively. Comparing this with other models, we can see that most models’ performance is more sensitive to the input sequence length. Using bidirectional models has proved effective in NLP tasks (Arbane et al. 2023; Huang et al. 2023; G. Liu and Guo 2019; Raza and Schwartz 2023); however, little attention has been paid to using these models for price timeseries forecasting. In this study, all three performance criteria from Table 4 show that bidirectional recurrent models, such as BiLSTM and BiGRU, perform better than unidirectional models, such as LSTM and GRU, for all sequence lengths. Bidirectional RNNs exploit the network memory to process information from backward and forward directions. 
Therefore, interdependency among data samples is learned better compared to unidirectional models that only use forward-direction information processing. Our findings agree with Yang and Wang (2022) and Siami-Namini et al. (2019), who found that the BiLSTM model outperformed the LSTM model for time-series prediction. Furthermore, it is evident from Table 4 that GRU-type models such as GRU, BiGRU, T2VBiGRU, CNNBiGRU, and TCNBiGRU perform better than LSTM-type models such as LSTM, BiLSTM, T2VBiLSTM, CNNBiLSTM, and TCNBiLSTM in WTI price forecasting.
To evaluate the effectiveness of Time2Vector embedding in WTI price forecasting, we compare the MAE, MAPE, and RMSE of the BiLSTM and BiGRU models with those of the T2VBiLSTM and T2VBiGRU models, respectively. Using the T2V input embedding, the MAE of the BiLSTM and BiGRU models with input sequence 5 increases from 1.821 and 1.570 to 1.985 and 1.889, respectively. In contrast, the MAE of the BiLSTM and BiGRU models with input sequence 90 decreases from 1.904 and 1.699 to 1.670 and 1.523, respectively. Arguably, Time2Vector embedding does not improve forecasting with smaller input sequences, 5 and 30, while it improves the WTI price forecasting performance for longer sequences of 60 and 90. To study the impact of hybrid models, such as CNNBiLSTM and CNNBiGRU, we compared their performance with single BiLSTM and BiGRU models. Combining the CNN model with recurrenttype models has a detrimental effect on the forecasting performance of WTI prices, as evidenced by an increase in MAE across all sequence lengths. This outcome occurs because the CNN module downsamples the input sequence, and some information that might be useful for BiLSTM or BiGRU models will be lost, resulting in higher forecasting errors. Similarly, a single TCN model outperforms the hybrid TCNBiLSTM and TCNBiGRU models. The TCN model can see the entire sequence in its receptive field and use the best temporal features to forecast the WTI price; therefore, combining it with a recurrenttype model will only increase the complexity of the model and cause an overfitting problem without significant improvements in forecasting performance.
Upon examining the forecasting errors of ensemble treebased models, i.e., random forest and LightGBM, it becomes clear that random forest performs poorly in predicting WTI prices, whereas LightGBM demonstrates exceptional forecasting capabilities. The MAPE and RMSE values of LightGBM across sequence lengths of 5, 30, and 90 days are consistently the lowest among all 16 forecasting models. Consequently, LightGBM can be considered an approximate match to the TCN model as the topperforming method for WTI price forecasting. Moreover, the performance of LightGBM exhibits a slight decline as the input sequence lengths increase; however, this decrease in performance is not significant, indicating that LightGBM is relatively insensitive to variations in the input sequence length. Conversely, using the SVR and KNN models, it becomes clear that the performance of conventional machinelearning models tends to deteriorate as the input sequences grow. In contrast, deeplearning models are less affected by larger input sequences, demonstrating their robustness. All deeplearning models outperform the SVR and KNN models for larger input sequences; however, for smaller sequences, such as those with a length of 5, the KNN model performs better than the deeplearning models, except for the BiGRU and TCN models. This discrepancy can be attributed to the data within each sequence serving as input features for the KNN model. As the sequence length increases, the KNN model faces greater challenges in identifying the nearest neighbors required for accurately predicting the target price.
Figure 6 presents the RMSE for the WTI next-day spot price forecasting models to find the best sliding window length for each forecasting model. Our experiments with WTI price forecasting show that using only recurrent-type models such as LSTM, GRU, BiLSTM, BiGRU, T2VBiLSTM, and T2VBiGRU, we obtain better prediction performance compared with using only CNN or a hybrid of CNN with recurrent-type models such as CNNBiLSTM and CNNBiGRU. Recurrent-type models are not very sensitive to the input sequence length, and they even perform slightly better with relatively longer input sequences because longer sequences enable the model to learn more upward, downward, and complex patterns and generalize better in predicting unseen data. Nonetheless, since the CNN models cannot memorize important information from past data points, the forecasting error of CNN-type models, such as a single CNN, CNNBiLSTM, and CNNBiGRU, increases with the input sequence length. The RMSE of TCNBiLSTM and TCNBiGRU is generally smaller than the RMSE of CNNBiLSTM and CNNBiGRU models; therefore, we can conclude that among the hybrid models, the TCN module performs better than the CNN module in extracting the essential temporal features. Figure 6 shows that the input sequence of 60 days of lagged data points is generally better than other sliding window lengths such as 5, 30, or 90 days for WTI daily price forecasting; however, the CNN, CNNBiLSTM, and CNNBiGRU models perform better with an input sequence of 5 days than the other sequence lengths for WTI price prediction. Among the machine-learning models, ensemble tree-based models emerge as the leading models for forecasting WTI prices. Notably, the random forest model exhibits subpar performance with shorter input sequences. LightGBM consistently performs well across all input sequences, demonstrating its robust forecasting capabilities.
In contrast, the forecasting performance of the SVR and KNN models deteriorates as the input sequence length increases, suggesting that these models struggle to capture complex patterns and relationships effectively within longer data sequences.
Our observations regarding WTI forecasting align with Qin et al. (2023), where the GRU model demonstrated superior performance compared with random forest, SVR, and LSTM models, achieving a lower MAPE value. Similarly, our results corroborate those of J. Yuan et al. (2023), highlighting that LightGBM exhibited significantly better performance than the LSTM and SVR models.
Figure 7 compares the line chart of predicted WTI prices in the test dataset with the actual WTI price value from 2020-01-03 to 2022-03-25. The predicted values at the end of April 2020 indicate that the TCN model surpasses the LightGBM model in accurately capturing sharp changes in the WTI price. The TCN model demonstrates superior performance in detecting and predicting abrupt fluctuations in price, showcasing its ability to capture and respond to sudden market dynamics with greater precision than the LightGBM model.
Brent price forecasting
Table 5 shows the errors, MAE, MAPE, and RMSE, of our forecasting models for Brent nextday spot price forecasting. We compared the forecasting performance of the LSTM, BiLSTM, GRU, BiGRU, T2VBiLSTM, T2VBiGRU, CNN, CNNBiLSTM, CNNBiGRU, TCN, TCNBiLSTM, and TCNBiGRU models with the baseline models, random forest, LightGBM, KNN, and SVR models. According to the lowest values of the MAE and RMSE measures for all input sequence lengths, 5, 30, 60, and 90, the TCN is the bestperforming model in predicting the Brent crude oil price in the test dataset. Considering the MAPE for input sequences with 5 lagged data points, the TCN model has the best Brent price prediction performance; for input sequences of lengths 30, 60, and 90, the T2VBiGRU model outperforms other models. Furthermore, the TCN model is not particularly sensitive to the input sequence length. The TCN achieves a robust and stable forecasting performance for all input sequence lengths as the MAE with 5, 30, 60, and 90 sequences are 1.295, 1.353, 1.315, and 1.301, respectively. The performance of most other models exhibits higher sensitivity to changes in the input sequence length for Brent crude oil. For instance, the MAEs of the CNN model grow with increasing sequence length as it obtains MAEs of 1.542, 1.879, 2.818, and 5.194 with sequence lengths of 5, 30, 60, and 90, respectively. Similar to our findings for WTI crude oil price forecasting, we found that BiLSTM and BiGRU models generally outperform unidirectional LSTM and GRU models in forecasting Brent crude oil prices. By juxtaposing the MAE, MAPE, and RMSE of the GRUtype models (such as GRU, BiGRU, T2VBiGRU, CNNBiGRU, and TCNBiGRU) with those of the LSTMtype models (such as LSTM, BiLSTM, T2VBiLSTM, CNNBiLSTM, and TCNBiLSTM) we found that a GRU unit is a more appropriate recurrent unit for Brent crude oil price forecasting.
The impact of Time2Vector embedding in Brent crude oil price forecasting is assessed by comparing the MAE, MAPE, and RMSE of the T2VBiLSTM and T2VBiGRU models with the BiLSTM and BiGRU models, respectively. Table 5 shows that T2V embedding improves the forecasting performance of the BiLSTM model for input sequences of 60 and 90, while it improves the performance of the BiGRU model for input sequences of 30, 60, and 90. The results of Brent crude oil price forecasting confirm that T2V embedding favorably influences forecasting with longer input sequences. For the hybrid models, our results indicate that combining the CNN model with recurrent-type models adversely affects the performance of the BiLSTM and BiGRU models for Brent crude oil price forecasting. The same pattern appears when comparing the forecasting performance of a single TCN model with the TCNBiLSTM and TCNBiGRU hybrid models in predicting Brent daily prices. The TCN model outperforms the hybrid models.
Comparing the forecasting errors of the random forest, LightGBM, SVR, and KNN models with our deep-learning models indicates that the forecasting performance of deep-learning models is superior to that of machine-learning models. However, the ensemble LightGBM model stands as an exception, demonstrating remarkable performance as the second-best model among all 16 models for forecasting Brent crude oil prices across all input sequence lengths. This exceptional performance sets LightGBM apart from the other models, emphasizing its robustness and effectiveness in accurately predicting Brent crude oil prices, regardless of the input sequence length; however, for the short sequence length of 5, the KNN performs better than the deep-learning models, except for the BiGRU, CNN, and TCN models.
Figure 8 presents the RMSE of the forecasting models implemented in this study to predict the next-day Brent crude oil price in the test dataset. Our results show that the recurrent-type models such as LSTM, GRU, BiLSTM, BiGRU, T2VBiLSTM, and T2VBiGRU outperform the CNN and hybrid models such as CNNBiLSTM, CNNBiGRU, TCNBiLSTM, and TCNBiGRU in terms of Brent price forecasting. Figure 8 shows that, in general, the efficacy of recurrent-type models in predicting the Brent price is enhanced with relatively longer input sequences; however, the CNN and hybrid models do not perform well with longer input sequences. The RMSE values of TCNBiLSTM and TCNBiGRU are mainly lower than those of the CNNBiLSTM and CNNBiGRU models; therefore, we can infer that the TCN module performs better than the CNN module in extracting the critical temporal features of Brent crude oil price. Examining the ensemble and conventional machine-learning models, namely random forest, LightGBM, SVR, and KNN, indicates that the optimal forecasting input sequence for Brent price prediction is five days. The LightGBM model achieves superior forecasting across all input sequences and, thus, is not significantly affected by changes in the input sequence length. As a general observation, the forecasting performance of these baseline models declines as the input sequence length increases, which indicates that shorter input sequences provide more accurate and reliable predictions than longer sequences when using these models for forecasting Brent prices. Setting aside the machine-learning models and the CNN, CNNBiLSTM, and CNNBiGRU models, which perform better with shorter input sequences, our experiments indicate that the best input sequence length for Brent crude oil forecasting is 60 days of past data. Hence, the lowest RMSE values across most of the deep-learning models in this study are achieved for an input sequence length of 60 for Brent crude oil price forecasting.
Our results validate the conclusions drawn by Zhao et al. (2017), indicating that deep-learning models outperform machine-learning models, such as SVR, in forecasting crude oil prices. Figure 9 compares the line chart of predicted Brent crude oil prices in the test dataset with the actual Brent price values from 2020-01-03 to 2022-03-25. Analyzing the predicted value during the abrupt Brent price change periods shows that the TCN model outperforms the LightGBM model in accurately capturing sharp changes in Brent price. Thus, TCN is a more reliable model for predicting the sudden changes in Brent price.
Gold price forecasting
Table 6 presents the forecasting errors of gold price prediction with 16 deep and machine-learning models. Considering the models’ resulting MAE, MAPE, and RMSE, the TCN model has the best gold price prediction performance for input sequences of 5 and 90 days. Moreover, for gold price predictions with input sequences of 30 and 60, the BiGRU and GRU models show superior performance. Our results show that in most cases, the deep-learning models performed remarkably better than the baseline random forest, LightGBM, SVR, and KNN models in predicting the price of gold. Compared with CNNBiLSTM, TCNBiLSTM, and TCNBiGRU, the SVR model achieved lower MAE, MAPE, and RMSE values. The prediction with gold price data shows that bidirectional LSTM models perform better than unidirectional LSTM models for all input sequences. Meanwhile, the BiGRU model outperformed the GRU model exclusively for input sequences of 5 and 60 days. Comparing the gold price forecasting errors of the GRU-type models, such as GRU, BiGRU, T2VBiGRU, CNNBiGRU, and TCNBiGRU, with those of the LSTM-type models, such as LSTM, BiLSTM, T2VBiLSTM, CNNBiLSTM, and TCNBiLSTM, we found that the GRU-type models are more appropriate than the LSTM-type models for gold price forecasting.
Figure 5 shows that the dynamics of gold price movement from 2000-01-04 to 2022-03-25 differ from those of the WTI and Brent crude oil markets, and an upward trend is visible in gold price movements throughout the period. Nevertheless, our deep-learning models could predict the gold price for the test data relatively well. In contrast to its performance in WTI and Brent price forecasting, the LightGBM model surprisingly did not exhibit strong generalization capabilities when predicting the gold price during the test data period. Despite its success in other forecasting tasks, the LightGBM model failed to provide accurate and reliable predictions for gold prices, indicating that the underlying dynamics and patterns of gold price data might differ significantly from those of WTI and Brent. Table 8 shows the coefficient of variation for the resulting MAEs of all forecasting models. The coefficient of variation is a scaleless value calculated by dividing the SD of a model’s MAEs across the various input sequence lengths by the mean of those MAEs. Comparing the forecasting results of the gold market with those of the WTI and Brent crude oil markets in Table 8 shows that the models are more sensitive to the input sequence length in the gold market, as the MAE forecasting error of each model varies markedly across the sequence lengths.
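The coefficient of variation described above can be illustrated with a minimal Python sketch. The function name and the MAE values are hypothetical and are not taken from Table 8; the sketch also assumes the sample standard deviation (n − 1 denominator), since the text does not state which SD variant was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(maes):
    """SD of a model's MAEs across input sequence lengths divided by their mean."""
    if len(maes) < 2:
        raise ValueError("need at least two MAE values")
    m = mean(maes)
    if m == 0:
        raise ValueError("mean MAE is zero; coefficient of variation is undefined")
    # Assumption: sample standard deviation (n - 1 denominator).
    return stdev(maes) / m


# Hypothetical MAEs for input sequences of 5, 30, 60, and 90 days
# (illustrative only, not results from the paper).
stable_model = [1.00, 1.01, 0.99, 1.00]
sensitive_model = [1.00, 1.40, 0.80, 1.60]

print(coefficient_of_variation(stable_model))
print(coefficient_of_variation(sensitive_model))
```

A lower value means the model's error changes little as the input sequence length changes, which is how the robustness comparisons in Table 8 are read.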
Figure 10 depicts the RMSE of our forecasting models when predicting the next-day gold price in the test dataset. The recurrent-type models, such as LSTM, GRU, BiLSTM, BiGRU, T2V-BiLSTM, and T2V-BiGRU, generally have lower RMSE values compared to the CNN and hybrid models, such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU. This result aligns with the research conducted by He et al. (2019) on gold price prediction, which demonstrated that a hybrid CNN–LSTM model did not exhibit superior performance compared with individual CNN or LSTM models.
A shorter input sequence of 5-day price data is more useful in gold price predictions with deep and machine-learning models. The gold price forecasting performance generally deteriorates as the input sequence length increases. The best prediction performance across all models and sequences was achieved through the BiGRU model using 30 days of gold price data. Based on the findings presented in Table 8, it is evident that LightGBM exhibits a higher coefficient of variation for MAE in gold price forecasting than in WTI and Brent crude oil forecasting. This outcome indicates that LightGBM is considerably sensitive to changes in the input sequence length when predicting the gold price. The higher coefficient of variation indicates that the performance of LightGBM may vary significantly when the input sequence length changes, underscoring the need for careful consideration and optimization of the input sequence length specifically for gold price forecasting with LightGBM. Figure 11 compares the line chart of predicted gold prices in the test dataset with the actual gold price values from 2020-01-03 to 2022-03-25. These results indicate that the random forest and LightGBM models do not generalize well in gold price forecasting. Comparing the performance of LightGBM and KNN models in predicting gold prices, our results demonstrate the superiority of LightGBM, which supports the study by Yuan (2023).
Silver price forecasting
As a precious metal, the daily spot price of silver is forecasted through the deep-learning models in this study and compared with the random forest, LightGBM, SVR, and KNN forecasts. Table 7 shows the MAE, MAPE, and RMSE of silver price predictions. The TCN model is the best-performing model across all input sequence lengths for forecasting the daily silver price, as it scores the lowest MAE, MAPE, and RMSE among all models. Besides the TCN’s superior ability to forecast the silver price, this model is the least sensitive to the input sequence length, as shown by the MAE coefficient of variation in Table 8. The MAE coefficient of variation across all sequence lengths is 0.015 for the TCN model, the lowest among all models. The results of this study indicate that, except for the TCN-BiLSTM and TCN-BiGRU models with an input sequence of five days, our deep-learning models are superior to the SVR and KNN models in predicting the price of silver. For silver price forecasting, providing bidirectional information seems promising with the BiLSTM model, as it reached lower MAE, MAPE, and RMSE values than the unidirectional LSTM; however, bidirectional information did not improve the forecasting performance of the GRU model for silver price prediction. Furthermore, the results from Table 7 indicate that GRU-type models have a relatively better forecasting performance than LSTM-type models for silver price prediction.
Among the ensemble (random forest and LightGBM) and conventional (SVR and KNN) machine-learning models, only LightGBM outperformed some of the deep-learning models, namely CNN, CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU, in silver price forecasting. LightGBM was the best machine-learning model for silver price forecasting across all sequence lengths.
Comparing the MAE coefficients of variation between the silver and gold markets in Table 8 shows that the performance of our forecasting models is relatively less affected by changes in the input sequence length when predicting the silver market. This finding indicates that the forecasting models exhibit greater stability and consistency in their predictions for the silver market, regardless of variations in the input sequence length. Unlike the gold market, where the models show higher sensitivity to changes in the input sequence length, the silver market demonstrates a more robust and reliable forecasting performance across different input sequence lengths.
Figure 12 presents the RMSE of our deep-learning models when forecasting the next-day silver price in the test dataset. Similar to the results for the WTI, Brent, and gold markets, the silver price forecasting errors of the recurrent-type models, such as LSTM, GRU, BiLSTM, BiGRU, T2V-BiLSTM, and T2V-BiGRU, are generally lower than those of the CNN and hybrid models, such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU. The best-performing model for predicting the silver price is the TCN model, which demonstrates robust forecasting performance across all input sequence lengths. Our results show that the recurrent-type models generally perform better with a longer input sequence of 90 days when predicting the next-day silver price. The best prediction performance across all models and sequences is achieved through the TCN model using 60 days of past silver price data. Moreover, in hybrid models such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU, the TCN module performs better than the CNN module in extracting the temporal features of the silver market price.
Figure 13 compares the line chart of the best-predicted silver prices in the test dataset with the actual silver price values from 2020-01-03 to 2022-03-25, showing that the TCN and random forest models are, respectively, the best and the least generalizing models in silver price forecasting.
Using MAPE as the metric, our silver price prediction results surpass those of Gono et al. (2023), who employed random forest and XGBoost methods. Our best MAPE for silver price prediction, 1.52%, is substantially lower than the best MAPE of 5.98% achieved by Gono et al. (2023).
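For readers comparing error figures such as the MAPE values above, the three evaluation criteria can be sketched in plain Python. These are the standard textbook definitions, assumed here to match those used in this study; the function names and the sample numbers are illustrative only.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the same units as the price."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the price."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical prices and forecasts (not data from the paper).
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Because MAPE is scale-free, it is the natural metric for comparing results across studies or across markets with very different price levels, whereas MAE and RMSE are only comparable within one market.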
Our significant empirical findings can be summarized as follows.

1. TCN is the best model for generalizing and forecasting commodity market prices.

2. LightGBM is the best machine-learning model for forecasting commodity market prices; however, compared with the TCN model, it performs poorly in capturing and responding to sharp market dynamics.

3. GRU-type models are the best recurrent-type deep-learning models in commodity price forecasting.

4. CNN-type models perform poorly in forecasting commodity market prices.

5. The TCN and LightGBM models are the most robust to input sequence lengths in predicting commodity market prices.

6. Using bidirectional models improves commodity price forecasting compared with using only information from the forward price direction. This finding is also supported by Siami-Namini et al. (2019), indicating that BiLSTM-based modeling yields better predictions than regular LSTM-based models.

7. To achieve superior forecasting performance, it is essential to consider the proper input sequence length for each deep or machine-learning model.

8. Among WTI, Brent, gold, and silver, gold is the market most sensitive to the input sequence length in price forecasting.

9. Time2Vector embedding improves forecasting performance only when using longer input sequences.

Our findings provide valuable insights for analysts seeking to improve the accuracy of commodity market price forecasts. By examining the performance of various forecasting models and considering the impact of input sequence length on their predictive capabilities, our study offers guidance for selecting the most suitable models and input parameters for forecasting commodity market prices. With this knowledge, governments, energy sector managers, crude oil and precious metals investors can make sensible decisions. In a governmental context, crude oil and precious metal price forecasting helps governments in fiscal planning, economic policy decisions, resource allocation, revenue management, international trade negotiations, socioeconomic development, environmental policies, and geopolitical considerations. Accurate forecasts enable governments to make informed decisions that impact the national economy, public finances, and sustainable development. Accurate crude oil price forecasting gives managers valuable insights to optimize operations, manage risks, allocate budgets and resources efficiently, and make strategic decisions in the dynamic energy market. They can use forecasted prices to hedge against potential price fluctuations, secure favorable contracts, and manage exposure to market volatility. Accurate crude oil price forecasting can provide a competitive advantage by enabling managers to make timely and informed decisions. They can anticipate market trends, respond quickly to price fluctuations, and stay ahead of competitors regarding pricing, supply chain management, and customer satisfaction.
Conclusion
Crude oil, particularly WTI and Brent, is crucial in global financial markets and economics. In recent years, crude oil prices have become more vulnerable to geopolitical and macroeconomic factors. Thus, understanding the dynamics of crude oil markets is essential. Furthermore, precious metals such as gold and silver are key commodities mined in particular countries, which makes the economies of these countries highly reliant on precious metal markets. Moreover, gold is a substitute asset for stock markets and is indispensable in financial investment portfolios. Therefore, developing an accurate forecasting model for crude oil, gold, and silver price movements is vital for policymakers, business owners, investors, and other stakeholders to mobilize timely political movements, foresee market trends, and properly design investment strategies to mitigate investment risks. In this study, we implement 12 deep-learning models, namely, LSTM, BiLSTM, GRU, BiGRU, T2V-BiLSTM, T2V-BiGRU, CNN, CNN-BiLSTM, CNN-BiGRU, TCN, TCN-BiLSTM, and TCN-BiGRU, to forecast the WTI, Brent, gold, and silver market prices and compare their forecasting performance with four baseline models, namely, random forest, LightGBM, SVR, and KNN. For this purpose, we use each market’s historical price information and apply four different sliding window lengths of 5, 30, 60, and 90 days. The MAE, MAPE, and RMSE evaluation metrics are employed to assess the forecasting power of each model. We compared the forecasting performance of these models across various input sequence lengths and found that the TCN model is the best-performing model for forecasting the prices of WTI, Brent, gold, and silver. LightGBM exhibits comparable forecasting performance to the TCN model in accurately predicting WTI and Brent crude oil prices. Our results also indicate that the BiGRU and GRU models are the best for predicting gold spot prices with input sequences of 30 and 60, respectively.
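The sliding-window setup summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the helper name `make_windows` and the synthetic price series are hypothetical, and each window of past prices is paired with the price of the following day as the next-day target.

```python
def make_windows(prices, window):
    """Split a price series into (inputs, targets) pairs.

    Each input holds `window` consecutive daily prices and each target is
    the price on the day immediately after that window.
    """
    if window < 1 or window >= len(prices):
        raise ValueError("window must be at least 1 and shorter than the series")
    n_samples = len(prices) - window
    inputs = [prices[i:i + window] for i in range(n_samples)]
    targets = [prices[i + window] for i in range(n_samples)]
    return inputs, targets


# Hypothetical series of 100 daily prices (not market data).
prices = [float(p) for p in range(100, 200)]

# The four window lengths considered in the study.
for window in (5, 30, 60, 90):
    X, y = make_windows(prices, window)
    print(window, len(X), y[0])
```

Longer windows give each sample more history but leave fewer training samples from the same series, which is one reason the best window length can differ by model and market.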
The best forecasting performance for each market is as follows: WTI through a TCN model with input sequence 60 (MAPE 3.53%); Brent through a TCN model with input sequence 5 (MAPE 2.64%); gold through a BiGRU model with input sequence 30 (MAPE 0.85%); and silver through a TCN model with input sequence 60 (MAPE 1.53%). Overall, our study recommends the TCN model for superior financial time-series price predictions. From the empirical results, we determine that the bidirectional LSTM and GRU models outperform the unidirectional LSTM and GRU models, respectively. Moreover, GRU-type models such as GRU, BiGRU, T2V-BiGRU, CNN-BiGRU, and TCN-BiGRU outperformed their LSTM-type peers in predicting WTI, Brent, gold, and silver prices.
Our study has several implications for policymakers and investors. First, the results of this study can assist investors and decision makers in promptly anticipating crude oil, gold, and silver market prices and adjusting their investment portfolios. Additionally, stakeholders can execute risk-hedging methods and lower their losses with timely predictions. In particular, gold is considered a suitable safe-haven asset for the stock and cryptocurrency markets (Junttila et al. 2018). Therefore, timely prediction of the gold market price will help stock market investors hedge their portfolios. Regarding organizational-level and country-level relationships, organizations such as the Organization of the Petroleum Exporting Countries, World Petroleum Council, and International Energy Agency, as well as government agencies, can further apply the indicated method, for example, the TCN model, to devise profitable policies related to global crude oil prices. Finally, our study would be particularly valuable for forecasting crude oil, gold, and silver prices in case of extreme events such as the COVID-19 pandemic and the recent conflict between Russia and Ukraine, which were covered in the period considered in this study.
Several limitations must be acknowledged in our research on forecasting crude oil and precious metal prices. First, these markets’ volatile and nonlinear nature poses difficulties in capturing all the intricate patterns and sudden price changes. Additionally, external factors such as natural disasters, geopolitical events, and supply–demand dynamics can significantly influence commodity prices, and accurately incorporating these factors into forecasting models remains a complex task. Finally, it is essential to acknowledge the inherent uncertainty in forecasting and implement appropriate risk management strategies. Addressing these limitations will enhance the robustness and reliability of our research findings.
Several directions exist for improving crude oil and precious metals price forecasting. First, rather than using only historical price data, other features such as technical indicators, macroeconomic features, supply and demand data, production rate, and interconnections with other financial markets can be used to predict crude oil and precious metal prices. Second, incorporating the stakeholders’ sentiments, which can be derived from news articles and social media platforms, might improve the forecasting performance of our proposed method. Finally, as an alternative to using sequential data, other data structures and learning methods, such as temporal graph neural networks, can be implemented to forecast price time-series data.
Availability of data and materials
Not applicable.
Abbreviations
ARDL: Autoregressive distributed lag
ARIMA: Autoregressive integrated moving average
BiGRU: Bidirectional gated recurrent units
BiLSTM: Bidirectional long short-term memory
CNN: Convolutional neural networks
CNN-BiGRU: Convolutional neural networks-bidirectional gated recurrent units
CNN-BiLSTM: Convolutional neural networks-bidirectional long short-term memory
CPI: Consumer price index
DBN: Deep belief network
EMD: Empirical mode decomposition
EEMD-TCN: Empirical mode decomposition-temporal convolutional network
ELM: Extreme learning machines
ENSO: El Niño-Southern oscillation
FNN: Feedforward neural network
GRU: Gated recurrent units
IEA: International energy agency
ISBM: Improved slope-based method
KNN: k-Nearest neighbors
LDA: Latent Dirichlet allocation
LSTM: Long short-term memory
MAE: Mean absolute error
MAPE: Mean absolute percentage error
MEMD: Multivariate empirical mode decomposition
MIDAS: Mixed data sampling
MRN: Multi-recurrent network
NLP: Natural language processing
OPEC: Organization of the Petroleum Exporting Countries
RMSE: Root mean squared error
RNN: Recurrent neural networks
SDAE: Stacked denoising autoencoders
SVM: Support vector machines
SVR: Support vector regression
T2V: Time2Vector
T2V-BiGRU: Time2Vector bidirectional gated recurrent units
T2V-BiLSTM: Time2Vector bidirectional long short-term memory
TCN: Temporal convolutional networks
TCN-BiGRU: Temporal convolutional networks-bidirectional gated recurrent units
TCN-BiLSTM: Temporal convolutional networks-bidirectional long short-term memory
US: United States
VAR: Vector autoregressive
VTFM: Vector trend forecasting method
WOA: Whale optimization algorithm
WPC: World petroleum council
WTI: West Texas intermediate
References
Abdullah Ahmed R, Bin Shabri A (2014) Daily crude oil price forecasting model using Arima, generalized autoregressive conditional heteroscedastic and support vector machines. Am J Appl Sci 11(3):425–432
Adekoya OB, Akinseye AB, Antonakakis N, Chatziantoniou I, Gabauer D, Oliyide J (2022) Crude oil and Islamic sectoral stocks: Asymmetric TVP-VAR connectedness and investment strategies. Resour Policy 78:102877
Akbar M, Iqbal F, Noor F (2019) Bayesian analysis of dynamic linkages among gold price, stock prices, exchange rate and interest rate in Pakistan. Resour Policy 62:154–164
Alameer Z, Elaziz MA, Ewees AA, Ye H, Jianhua Z (2019) Forecasting gold price fluctuations using improved multilayer perceptron neural network and whale optimization algorithm. Resour Policy 61:250–260
Almeida F, Xexéo G (2019) Word embeddings: a survey
Amirifar T, Lahmiri S, Zanjani MK (2023) An NLP-deep learning approach for product rating prediction based on online reviews and product features. IEEE Trans Comput Soc Syst. https://doi.org/10.1109/TCSS.2023.3290558
Amirshahi B, Lahmiri S (2023a) Hybrid deep learning and GARCH-family models for forecasting volatility of cryptocurrencies. Mach Learn Appl 12:100465
Amirshahi B, Lahmiri S (2023b) Investigating the effectiveness of Twitter sentiment in cryptocurrency close price prediction by using deep learning. Expert Syst. https://doi.org/10.1111/exsy.13428
Arbane M, Benlamri R, Brik Y, Alahmar AD (2023) Social media-based COVID-19 sentiment classification model using BiLSTM. Expert Syst Appl 212:118710
Baek C (2019) How are gold returns related to stock or bond returns in the U.S. market? Evidence from the past 10-year gold market. Appl Econ 51(50):5490–5497
Bai Y, Li X, Yu H, Jia S (2022) Crude oil price forecasting incorporating news text. Int J Forecast 38(1):367–383
Balcilar M, Gabauer D, Umar Z (2021) Crude Oil futures contracts and commodity markets: new evidence from a TVP-VAR extended joint connectedness approach. Resour Policy 73:102219
ben Khelifa S, Guesmi K, Urom C (2021) Exploring the relationship between cryptocurrencies and hedge funds during COVID-19 crisis. Int Rev Financ Anal 76:101777
Bhowmik R, Wang S (2020) Stock market volatility and return analysis: a systematic literature review. Entropy 22(5):522
Boongasame L, Viriyaphol P, Tassanavipas K, Temdee P (2022) Gold-price forecasting method using long short-term memory and the association rule. J Mob Multimedia 19(1):165–186
Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G (2021) Deep neural networks and tabular data: a survey. IEEE Trans Neural Netw Learn Syst:1–21
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794
Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: encoder-decoder approaches
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling
Das S, Nayak J, Kamesh Rao B, Vakula K, Ranjan Routray A (2022) Gold price forecasting using machine learning techniques: review of a decade. Adv Intell Syst Comput Book Ser (AISC) 1349:679–695
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding
Drachal K (2022) Forecasting the crude oil spot price with bayesian symbolic regression. Energies 16(1):4
Enwereuzoh PA, Odei-Mensah J, Owusu Junior P (2021) Crude oil shocks and African stock markets. Res Int Bus Financ 55:101346
Fang T, Zheng C, Wang D (2023a) Forecasting the crude oil prices with an EMD-ISBM-FNN model. Energy 263:125407
Fang Y, Wang W, Wu P, Zhao Y (2023b) A sentiment-enhanced hybrid model for crude oil price forecasting. Expert Syst Appl 215:119329
Gharghory SM (2021) A hybrid model of bidirectional long-short term memory and CNN for multivariate time series classification of remote sensing data. J Comput Sci 17(9):789–802
Gono DN, Napitupulu H (2023) Silver price forecasting using extreme gradient boosting (XGBoost) method. Mathematics 11(18):3813
Gopali S, Abri F, Siami-Namini S, Namin AS (2021) A comparison of TCN and LSTM models in detecting anomalies in time series data. IEEE Int Conf Big Data 2021:2415–2420
Gruber N, Jockisch A (2020) Are GRU Cells more specific and LSTM Cells more sensitive in motive classification of text? Front Artif Intell 3
Guo J, Zhao Z, Sun J, Sun S (2022) Multi-perspective crude oil price forecasting with a new decomposition-ensemble framework. Resour Policy 77:102737
He P, Liu X, Gao J, Chen W (2020) DeBERTa: decoding-enhanced BERT with Disentangled Attention. International Conference on Learning Representations
He Z, Zhou J, Dai HN, Wang H (2019) Gold price forecast based on LSTM-CNN model. In: 2019 IEEE international conference on dependable, autonomic and secure computing, pp 1046–1053
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Huang Y, Liu Q, Peng H, Wang J, Yang Q, Orellana-Martín D (2023) Sentiment classification using bidirectional LSTM-SNP model and attention mechanism. Expert Syst Appl 221:119730
Hussain Shahzad SJ, Raza N, Shahbaz M, Ali A (2017) Dependence of stock markets with gold and bonds under bullish and bearish market states. Resour Policy 52:308–319
Jiang H, Hu W, Xiao L, Dong Y (2022) A decomposition ensemble based deep learning approach for crude oil price forecasting. Resour Policy 78:102855
Junttila J, Pesonen J, Raatikainen J (2018) Commodity market based hedging against stock market risk in times of financial crisis: the case of crude oil and gold. J Int Finan Markets Inst Money 56:255–280
Kazemi SM, Goel R, Eghbali S, Ramanan J, Sahota J, Thakur S, Wu S, Smyth C, Poupart P, Brubaker M (2019) Time2Vec: learning a vector representation of time
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3146–3154
Kertlly de Medeiros R, da Nóbrega BC, Pitta de Jesus D, Phillipe de Albuquerquemello V (2022) Forecasting oil prices: new approaches. Energy 238:121968
Khan M, Wang H, Riaz A, Elfatyany A, Karim S (2021) Bidirectional LSTM-RNN-based hybrid deep learning frameworks for univariate time series classification. J Supercomput 77(7):7021–7045
Kou G, Olgu Akdeniz Ö, Dinçer H, Yüksel S (2021) Fintech investments in European banks: a hybrid IT2 fuzzy multidimensional decision-making approach. Financ Innov 7:39
Kou G, Yüksel S, Dinçer H (2022) Inventive problem-solving map of innovative carbon emission strategies for solar energy-based transportation investment projects. Appl Energy 311:118680
Lahmiri S (2023a) Multifractals and multiscale entropy patterns in energy markets under the effect of the COVID-19 pandemic. Decis Anal J 7:100247
Lahmiri S (2023b) A comparative study of statistical machine learning methods for condition monitoring of electric drive trains in supply chains. Supply Chain Anal 2:100011
Lahmiri S, Bekiros S (2019) Cryptocurrency forecasting with deep learning chaotic neural networks. Chaos, Solitons Fractals 118:35–40
Lahmiri S, Bekiros S (2020) Intelligent forecasting with machine learning trading systems in chaotic intraday Bitcoin market. Chaos, Solitons Fractals 133:109641
Lahmiri S, Bekiros S (2021) Deep learning forecasting in cryptocurrency high-frequency trading. Cogn Comput 13:485–487
Lahmiri S, Bekiros S, Avdoulas C (2023) A comparative assessment of machine learning methods for predicting housing prices using Bayesian optimization. Decis Anal J 6:100166
Lahmiri S, Bekiros S, Bezzina B (2022) Complexity analysis and forecasting of variations in cryptocurrency trading volume with support vector regression tuned by Bayesian optimization under different kernels: an empirical comparison from a large dataset. Expert Syst Appl 209:118349
Lara-Benítez P, Carranza-García M, Luna-Romera JM, Riquelme JC (2020) Temporal convolutional networks applied to energy-related time series forecasting. Appl Sci 10(7):2322
Lea C, Flynn MD, Vidal R, Reiter A, Hager GD (2016) Temporal convolutional networks for action segmentation and detection
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Li G, Yin S, Yang H (2022a) A novel crude oil prices forecasting model based on secondary decomposition. Energy 257:124684
Li T, Kou G, Peng Y, Yu PS (2022b) An integrated cluster detection, optimization, and interpretation approach for financial data. IEEE Trans Cybern 52(12):13848–13861
Li X, Shang W, Wang S (2019) Text-based crude oil price forecasting: a deep learning approach. Int J Forecast 35(4):1548–1560
Li Y, Du N, Bengio S (2017) Time-dependent representation for neural event sequence prediction
Liang X, Luo P, Li X, Wang X, Shu L (2023) Crude oil price prediction using deep reinforcement learning. Resour Policy 81:103363
Lim B, Zohren S (2021) Time-series forecasting with deep learning: a survey. Philos Trans R Soc Math Phys Eng Sci 379(2194):20200209
Lin Y, Chen K, Zhang X, Tan B, Lu Q (2022) Forecasting crude oil futures prices using BiLSTM-Attention-CNN model with Wavelet transform. Appl Soft Comput 130:109723
Liu G, Guo J (2019) Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337:325–338
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach
Lu W, Li J, Li Y, Sun A, Wang J (2020) A CNN-LSTM-based model to forecast stock prices. Complexity 2020:1–10
Madziwa L, Pillalamarry M, Chatterjee S (2022) Gold price forecasting using multivariate stochastic model. Resour Policy 76:102544
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality
Mohamed NA, Messaadia M (2023) Artificial intelligence techniques for the forecasting of crude oil price: a literature review. In: International conference on cyber management and engineering (CyMaEn), pp 340–343
Murshed M, Tanha MM (2021) Oil price shocks and renewable energy transition: empirical evidence from net oil-importing South Asian economies. Energy Ecol Environ 6(3):183–203
Orojo O, Tepper J, McGinnity TM, Mahmud M (2019) A multi-recurrent network for crude oil price prediction. In: IEEE symposium series on computational intelligence (SSCI)
Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Periwal A (2023) The impact of crude oil price fluctuations on Indian economy. Int J Res Appl Sci Eng Technol 11(4):3173–3202
Phan DHB, Sharma SS, Narayan PK (2016) Intraday volatility interaction between the crude oil and equity markets. J Int Finan Markets Inst Money 40:1–13
Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst 31
Pullen T, Benson K, Faff R (2014) A Comparative analysis of the investment characteristics of alternative gold assets. Abacus 50(1):76–92
Qin Q, Huang Z, Zhou Z, Chen C, Liu R (2023) Crude oil price forecasting with machine learning and Google search data: an accuracy comparison of single-model versus multiple-model. Eng Appl Artif Intell 123:106266
Qin Q, Xie K, He H, Li L, Chu X, Wei YM, Wu T (2019) An effective and robust decomposition-ensemble energy price forecasting paradigm with local linear prediction. Energy Econ 83:402–414
Raza S, Schwartz B (2023) Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach. BMC Med Inform Decis Mak 23(1):20
Reboredo JC (2013) Is gold a safe haven or a hedge for the US dollar? Implications for risk management. J Bank Finance 37(8):2665–2676
Risse M (2019) Combining wavelet decomposition with machine learning to forecast gold returns. Int J Forecast 35(2):601–615
Salisu AA, Ogbonna AE, Adewuyi A (2020) Google trends and the predictability of precious metals. Resour Policy 65
Sarwar S, Shahbaz M, Anwar A, Tiwari AK (2019) The importance of oil assets for portfolio optimization: the analysis of firm level stocks. Energy Econ 78:217–234
Siami-Namini S, Tavakoli N, Namin AS (2019) The Performance of LSTM and BiLSTM in forecasting time series. IEEE Int Conf Big Data 2019:3285–3292
Sroka Ł (2022) Applying block bootstrap methods in silver prices forecasting. Econometrics 26(2):15–29
Su M, Liu H, Yu C, Duan Z (2022) A new crude oil futures forecasting method based on fusing quadratic forecasting with residual forecasting. Digital Signal Process 130:103691
Sun J, Zhao P, Sun S (2022) A new secondary decomposition-reconstruction-ensemble approach for crude oil price forecasting. Resour Policy 77:102762
Swamy V, Lagesh MA (2023) Does happy Twitter forecast gold price? Resour Policy 81:103299
Szarek D, Bielak Ł, Wyłomańska A (2020) Long-term prediction of the metals’ prices using non-Gaussian time-inhomogeneous stochastic process. Phys A Stat Mech Appl 555
Tang L, Zhang C, Li L, Wang S (2020) A multiscale method for forecasting oil price with multifactor search engine data. Appl Energy 257:114033
Uzo-Peters A, Laniran T, Adenikinju A (2018) Brent prices and oil stock behaviors: evidence from Nigerian listed oil stocks. Financ Innov 4(1):8
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need
Vidal A, Kristjanpoller W (2020) Gold volatility prediction using a CNN-LSTM approach. Expert Syst Appl 157(1348):1
Wang J, Athanasopoulos G, Hyndman RJ, Wang S (2018) Crude oil price forecasting based on internet concern using an extreme learning machine. Int J Forecast 34(4):665–677
Wang J, Niu T, Du P, Yang W (2020) Ensemble probabilistic prediction approach for modeling uncertainty in crude oil price. Appl Soft Comput J 95:106509
Wang L, Ma F, Niu T, Liang C (2021) The importance of extreme shock: examining the effect of investor sentiment on the crude oil futures market. Energy Econ 99:105319
Xiuzhen X, Zheng W, Umair M (2022) Testing the fluctuations of oil resource price volatility: a hurdle for economic recovery. SSRN Electron J
Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K (2021) A Temporal kernel approach for deep learning with continuous-time information
Xu D, Ruan C, Kumar S, Korpeoglu E, Achan K (2019) Self-attention with functional time representation learning
Yan J, Mu L, Wang L, Ranjan R, Zomaya AY (2020) Temporal convolutional networks for the advance prediction of ENSO. Sci Rep 10(1):8055
Yang M, Li X, Liu Y (2021) Sequence to point learning based on an attention neural network for nonintrusive load decomposition. Electronics 10(14):1657
Yang M, Wang J (2022) Adaptability of financial time series prediction based on BiLSTM. Procedia Comput Sci 199:18–25
Yang S, Chen D, Li S, Wang W (2020) Carbon price forecasting based on modified ensemble empirical mode decomposition and long shortterm memory optimized by improved whale optimization algorithm. Sci Total Environ 716:137117
Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput 31(7):1235–1270
Yuan Z (2023) Gold and bitcoin price prediction based on KNN, XGBoost and LightGBM model. Highlights Sci Eng Technol 39:720–725
Zhang P, Ci B (2020) Deep belief network for gold price forecasting. Resour Policy 69:101806
Zhang S, Chen Y, Zhang W, Feng R (2021) A novel ensemble deep learning model with dynamic error correction and multiobjective ensemble pruning for time series forecasting. Inf Sci 544:427–445
Zhang Y, Wang J, Yu L, Wang S (2022a) An extreme biaspenalized forecast combination approach to commodity price forecasting. Inf Sci 615:774–793
Zhang Z, He M, Zhang Y, Wang Y (2022b) Geopolitical risk trends and crude oil price predictability. Energy 258:124824
Zhao L, Cheng L, Wan Y, Zhang H, Zhang Z (2015) A VARSVM model for crude oil price forecasting. Int J Glob Energy Issues 38(1/2/3):126
Zhao LT, Wang Y, Guo SQ, Zeng GR (2018) A novel method based on numerical fitting for oil price trend forecasting. Appl Energy 220:154–163
Zhao Y, Li J, Yu L (2017) A deep learning ensemble approach for crude oil price forecasting. Energy Econ 66:9–16
Zhou S, Wu JN, Wu Y, Zhou X (2015) Exploiting local structures with the kronecker layer in convolutional networks
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
PF: Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing of the original draft, and review and editing. SL: Conceptualization, methodology, and review. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Foroutan, P., Lahmiri, S. Deep learning systems for forecasting the prices of crude oil and precious metals. Financ Innov 10, 111 (2024). https://doi.org/10.1186/s40854-024-00637-z