Take Bitcoin into your portfolio: a novel ensemble portfolio optimization framework for broad commodity assets

The emergence and growing popularity of Bitcoins have attracted the attention of the financial world. However, few empirical studies have considered the inclusion of the newly emerged commodity asset in the global commodity market. It is of great importance for investors and policymakers to take advantage of this asset and its potential benefits by incorporating it as a part of the broad commodity trading portfolio. In this study, we propose a novel ensemble portfolio optimization (NEPO) framework utilized for broad commodity assets, which integrates a hybrid variational mode decomposition-bidirectional long short-term memory deep learning model for future returns forecast and a reinforcement learning-based model for optimizing the asset weight allocation. Our empirical results indicate that the NEPO framework could effectively improve the prediction accuracy and trend prediction ability across various commodity assets from different sectors. In addition, it could effectively incorporate Bitcoins into the asset pool and achieve better financial performance compared to traditional asset allocation strategies, commodity funds, and indices.

hedging and diversification strategy, which benefits from different seasonal cycles and supply and demand factors (Geman and Ohana 2008).
Most investments in broad commodity assets typically focus on traditional products, such as crude oil, agricultural products, and precious metals. However, recent years have seen the emergence of a new type of commodity asset, cryptocurrency, which has gained the attention of investors. Bitcoin, the most popular cryptocurrency, has seen a sharp rise in its price from almost zero in 2009 to approximately $60,000 in 2021. Due to its extreme price disturbances observed during the latter half of 2019, Bitcoin has been considered a threat to the stability of the world financial system; however, its unique economic properties have made it an attractive and potentially high-return investment option (das Neves 2020; Jiang et al. 2021). As there is a finite number of Bitcoins in circulation, the ever-decreasing supply of the asset available for buying and selling has driven a growing number of institutional investors to embrace this cryptocurrency as an investment option. For example, Bitcoin's price resurgence in 2021 was partly fueled by the Wall Street billionaires who publicly supported and invested in the asset. However, due to its high volatility, most investors are hesitant to invest solely in this asset. Instead, many trading firms seek to incorporate it into their portfolio along with other traditional commodities to hedge against its potential volatility risks (Liu and Tsyvinski 2018).
Bitcoin's commodity properties have been investigated by some studies. For example, using a conditional correlation model, Bouri et al. (2017) suggest that Bitcoin can serve as a safe haven for other major commodities in the global commodity market system. Selmi et al. (2018) use quantile regression to investigate the economic characteristics of Bitcoin, indicating that it is both a hedge and a safe haven for oil price movements. As a result, Bitcoin is considered the "new gold" for its safe haven properties, which are similar to those of gold and allow it to serve as a potential hedge or safe haven asset for financial portfolio optimization (Selmi et al. 2018; Symitsi and Chalvatzis 2019). However, few studies investigate the effects and usefulness of a cryptocurrency in portfolio investments. Therefore, this study attempts to close this gap by considering Bitcoin and its diversification properties in the development of a broad commodity portfolio optimization system based on deep learning and reinforcement learning.
The literature has sought to improve portfolio performance using various optimization methods and models. Conventional portfolio optimization models, such as the mean-variance, risk parity, and Black-Litterman models, utilize the historical returns and variances of financial assets to derive the maximized Sharpe ratio or efficient frontier of the portfolio (Kou et al. 2021). However, a potential problem with such approaches lies in the discrepancy between historical and future prices, which may lead to estimation errors, eventually generating non-optimal proportions of the target portfolio (Guastaroba et al. 2009; Tola et al. 2008). To address this issue, algorithmic optimization approaches based on data-driven techniques have been introduced for financial time series data prediction and portfolio decision-making in recent studies (Branke et al. 2009; Lwin et al. 2014). Although improvements have been made, the current algorithmic methods for portfolio optimization face two primary challenges: improving the directional accuracy (DA) of multi-asset return predictions and implementing multi-objective portfolio optimizations.
With the development of computer technologies, deep learning prediction models have been introduced to improve the forecasting accuracy of financial time series data. Compared to traditional techniques based on econometrics and machine learning, which perform poorly when forecasting multivariate time series data due to noise disturbances (Altan et al. 2019; Galankashi et al. 2020; Jalali and Heidari 2020), deep learning techniques are observed to be more effective. For example, Atsalakis et al. (2019) introduce a novel neuro-fuzzy technique with artificial neural networks (ANN) to forecast the market trends in cryptocurrency prices, which shows an improvement in prediction accuracy compared to traditional prediction methods. Dutta et al. (2020) employ a gated recurrent unit model to predict the price movements of Bitcoins and achieve a better forecasting performance. Long et al. (2019) propose a multi-filter neural network for stock price prediction. Further, Li et al. (2019) develop a crude oil price prediction system based on convolutional neural networks. Among all the deep learning techniques, the recurrent neural network (RNN) models have displayed a superior performance over others in terms of time series prediction accuracy (Duan et al. 2016). Such a superior performance may be attributed to the recurrent feedback layer of RNN models, which allows them to effectively use internal memories to process input data sequentially and produce more precise forecasts (Cao et al. 2012; Anbazhagan and Kumarappan 2012).
However, as most commodity market prices are volatile and non-stationary, forecasting performance may be negatively affected by high volatility. In recent years, a hybrid forecasting approach known as "decomposition and ensemble" has been proposed to improve the prediction accuracy of non-stationary time series with high complexity and irregularity. The decomposition and ensemble approach is based on the principle of "divide and conquer," which integrates signal decomposition technology, such as empirical mode decomposition, ensemble empirical mode decomposition, or variational mode decomposition (VMD), with machine learning and deep learning models (Yang et al. 2019; Wang et al. 2018). In this approach, the original prediction task is divided into subtasks to reduce the modeling difficulty. Compared to other models, this approach is not bound by strict assumptions such as linearity and stationarity, which are imposed on econometric models.
Numerous studies have employed this hybrid forecasting approach and demonstrated its effectiveness in improving the time series forecasting performance (Yu et al. 2015;Wen et al. 2017). Despite the improved prediction performance, there might be a potential problem with the decomposition and ensemble approach. Estimation errors generated while forecasting the individual sub-modes tend to accumulate during the aggregation process. This accumulation may cause significant discrepancies between the actual and predicted values, which could negatively affect the prediction performance (Zhu et al. 2019). Therefore, we propose a new hybrid deep learning-based forecasting approach in our portfolio optimization framework to mitigate this problem. Unlike previous studies, our forecasting approach eliminates the aggregation step by directly generating the final prediction results using all the intrinsic modes as inputs simultaneously, which can potentially reduce the aggregation errors.
The literature has adopted many portfolio models, such as the cardinality constrained model (Zha et al. 2020), fuzzy selection model (Yue 2019), and Powell approaches (Powell 1964), to address the second challenge of implementing multi-objective portfolio optimizations. Among all the portfolio optimization models, reinforcement learning models are considered to be the most appropriate for financial portfolio optimization. For example, Jangmin et al. (2006) introduce a reinforcement learning framework for asset allocation optimization by using meta policy as a reinforcement learning strategy to optimize stock portfolios, which is designed to incorporate the information obtained from the ratio of the stock fund and stock recommendations. Jeong and Kim (2019) use a Deep Q Network-based reinforcement learning model to improve the prediction and trading performance of stock markets. Q-values are utilized to analyze which portfolio action strategies are beneficial for improving profits in a confused market. Eilers et al. (2014) develop a novel integrated robust artificial neural network reinforcement learning (ANN-RL) model to filter the seasonality of financial assets, where the Sharpe ratio is introduced to act as network rewards in the reinforcement learning process.
However, despite contributing toward improving the market returns and their associated risks, previous reinforcement learning-based portfolio optimization frameworks have two limitations. First, previous models typically consider a discrete action space, implying that there are only a fixed number of portfolio weight allocations from which the model can choose. However, the portfolio weight allocation is a continuous action space in reality, as each asset can be potentially given any weight between 0 and 100%. Therefore, although the portfolio performances have improved, the previously implemented model may have ignored the possibility of other allocations that do not exist in the pre-determined action space. To address this limitation and improve the model's consistency, we develop a portfolio optimization framework based on a deep deterministic policy gradient model (Lillicrap et al. 2015) that can effectively consider the continuous characteristics of the weight allocation action space. In addition, the previously employed portfolio optimization models typically allocate asset weights to optimize the portfolio performance directly based on the forecasted trends of the assets without considering the potential prediction errors. Hence, these models assume that the forecasted value is entirely accurate, which is unrealistic in practice. Therefore, our portfolio optimization framework attempts to address this issue by considering the prediction errors that may arise in the forecasting process, which can potentially allocate weights more effectively to improve the portfolio performance.
Based on previous studies, we propose a novel ensemble portfolio optimization (NEPO) framework utilized for broad commodity assets. First, a non-recursive decomposition approach through VMD is utilized to decompose the daily closing price data of the selected commodity assets into distinctive intrinsic modes in order to extract the additional hidden information and patterns in time series data. Second, the decomposed intrinsic modes of each asset are inserted into a bidirectional long short-term memory (BiLSTM) deep learning model to forecast the daily closing price and return of the asset. Compared with the typical unidirectional deep learning model, the proposed prediction model can extract a two-way sequential relationship in time series data, making it more consistent with reality (Ullah et al. 2017). Additionally, unlike other decomposition and ensemble forecasting approaches, our proposed price prediction model eliminates the aggregation step by generating the forecasting results directly through the simultaneous input of all the extracted intrinsic modes into the deep learning model. Third, the predicted returns of the asset as well as the estimation errors are included in a reinforcement learning-based optimizer to allocate optimal weights for the commodity assets in the portfolio. Several prediction models, such as machine learning and deep learning models, are introduced as benchmarks to assess the forecasting performance of our decomposition-based bidirectional deep learning model. The empirical results suggest that the proposed VMD-BiLSTM model can effectively improve the prediction accuracy and trend prediction ability across various commodity assets.
We compare the performance of our portfolio optimizer to that of other commodity funds, indices, and asset allocation strategies in terms of annualized return and Sharpe ratio. The empirical results indicate that our ensemble portfolio optimization framework can generate higher returns and a better Sharpe ratio than the others. The results also indicate that by including Bitcoins in the commodity portfolio, asset managers can achieve higher returns without being exposed to significant financial risks. We find that it is possible to take advantage of the returns generated from Bitcoins while reducing the investment risks caused by their extreme volatility. Overall, employing the proposed ensemble portfolio optimization framework and including Bitcoin in a traditional commodity portfolio can generate better fund performance for asset managers and higher portfolio profits for commodity investors.
Our study's primary contributions are as follows. First, we extend the broad commodity asset pool for potential diversification premiums by incorporating Bitcoin into the investment portfolio and utilizing its economic and investment properties. To the best of our knowledge, Bitcoin has not been considered a portfolio component in portfolio optimization problems; we aim to fill this literature gap through this analysis. Second, the proposed ensemble portfolio optimization framework allows the asset weights to be allocated in a continuous action space while considering the prediction errors generated in the forecasting process. Compared to the portfolio optimization models in previous studies, our proposed model is more practical and consistent with reality.

Methodological framework
Our NEPO framework comprises three main components: an effective decomposition technique, VMD, is used to decompose the original time series of all the commodities and extract the inner patterns of the data; the extracted inner factors of the different commodities are then incorporated into the BiLSTM neural networks to predict their five-day returns; finally, reinforcement learning is applied to optimize and re-balance the portfolio weights based on the predicted returns and forecasting evaluation metrics. The detailed framework is shown in Fig. 1.

VMD
VMD decomposes the original complex and non-stationary time series data into normally distributed stationary volatility data, thereby generating economic implications. This non-recursive signal decomposition technique was proposed by Dragomiretskiy and Zosso (2014). It decomposes the original input signal $f(t)$ into a series of quasi-orthogonal band-limited discrete sub-signals $u_k$ through Wiener filtering and the Hilbert transform (Wang and Markert 2015). The decomposed sub-signals $u_k$ are mostly centered tightly around their respective center frequencies $\omega_k$ (Liu et al. 2016). The optimization procedure is as follows (Zhang et al. 2017):

Fig. 1 Portfolio optimization and NEPO framework
Step 1: Calculate the Hilbert transform of each mode $u_k$ and transform it into its respective one-sided frequency spectrum.
Step 2: Shift the frequency spectrum of each mode $u_k$ to a narrow frequency baseband by multiplying it by an exponential function tuned to the corresponding estimated center frequency.
Step 3: Obtain the bandwidth of each mode $u_k$ by computing the $H^1$ Gaussian smoothness of the demodulated signal.
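Steps 1 to 3 can be illustrated for a single mode with a short sketch. It is not the authors' implementation: it assumes a discrete, even-length signal, uses a naive DFT to keep the example dependency-free, and approximates the $H^1$ smoothness by the squared L2 norm of first differences. The function names and the test signal are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Step 1: keep only the one-sided spectrum (signal + j * Hilbert transform)."""
    n = len(x)  # assumed even
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0          # DC component is kept once
    gain[n // 2] = 1.0     # Nyquist component is kept once
    for k in range(1, n // 2):
        gain[k] = 2.0      # positive frequencies are doubled, negative ones dropped
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def bandwidth(x, center_bin):
    """Steps 2-3: shift to baseband, then sum the squared first differences."""
    n = len(x)
    z = analytic_signal(x)
    base = [z[m] * cmath.exp(-2j * math.pi * center_bin * m / n) for m in range(n)]
    return sum(abs(base[m + 1] - base[m]) ** 2 for m in range(n - 1))


# Hypothetical test signal: a pure cosine at frequency bin 8 of 64 samples.
signal = [math.cos(2 * math.pi * 8 * m / 64) for m in range(64)]
matched = bandwidth(signal, center_bin=8)      # center frequency is correct
mismatched = bandwidth(signal, center_bin=4)   # center frequency is wrong
```

When the estimated center frequency matches the mode, the demodulated signal is nearly constant and the bandwidth estimate is close to zero; a wrong center frequency leaves a residual oscillation and a larger value. This is the quantity that VMD drives down for every mode simultaneously.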
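The iterative minimization process seeks to minimize the sum of the bandwidths of all the modes, which can be expressed in the following form:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \tag{1}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{2}$$

where $K$ denotes the number of decomposed modes, $\{u_k\}$ and $\{\omega_k\}$ are the decomposed modes and their respective center frequencies, $\delta(t)$ denotes the Dirac delta function, $\otimes$ represents the convolution operator, and $f(t)$ denotes the original input signal.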
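For finite convergence and constraint enforcement, a quadratic penalty factor $\alpha$ and a Lagrangian multiplier $\lambda$ are introduced to obtain the optimal solution of the constrained optimization problem provided in Eq. (2). The augmented Lagrangian function $L$ can be obtained as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{3}$$

The optimal solution is obtained using the alternating direction method of multipliers (Hestenes 1969), through which the original input signal $f(t)$ is decomposed into $K$ sub-signal modes.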
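For reference, the alternating direction iterations take the following form in the frequency domain in Dragomiretskiy and Zosso (2014). The notation here is an addition to the text above: hats denote Fourier transforms, $n$ is the iteration counter, and $\tau$ is the step size of the multiplier update.

```latex
\hat{u}_k^{\,n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i<k} \hat{u}_i^{\,n+1}(\omega)
        - \sum_{i>k} \hat{u}_i^{\,n}(\omega) + \hat{\lambda}^{n}(\omega)/2}
       {1 + 2\alpha\,(\omega - \omega_k^{\,n})^2}

\omega_k^{\,n+1} =
  \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{\,n+1}(\omega)|^2 \, d\omega}
       {\int_0^{\infty} |\hat{u}_k^{\,n+1}(\omega)|^2 \, d\omega}

\hat{\lambda}^{n+1}(\omega) =
  \hat{\lambda}^{n}(\omega)
  + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{\,n+1}(\omega) \right)
```

Each mode update is a Wiener filter of the current residual centered at $\omega_k$, and each center frequency is re-estimated as the center of gravity of the mode's power spectrum. The iterations stop once the modes change by less than a chosen tolerance.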

BiLSTM neural networks
The bidirectional RNN was proposed by Schuster and Paliwal (1997). It utilizes both forward and backward information in the data. As illustrated in Fig. 2, the bidirectional RNN structure contains two unidirectional hidden layers, where one layer processes information from the forward direction and the other from the backward direction. The forward and backward unidirectional layers are concatenated to one output layer, such that the neural networks can extract bidirectional sequential relationships in the time series data. Compared to traditional unidirectional neural networks, it can preserve information from both the past and future.
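In our prediction model, we replace the traditional RNN cells with LSTM cells, considering their ability to learn long-term dependencies. At each time step $t$, an LSTM cell consists of an input gate $i_t$, a forget gate $f_t$, an output gate $o_t$, and a memory cell block $C_t$. $f_t$ and $i_t$ are defined as follows:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{4}$$

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{5}$$

A tanh layer is utilized to generate a new candidate memory cell block $\tilde{C}_t$. The existing memory cell block is then updated to $C_t$, while the output gate $o_t$ and hidden state $h_t$ are generated:

$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C) \tag{6}$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{7}$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{8}$$

$$h_t = o_t * \tanh(C_t) \tag{9}$$

where $x_t$ denotes the input at time $t$, $h_{t-1}$ is the hidden state at the previous time step, $\sigma$ represents the sigmoid function, and $*$ is the element-wise multiplication. $W$ and $b$ are the respective weight matrices and bias vectors.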
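The LSTM cell update described above, and the way a bidirectional layer reuses it, can be sketched in a few lines. This is a toy illustration under strong assumptions, not the model used in the study: the input and hidden state are scalars, the weights are arbitrary hypothetical values rather than trained parameters, and the forward and backward summaries are simply returned as a pair instead of being fed to an output layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step with a scalar input and a scalar hidden state.

    params maps each gate name to (w_h, w_x, b): the weights applied to
    [h_{t-1}, x_t] and the bias. All values used here are illustrative.
    """
    def affine(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(affine("f"))            # forget gate
    i_t = sigmoid(affine("i"))            # input gate
    c_tilde = math.tanh(affine("c"))      # candidate memory cell
    c_t = f_t * c_prev + i_t * c_tilde    # updated memory cell
    o_t = sigmoid(affine("o"))            # output gate
    h_t = o_t * math.tanh(c_t)            # new hidden state
    return h_t, c_t


def run(sequence, params):
    """Run the cell over a sequence and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


def bidirectional_features(sequence, fwd_params, bwd_params):
    """Read the window left-to-right and right-to-left, then pair the results."""
    forward = run(sequence, fwd_params)
    backward = run(list(reversed(sequence)), bwd_params)
    return forward, backward


# Hypothetical weights shared by both directions, for demonstration only.
demo = {name: (0.5, 1.0, 0.0) for name in ("f", "i", "c", "o")}
fwd, bwd = bidirectional_features([0.1, 0.9], demo, demo)
```

Even with identical weights, the forward and backward passes summarize the window differently because they see the observations in opposite order, which is the extra information a bidirectional layer contributes.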

Reinforcement learning
This study uses the predicted returns generated from the BiLSTM neural networks and integrates them into a reinforcement learning model to optimize and re-balance the weights of the portfolio. The set of agent states $S$ represents the previous weight allocation, and the set of agent actions $A_t$ denotes the possible set of portfolio allocations. The probability of the reinforcement learning model selecting an action (a portfolio weight allocation) $a$ in state $s$ can be expressed as follows:

$$\pi(a \mid s) = \Pr(a_t = a \mid s_t = s) \tag{10}$$

The state space contains all the possible allocations of portfolio weights, while the actions are the set of possible allocations from the state space. At each time step $t$, the state $s_t$ is the weight allocation carried over from the previous period, and the action $a_t$ can be expressed as follows:

$$a_t = w_t = (w_{1,t}, \ldots, w_{n,t}) \tag{11}$$

where $w_{i,t}$ ($i = 1, \ldots, n$; $t = 1, \ldots, T$) denotes the allocated portfolio weight for commodity $i$ at time $t$.
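A continuous action of this kind has to be a valid weight vector. The text does not state how the policy output is constrained, so the following sketch rests on an assumption: a softmax maps unconstrained outputs to a long-only, fully invested allocation. The function name and the raw values are hypothetical.

```python
import math


def to_portfolio_weights(raw_action):
    """Softmax: map unconstrained outputs to weights in (0, 1) that sum to 1."""
    shift = max(raw_action)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in raw_action]
    total = sum(exps)
    return [e / total for e in exps]


assets = ["SPY", "wheat", "WTI", "gold", "Bitcoin"]
raw = [0.2, -1.0, 0.0, 0.7, 0.1]                 # hypothetical policy output
weights = to_portfolio_weights(raw)
allocation = dict(zip(assets, weights))
```

Unlike a discrete action space with a fixed menu of allocations, any point on the weight simplex is reachable this way. Other constraints, such as short positions or position caps, would need a different mapping.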
We further define the reward function at time $t$, denoted as $r_t$, as the difference between the reward for the newly allocated portfolio weights and that for the previous portfolio weights (Eq. (12)). Although a set of portfolio management targets and indicators, such as the Omega and Sortino ratios, are available for portfolio optimization, the Sharpe ratio is the most widely utilized indicator and serves as the baseline of portfolio ratio improvements in academia and industry (Farinelli et al. 2008; Kapsos et al. 2014). In Eq. (12), the $Q_t$ terms denote the weighted Sharpe ratios (Sharpe 1994) of the newly allocated portfolio weights and the previous portfolio weights, respectively, and the $\mathrm{RMSE}_t$ terms represent the weighted root mean squared errors (RMSE) of the prediction models for the new portfolio weights and the previous portfolio weights, respectively. The reinforcement learning model is trained to find a set of portfolio weight allocations that maximizes the expected return.

After every period (five days), new commodity prices are included in the prediction model to generate the predicted returns for the next period. Based on the new prediction values, portfolio performance, and weights from the previous period, the reinforcement learning model can optimize and re-adjust the portfolio weights for the next period. This model is designed to self-adjust and find optimized portfolio allocations with the least human participation. In particular, this self-learning procedure can effectively find a balance between maximizing the portfolio returns and minimizing the portfolio risks that arise from errors in the forecasting model.

Empirical study
Our analysis consists of two parts. First, the VMD-BiLSTM models predict the five-day prices and returns for the chosen commodities based on their respective historical time series data. Second, the reinforcement learning model considers the prediction results and allocates the optimal portfolio weights for each predicted period accordingly.

Description of the dataset
We select five major commodity markets to construct our commodity portfolio, which consists of stocks, agriculture, energy, precious metal, and the newly emerged cryptocurrency commodities. In the portfolio, each market is represented by its leading commodity, which includes the S&P 500 stock index, wheat, WTI crude oil, gold index, and Bitcoin. The data are obtained from Yahoo Finance, from which we download the daily closing price of the S&P 500 index, wheat, WTI crude oil, gold, and Bitcoin from January 2, 2013 to February 21, 2020, obtaining 1797 observations. As Bitcoin is traded continuously throughout the day, its opening price generally refers to the price at 12:01 AM UTC and the closing price to that at 11:59 PM UTC on any given day. A graphical representation of the data for each commodity is provided in Fig. 3.
The common descriptive statistics for the commodity time series data are presented in Table 1. The daily closing prices of all the commodities are non-normal and positively skewed (right skewed). In addition, the augmented Dickey-Fuller test indicates that the time series data for the stock (SPY), crude oil (WTI), gold, and Bitcoin are all non-stationary and have a unit root. In contrast, the null hypothesis of a unit root in the augmented Dickey-Fuller test is rejected for wheat at the 10% level of statistical significance, which means that the wheat series is stationary.
Moreover, this study conducts a correlation analysis among the commodities. The Pearson correlation coefficients are shown in Fig. 4. The results suggest that there exist moderately weak linear relationships among the commodities. In particular, Bitcoin is negatively correlated with all the other commodities. In addition, there exists a strong positive correlation between the stock commodity (SPY) and the energy commodity (WTI).
We divide the data into two sets: a training set and a testing set with a split ratio of 8:2, which means that the first 80% of the data are used to train the prediction model, while the remaining 20% are used to evaluate the model. We use a sliding input of 14 days in the prediction process, which means that the model considers the historical data from $t - 13$ to $t$ to forecast the five-day-ahead closing price at $t + 5$. Therefore, our training set consists of 1421 observations from January 30, 2013 to September 19, 2018, while the testing set consists of 356 observations from September 20, 2018 to February 21, 2020.
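The windowing and chronological split described above can be sketched as follows. The helper names and the synthetic price series are hypothetical, and the sketch makes no attempt to reproduce the exact sample counts reported in the text.

```python
def make_windows(prices, lookback=14, horizon=5):
    """Pair each 14-day window (t-13 .. t) with the closing price at t+5."""
    samples = []
    for t in range(lookback - 1, len(prices) - horizon):
        window = prices[t - lookback + 1 : t + 1]   # observations t-13 .. t
        target = prices[t + horizon]                # five-day-ahead close
        samples.append((window, target))
    return samples


def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling so that the test set lies strictly in the future."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Synthetic stand-in for a closing price series (40 days).
prices = [100.0 + day for day in range(40)]
samples = make_windows(prices)
train, test = chronological_split(samples)
```

Keeping the split chronological matters for time series: shuffling before splitting would let the model train on observations that come after the ones it is evaluated on.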
To eliminate the differences in the variable dimension and increase model forecasting reliability, we normalize the data in the range of [0, 1] as shown below:

$$x_t' = \frac{x_t - \min x_t}{\max x_t - \min x_t}$$

where $x_t'$ is the normalized value, $x_t$ denotes the true value of the time series at time $t$, and $\max x_t$ and $\min x_t$ are the maximum and the minimum true values of the time series, respectively.
After the normalized closing prices are predicted, they are converted to predicted returns as follows:

$$\hat{r}_{t+5} = \frac{\hat{p}_{t+5} - p_t}{p_t}$$

where $\hat{r}_{t+5}$ and $\hat{p}_{t+5}$ denote the predicted return and predicted closing price at time $t + 5$, respectively. Here, $p_t$ represents the actual closing price at time $t$.
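The normalization and the conversion from a predicted price to a predicted return can be sketched as below. Two assumptions are made that the text does not confirm: the return is a simple (arithmetic) return relative to the last actual close, and the normalized prediction is mapped back to the price scale before the return is computed. All names and numbers are hypothetical.

```python
def fit_min_max(series):
    """Return the minimum and maximum used for scaling."""
    return min(series), max(series)


def normalize(x, lo, hi):
    """Min-max scaling to [0, 1]."""
    return (x - lo) / (hi - lo)


def denormalize(x_norm, lo, hi):
    """Invert the min-max scaling to recover a price."""
    return x_norm * (hi - lo) + lo


def predicted_return(predicted_price, current_price):
    """Simple five-day return implied by the predicted close (assumed form)."""
    return (predicted_price - current_price) / current_price


closes = [50.0, 55.0, 60.0, 52.0, 58.0]          # hypothetical closing prices
lo, hi = fit_min_max(closes)
scaled = [normalize(x, lo, hi) for x in closes]
pred_price = denormalize(0.9, lo, hi)            # hypothetical model output of 0.9
ret = predicted_return(pred_price, closes[-1])
```

In practice the scaling bounds would usually be fitted on the training set only, so that no information from the test period leaks into the inputs.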
The constructed prediction model used in this study consists of five layers: an input layer, two hidden layers in the forward and backward direction, an output layer, and a fully connected layer. The dimensions of the input layer, hidden layers, and output layer are the same as that of the input data. The dimension of the fully connected layer is set to one to represent the single final predicted output. We use the Adam optimizer with the learning rate (LR) set to 0.01 and tanh as the activation function. We adopt a rolling forecast process in which the rolling window is set to 90 days. The rolling process is illustrated in Fig. 5.

Evaluation measures
To assess the accuracy of our forecasting models, we adopt the mean square error (MSE) as the loss function. It is calculated as follows:

$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} (\hat{x}_t - x_t)^2$$

where $\hat{x}_t$ and $x_t$ ($t = 1, 2, \ldots, N$) are the predicted and actual values at time $t$, while $N$ represents the total number of data points in the testing set.
In addition, we introduce DA as the metric to assess the market trend predictive ability of the model:

$$\mathrm{DA} = \frac{1}{N} \sum_{t=1}^{N} a_t \times 100\%$$

where

$$a_t = \begin{cases} 1, & (x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

For the forecasting model, lower MSE, RMSE, and MAE values and larger DA values indicate that the model has higher predictive accuracy and a stronger ability to predict the market trend. For benchmarking purposes, we compare the performance of our decomposition-based VMD-BiLSTM model against four benchmark models: the BiLSTM, unidirectional LSTM, support vector regression, and linear regression models.
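The four evaluation measures named above can be computed as in the following sketch. It assumes the common definition of DA, in which a forecast counts as a hit when the predicted move from the last actual value has the same sign as the realized move, and it averages over the $N - 1$ available moves. The sample values are hypothetical.

```python
import math


def mse(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def directional_accuracy(actual, predicted):
    """Share of steps where predicted and realized moves have the same sign."""
    hits = 0
    for t in range(len(actual) - 1):
        true_move = actual[t + 1] - actual[t]
        pred_move = predicted[t + 1] - actual[t]
        if true_move * pred_move >= 0:
            hits += 1
    return hits / (len(actual) - 1)


actual = [1.0, 2.0, 1.5, 1.8, 1.7]               # hypothetical true values
predicted = [1.0, 1.8, 1.6, 1.4, 1.9]            # hypothetical forecasts
scores = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "DA": directional_accuracy(actual, predicted),
}
```

The error measures and DA can disagree: a forecast may be close in level yet on the wrong side of the last price, which is why both kinds of metric are reported.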
To assess the performance of the portfolio constructed by our reinforcement learning model, we compare the average five-day returns of our portfolio and the overall Sharpe ratio against those of other portfolios and the reported financial performance of similar commodity indices and funds. The Sharpe ratio is defined as follows:

$$\text{Sharpe ratio} = \frac{r_p - r_f}{\sigma_p}$$

where $r_p$ denotes the annualized return of the portfolio, $\sigma_p$ is the annualized volatility of the portfolio, and $r_f$ represents the nominal risk-free rate. As suggested in previous studies (Fabozzi et al. 2007; Ackerman et al. 2013), we set the nominal risk-free rate $r_f = 2\%$. We consider several other portfolios, including the equal-weighted portfolio (the five chosen commodities are allocated equal weights throughout the trading period) and the non-Bitcoin portfolio (only SPY, wheat, WTI, and gold are considered), for comparing performance. We also obtain the financial performance data of similar exchange traded funds (ETFs) for comparison, which include the broad commodity ETF and Bitcoin ETF. The financial data for these ETFs for the trading period ranging from September 14, 2018 to February 21, 2020 are all downloaded from Yahoo Finance.
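A minimal sketch of the Sharpe ratio computation from a series of period returns follows. The annualization convention is an assumption, since the text does not specify one: the mean return is scaled by the number of periods per year and the sample standard deviation by its square root. The function name and inputs are hypothetical.

```python
import math


def annualized_sharpe(period_returns, periods_per_year, risk_free=0.02):
    """Sharpe ratio from per-period returns, using simple annualization."""
    n = len(period_returns)
    mean = sum(period_returns) / n
    variance = sum((r - mean) ** 2 for r in period_returns) / (n - 1)  # sample variance
    annual_return = mean * periods_per_year
    annual_volatility = math.sqrt(variance) * math.sqrt(periods_per_year)
    return (annual_return - risk_free) / annual_volatility


# Hypothetical five-day returns; 252 trading days / 5 gives about 50.4 periods per year.
five_day_returns = [0.010, -0.005, 0.020, 0.000, 0.015]
sharpe = annualized_sharpe(five_day_returns, periods_per_year=252 / 5)
```

Because the risk-free rate enters only through the numerator, raising it from 0% to 2% lowers the ratio without changing the volatility estimate.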

Return prediction results
First, we decompose the original historical price time series for all the commodities in our portfolio via VMD. According to the literature, VMD can effectively help neural networks to capture the tendency and cyclicity of time series data. For our analysis, the historical data for each selected commodity are decomposed into their respective subseries modes as shown in Fig. 6.
As we can see from Fig. 6, the historical daily closing price data for each commodity are decomposed into 11 sub-series modes labeled from M1 to M11, respectively. For each commodity, their decomposed modes display different cyclicity and fluctuation patterns. The sub-series modes range from low frequency to high frequency. Specifically, the M1 modes have the lowest frequency, reflecting the long-term trends for the time series. However, M2 to M5 represent the medium frequency modes, which show the periodicity of the price fluctuation. Lastly, M6 to M11 are the high-frequency modes, which represent the short-term fluctuations in the data.
By decomposing the historical price time series, we can extract the inner factors and patterns in each commodity. In general, these inner factors may contain hidden information that can influence the price fluctuation of the commodity (Wang et al. 2014), which cannot be captured with the original data. Consequently, this ability to extract hidden fluctuations and patterns can improve the forecasting ability of the prediction models.
In the prediction step, the historical daily closing prices of each commodity and the decomposed sub-series modes are included in the prediction model to generate its five-day-ahead predicted closing price, which is further converted into its predicted return. This procedure is repeated for all five commodities in our portfolio with the same model settings to ensure consistency.
Observing the prediction performances of all the commodities displayed in Table 2, we can conclude that our VMD-BiLSTM model generates the most reliable forecasting results among all the models compared.
After analyzing each commodity, we find that the VMD-BiLSTM model displays drastic improvements in RMSE and MAE relative to the benchmark models. The VMD-BiLSTM model also obtains the highest DA out of all the models. This superior performance indicates that our prediction model can effectively capture and forecast the movement trend for all the selected commodity markets. Moreover, the high prediction accuracies achieved by our VMD-BiLSTM model across all the commodities indicate that it is not overfitted to a particular dataset. Thus, it could be generalized to other commodity markets. Although our VMD-BiLSTM model can significantly improve the forecasting performance, its predictive accuracy differs for each commodity. Specifically, the model displays the highest prediction accuracy in the gold commodity market by achieving the lowest RMSE and MAE and a relatively high DA of 93%. In contrast, it obtains the highest RMSE and MAE and the lowest DA of 85.4% in the Bitcoin commodity market. This difference in prediction accuracies could be attributed to the fact that the selected commodities have varied volatilities. For example, gold is regarded as an investment safe haven due to its relatively low volatility (Baur and McDermott 2010); in contrast, Bitcoin is known to be a volatile asset as its prices can fluctuate significantly. As discrepancies in prediction accuracy exist among different commodities, we must consider them when building portfolio allocation strategies based on the predicted values.

Robustness tests
To further verify the robustness of our prediction, we test the prediction model using different combinations of neural network hyperparameters. The hyperparameter sets contain two components: the dimension of the hidden layers [11,22,33] and LR [0.001, 0.01, 0.1]. The results for all the datasets and commodities are presented in Table 3. The results in Table 3 indicate that the different hyperparameter combinations can yield varied model prediction accuracies, where the bold fonts present the prediction results of 11 dimension hidden layers and 0.01 LR settings for each market. It is clear that when the dimension of the hidden layers is set to 11 and LR is set to 0.01, our prediction model obtains the best prediction results. Further, the consistently superior results across all the commodities indicate that our model is not overfitted to a particular dataset.
To further verify the robustness of our prediction model, we use the same prediction model with the best hyperparameter settings on four different datasets that include the first 95%, 90%, 85%, and 80% of the original time series, which are denoted as "Set 95," "Set 90," "Set 85," and "Set 80," respectively. For each dataset, the split ratio is set to 8:2. The test results for all the datasets and commodities are presented in Table 4.
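The two robustness designs can be enumerated with a short sketch: the hyperparameter grid of hidden-layer dimensions and learning rates, and the truncated datasets with their 8:2 splits. Only the bookkeeping is shown; model training is omitted, and the helper name and the placeholder series are hypothetical.

```python
from itertools import product

HIDDEN_DIMS = [11, 22, 33]
LEARNING_RATES = [0.001, 0.01, 0.1]

# Every (hidden dimension, learning rate) pair to be evaluated.
grid = list(product(HIDDEN_DIMS, LEARNING_RATES))


def robustness_subsets(series, fractions=(0.95, 0.90, 0.85, 0.80), train_fraction=0.8):
    """Truncate the series to its first 95/90/85/80% and split each part 8:2."""
    subsets = {}
    for frac in fractions:
        subset = series[: round(len(series) * frac)]
        cut = round(len(subset) * train_fraction)
        subsets[f"Set {round(frac * 100)}"] = (subset[:cut], subset[cut:])
    return subsets


series = list(range(1000))                       # placeholder for a price series
subsets = robustness_subsets(series)
```

Each truncated set ends at a different date, so its test window falls in a different market regime, which is what makes the comparison across sets informative.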
The results in Table 4 indicate that our VMD-BiLSTM prediction model displays a consistent and good performance across all the commodities in different datasets in terms of RMSE, MAE, and DA. This indicates that our prediction model can consistently predict the future prices of different commodities across various market conditions, implying that our prediction model is robust and generalizable across different commodity markets.

Portfolio optimization results
After obtaining the prediction results from the VMD-BiLSTM model, we use them as the input to construct our commodity portfolios. In this analysis, we apply a deep deterministic policy gradient reinforcement learning model to optimize asset allocation automatically every five days. After obtaining the allocation weights in each portfolio, we calculate the actual annualized returns, volatility, and Sharpe ratio of the portfolios using actual commodity prices. To further verify the practicality of our strategy in the real world, we consider the transaction fees of each commodity, which are collected from Yahoo Finance. To evaluate the performance of each selected portfolio, we divide the entire trading interval into quarters (three months). In our trading policy and simulation, we ensure that our assets are sufficiently large to cover the trading volumes for all commodities. As a result, the initial investment capital for each trading strategy is set at $10,000. The investment asset and indicator comparisons are illustrated in Fig. 7 and Table 5, respectively.
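The three performance indicators can be computed from a series of daily portfolio returns as sketched below. Several choices here are assumptions made for illustration and are not taken from the study: 252 trading days per year, geometric annualization of returns, sample standard deviation for volatility, and a risk-free rate of zero by default. The function name `portfolio_metrics` is likewise hypothetical.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def portfolio_metrics(daily_returns, risk_free_rate=0.0):
    """Return (annualized return, annualized volatility, Sharpe ratio).

    `daily_returns` are simple returns net of any transaction fees.
    """
    n = len(daily_returns)

    # Compound the daily returns, then scale the horizon to one year.
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    annual_return = growth ** (TRADING_DAYS / n) - 1.0

    # Sample standard deviation of daily returns, scaled by sqrt(time).
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    annual_vol = math.sqrt(variance * TRADING_DAYS)

    sharpe = (annual_return - risk_free_rate) / annual_vol
    return annual_return, annual_vol, sharpe
```

Applying the function to each quarter separately would give per-interval indicators comparable in form to those reported for each strategy.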
As Bitcoin is a relatively new commodity asset, it has rarely been considered a portfolio component in portfolio optimization problems by previous studies. To investigate its diversification properties and effects in a portfolio, we construct two portfolios using our NEPO framework: the extended broad commodity asset (EBCA) portfolio and the traditional broad commodity asset (TBCA) portfolio. The EBCA portfolio contains all the selected commodity assets (SPY, wheat, WTI, gold, and Bitcoin). In contrast, the TBCA portfolio contains only the four traditional assets and excludes Bitcoin.
Table 5 Quarterly portfolio performances of each strategy
The results in Table 5 and Fig. 7 show that our portfolios constructed using the reinforcement learning model outperformed the other portfolios, indices, and funds in terms of financial performance for all the trading periods in the analysis. First, our constructed EBCA and TBCA portfolios are unique in that they maintain consistently positive returns throughout all the trading intervals. In certain intervals, such as 12/2018-02/2019, 06/2019-08/2019, 09/2019-11/2019, and 12/2019-02/2020, the other indices and funds experienced negative returns because most of the commodities in their portfolios decreased in price. In comparison, our reinforcement learning model uses the predicted returns to allocate weights so as to maximize the Sharpe ratio of the portfolios.
Second, in comparison with the DJCI and FTGC funds, our traditional commodity TBCA portfolio records a higher average volatility of 23.09%. This higher level of risk is consistent with the standard diversification argument (Imbs and Wacziarg 2003; Guesmi et al. 2019). As the selected indices and funds often consist of a large number of commodities, the risks of their portfolios are generally more diversified. Nevertheless, despite its higher volatility, our TBCA portfolio is sufficiently diverse for individual investors as it contains four commodity assets across different sectors. Further, our portfolio yields significantly higher returns.
Third, the performance of our extended EBCA portfolio indicates that Bitcoin can yield better results when it is treated as a part of the portfolio rather than as a stand-alone investment. Bitcoin is more volatile than other traditional commodities (Garcia et al. 2014; Yu et al. 2019). As a stand-alone investment, although it presents attractive returns in certain periods, its extreme volatility makes it a risky asset. For example, the GBTC ETF obtains a return of 113.91% in 03/2019-05/2019, while its volatility reaches 103.04%. As a result, this "high risk, high reward" characteristic of Bitcoin exposes investors to significant risks. Compared with the GBTC fund, our EBCA portfolio increases the average returns by 78%, from 19.48% to 34.67%, and significantly reduces the portfolio risk from 90.12% to 24.27%. By incorporating Bitcoin as part of our constructed portfolio, we can take advantage of the attractive returns of Bitcoin while limiting the exposure to the risks of the asset, which may be a viable strategy for individual investors.
The results in Table 5 also show that our EBCA portfolio outperformed the traditional commodity TBCA portfolio in four of six intervals by achieving better Sharpe ratios. Overall, the TBCA portfolio obtains average financial returns of 20.08% with an average volatility of 23.09%. With the inclusion of Bitcoin as part of the portfolio, the average volatility of the portfolio throughout all the quarters increased by 5.1% to 24.27%. Despite this slight increase in portfolio risk, the average returns of the portfolio saw a significant jump to 34.67%, which resulted in a higher average Sharpe ratio. As Bitcoin is a highly volatile asset, an increase in portfolio risk is expected. However, the results indicate that the additional returns an investor can gain significantly outweigh the additional risks.
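The percentage changes quoted in this comparison can be checked with simple arithmetic. The sketch below uses only the averages stated in the text (GBTC: 19.48% return and 90.12% volatility; TBCA: 20.08% and 23.09%; EBCA: 34.67% and 24.27%). The return-to-volatility ratios it computes are ratios of averages with an assumed risk-free rate of zero, so they are a rough illustration and not the average Sharpe ratios reported in Table 5.

```python
def relative_change(old, new):
    """Relative change from `old` to `new`, as a fraction of `old`."""
    return (new - old) / old

# Average returns and volatilities (in percent) as stated in the text.
gbtc = {"ret": 19.48, "vol": 90.12}
tbca = {"ret": 20.08, "vol": 23.09}
ebca = {"ret": 34.67, "vol": 24.27}

# EBCA return relative to holding GBTC alone (about 78% higher).
return_gain = relative_change(gbtc["ret"], ebca["ret"])

# Volatility increase from adding Bitcoin to TBCA (about 5.1% higher).
vol_increase = relative_change(tbca["vol"], ebca["vol"])

# Rough return-per-unit-of-risk ratios (risk-free rate assumed zero).
ratios = {name: p["ret"] / p["vol"]
          for name, p in (("GBTC", gbtc), ("TBCA", tbca), ("EBCA", ebca))}
```

On these averages, the return per unit of volatility is highest for EBCA, followed by TBCA and then GBTC, which matches the ordering argued in the text.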
Looking at the asset allocation comparison between the portfolios in Table 6 and Fig. 8, our EBCA portfolio has given Bitcoin the most weight in the portfolio. At the same time, it controls its weight within a reasonable amount so that the risks can be diversified to the traditional commodities. Our findings indicate that much higher returns can be achieved without being exposed to significant financial risks by including Bitcoin in the commodity portfolio.

Conclusion
Bitcoin has attracted significant attention from investors and policymakers in the global commodity market. Taking advantage of this asset due to its potential benefits and incorporating it as a part of the broad commodity trading portfolio will prove to be of great importance to investors and policymakers. In this paper, we propose a NEPO framework utilized for broad commodity assets, which integrates a deep learning-based model for future returns forecast and a reinforcement learning-based model for optimizing the asset weight allocation.
In terms of forecasting future prices and returns of the broad commodity assets, the empirical results suggest that our proposed VMD-BiLSTM prediction model can consistently improve the prediction accuracy and the trend prediction ability across commodity assets from different sectors, including stocks, agriculture, energy, precious metals, and cryptocurrency. In terms of portfolio performance, the broad commodity portfolio constructed using our reinforcement learning-based optimizer achieves significantly higher returns and a better Sharpe ratio than other commodity funds, indices, and asset allocation strategies. In addition, by incorporating Bitcoin into the asset pool, our portfolio optimization framework can increase the financial performance of the broad commodity portfolio by taking advantage of Bitcoin's high returns while effectively reducing its inherent risks.
This study adds to the literature through multiple channels. First, our broad commodity portfolio optimization framework serves as an early attempt to incorporate Bitcoin in the asset pool. Further, it could be effectively used to increase the diversification premiums of the portfolio without greater exposure to investment risks. Second, our VMD-BiLSTM forecasting approach differs from other hybrid forecasting approaches applicable in financial time series analysis. It directly generates the forecasting results by simultaneously using all the extracted intrinsic modes as prediction model inputs. By eliminating the aggregation step, our proposed model can effectively avoid the estimation errors that tend to accumulate in current ensemble prediction approaches. Finally, our proposed NEPO framework contributes to the artificial intelligence-based portfolio optimization literature by broadening the optimizer's weight allocation decisions from a discrete to a continuous action space and by considering the asset forecasting errors in the weight allocation process. Thus, it improves the practicality of the approach as well as its consistency with reality. By proposing the NEPO framework, our study supports a promising trend in improving portfolio allocation decision-making for broad commodity assets.
Although the results are promising, our study also faces certain limitations. For instance, our framework only uses structured data (asset prices) as input. Future studies can incorporate unstructured data such as news reports and social media sentiments to further improve the predictive ability of the framework. Moreover, for simplicity, we do not consider associated costs such as inflation and management costs. Considering and calculating these associated costs in future studies can further improve the model's practicality.