Forecasting VaR and ES by using deep quantile regression, GANs-based scenario generation, and heterogeneous market hypothesis

Wang, Jianzhou; Wang, Shuai; Lv, Mengzheng; Jiang, He

doi:10.1186/s40854-023-00564-5

Research
Open access
Published: 07 January 2024

Forecasting VaR and ES by using deep quantile regression, GANs-based scenario generation, and heterogeneous market hypothesis

Jianzhou Wang¹,
Shuai Wang ORCID: orcid.org/0000-0001-8985-8262¹,
Mengzheng Lv¹ &
…
He Jiang²

Financial Innovation volume 10, Article number: 36 (2024) Cite this article

1583 Accesses
2 Citations
1 Altmetric
Metrics details

Abstract

Value at risk (VaR) and expected shortfall (ES) have emerged as standard measures for detecting the market risk of financial assets and play essential roles in investment decisions, external regulations, and risk capital allocation. However, existing VaR estimation approaches fail to accurately reflect downside risks, and the ES estimation technique is quite limited owing to its challenging implementation. This causes financial institutions to overestimate or underestimate investment risk and finally leads to the inefficient allocation of financial resources. The main purpose of this study is to use machine learning to improve the accuracy of VaR estimation and provide an effective tool for ES estimation. Specifically, this study proposes a VaR estimator by combining quantile regression with “Mogrifier” recurrent neural networks to capture the “long memory” and “clustering” properties of financial assets; while for estimating ES, this study directly models the quantile of assets and employs generative adversarial networks to generate future tail risk scenarios. In addition to the typical properties of financial assets, the model design is also consistent with heterogeneous market theory. An empirical application to four major global stock indices shows that our model is superior to other existing models.

Introduction

In the ongoing credit and financial crises, it is essential to manage risk using appropriate measurement tools. Value at risk (VaR) is a widely used risk measure in financial institutions owing to its easy calculation and clear definition. However, the drawbacks of VaR (Artzner et al. 1999; Kwon 2021) are apparent: (1) VaR does not measure the left-tail risk beyond the quantile at the desired level and (2) VaR is not a consistent risk measure and does not satisfy some desirable properties such as subadditivity and convexity. These drawbacks cause investors and risk managers to overestimate or underestimate risk. In the recent Basel Accords, expected shortfall (ES) replaced VaR as the standard measure of market risk, making it the most popular risk measure for financial institutions and investors (Acerbi and Tasche 2002). ES, in addition to having many other desirable properties, is a consistent risk measure defined as the conditional mean of the loss over VaR at a given confidence level (Rockafellar and Uryasev 2000). Although the ES employed in Basel III and IV can provide more information about the left tail of assets, estimation is inherently challenging, as ES is not elicitable, which means there exists no scoring function for which the expectation is minimized by the true ES (Gneiting 2011).

Classical approaches for forecasting VaR can be divided into three major categories: nonparametric, parametric, and semi-parametric. The nonparametric approach does not require assumptions regarding the distribution of returns. Historical simulation is the primary representative of this category, where the empirical distribution of historical returns is used to calculate VaR. Although the calculation complexity of this approach is relatively low, it cannot capture fluctuations that do not exist in the historical window used (Chang et al. 2003). To estimate VaR using a parametric approach, it is necessary to define an effective model of return distribution. Some well-known methods in this category are the variance–covariance model and many GARCH-type models. However, the distribution assumptions of this method (such as Gaussian and Student’s t distributions) are not applicable to most financial time-series data. Semi-parametric approaches to VaR forecasting include those that use extreme value theory (Ener et al. 2012) and those that directly model the conditional quantile for a chosen probability level using quantile regression (QR), such as conditional autoregressive VaR (CAViaR) modeling (Engle and Manganelli 2004). In empirical studies on VaR forecast accuracy, the CAViaR models have performed well (Ener et al. 2012). Given the performance of QR in VaR estimation, scholars have combined it with machine learning methods, such as quantile regression neural networks (QRNN) and LASSO-QR, to further improve forecast accuracy. Moreover, there are inherently suitable methods for forecasting financial time series using deep learning such as recurrent neural networks (RNNs) and their variants. In particular, the MogLSTM proposed by Melis et al. (2019) has received widespread attention owing to the interaction operation between the hidden layer state and the current input first. We argue that through this interaction operation, the stored historical return information can be sufficiently combined with the currently available information to mimic long memory and nonlinear dependencies as well as the volatility clustering of returns. In addition, the forget gate in the MogLSTM cell can discard historical information that does not contribute to risk estimation, thereby avoiding serious clustered VaR violations. Therefore, we combined QR with a Mogrifier long short-term memory (MogLSTM) and Mogrifier gated recurrent unit (MogGRU) (Qin 2020) to propose two new deep learning QR (deep-QR) models: QRMogLSTM and QRMogGRU. For ease of expression, we call MogLSTM and MogGRU the Mogrifier recurrent neural networks (MogRNNs).

ES forecasts can be produced as byproducts of many VaR forecasting methods. Historical simulation and kernel density estimation in the nonparametric category can estimate ES by generating density forecasting. This is also the case for parametric approaches that involve a model for conditional variance, such as the GARCH model, and a distributional assumption. ES has always been regarded as not elicitable, whereas Fissler et al. (2015) show that VaR and ES are jointly elicitable. Therefore, another recently popular method is to jointly estimate VaR and ES based on the AL scoring function (Taylor 2019). Although machine learning methods can achieve accurate VaR estimations by combining QR, they provide no apparent means of producing ES forecasts. There is no scoring function for which the expectation is minimized by the true ES (Gneiting 2011) to construct supervised learning directly. In recent years, many data-driven methods based on Generative Adversarial Networks (GANs) (Goodfellow et al. 2014) have been utilized to generate renewable resource scenarios, particularly for wind power (see, Ma et al. 2013; Liang and Tang 2020; Yuan et al. 2021). Although the potential benefits of the physical process from wind energy to wind power generation are obvious, one of the main obstacles to its implementation is the uncertainty of predicting meteorological variables. Therefore, researchers have developed data-driven methods for GANs that can generate scenarios based on historical data related to meteorological variables, and achieve satisfactory results for uncertainty prediction. In the field of risk management, a few studies have applied GANs to capture the uncertainty of financial asset returns and prices and obtain VaR estimations (see, Zhu et al. 2021; Fatouros et al. 2022). Although these generative models accurately portray variable uncertainty without the need to fit probabilistic models of stochastic variables, they do not provide an efficient way to estimate ES. Inspired by the above generative model-based research, this study employs GANs to specifically model uncertainty scenarios in the left tail of asset returns, and accordingly proposes an ES estimation method based on scenario generation. Specifically, after generating tail risk scenarios, ES can be estimated by calculating the arithmetic average of the risk scenarios below the VaR. This ES estimation method based on GANs risk scenario generation, which is interesting and promising, has never been explored.

In addition to the aforementioned VaR and ES modeling methods, the hypothesis of a heterogeneous market (Müller et al. 1993) potentially contributes to the estimation of risk measures. Different participants in a heterogeneous market have different time horizons and dealing frequencies, and can have different degrees of risk aversion, institutional constraints, and transaction costs. The risk measures in a heterogeneous market can be affected by participants with different dealing frequencies. In the VaR and ES estimation processes, we consider the heterogeneous market hypothesis as a guiding theory in the design of our estimation framework.

According to the pertinent literature, most existing VaR and ES methods face various challenges. First, traditional methods and simple machine learning models find it difficult to capture long memory and nonlinear dependencies in financial time series, which leads to the low accuracy of these models in the estimation of VaR and ES. The second challenge is severe VaR violations during which the asset realizes a loss exceeding the VaR value owing to dependencies between VaR forecasts, especially for the 99% confidence level. This is often the case when sharp market plunges occur, especially for developed markets (Žiković and Filer 2013). Third, although machine learning methods can achieve accurate VaR estimations by combining QR, they provide no apparent means of producing ES forecasts. Finally, financial market risk can be affected by participants with different time horizons and dealing frequencies, and existing risk-forecasting methods often ignore the importance of these factors. These challenges of VaR and ES methods reflect additional motivations for the framework proposed in this study.

In summary, this study introduces a data-driven framework that forecasts the VaR and ES of assets, addressing the aforementioned challenges with the following key innovations:

(1)
Two VaR estimation methods based on deep learning: QRMogLSTM and QRMogGRU. These two methods not only mimic long memory and nonlinear dependencies to capture rare market events, but also discard unimportant historical information to avoid clustered VaR violations.
(2)
An ES estimation model based on GANs to solve the problem in which QR-based machine learning methods have difficulty generating ES estimates.
(3)
The use of the decomposition-aggregation mechanism to implement risk measure forecasting to capture the behavior of market participants with multiple time scales. Additionally, this study considers the heterogeneous market hypothesis as theoretical guidance in the design of the estimation model.

Related work

In finance, there is a growing interest in QR, with a particular focus on VaR models. There are two reasons for this: (1) QR provides a complete characterization of the random relationship between variables and (2) QR offers a more robust and thus more effective estimation in some non-Gaussian settings. The majority of the QR literature has focused on statistical models that generate future VaR in fixed model forms. For example, Koenker and Bassett (1978) propose a linear estimation process for conditional quantiles, extending the ordinary regression model by setting the loss function to quantile loss. Engle and Manganelli (2004) propose another quantile estimation method based on the linear QR technique to model the quantile directly, namely CAViaR. However, the return data of the real stock market are usually nonlinear and do not satisfy a normal distribution, and it is difficult to find suitable functional forms for the linear QR and CAViaR models (Huang 2013). To solve these estimation problems, Taylor (2000) applies the QRNN proposed by White (1992) to estimate conditional quantiles. The empirical results prove that the QRNN outperforms traditional GARCH-class models in terms of forecasting performance. The motivation for using QRNN to estimate VaR is clear: to find a suitable nonlinear functional form for the QR process with the powerful nonlinear mapping capability of artificial neural networks. Unfortunately, a QRNN is essentially a feedback-type neural network, and thus suffers from overfitting, underfitting, and a tendency to fall into local optimal points (Qiu and Song 2016). Inspired by Taylor, the following research idea is clear to scholars: extend the better-performing point forecasting model to the field of quantile estimation. For example, Takeuchi et al. (2006) and Li et al. (2007) proposed QR-SVM, and Xu et al. (2016) applied this method to VaR estimation and concluded that QR-SVM outperformed traditional GARCH-like and linear QR models. Nguyen et al. (2020) also propose the use of LASSO-QR to study tail risk in the cryptocurrency market.

With the development of deep learning, scholars have again turned their attention to deep neural networks, especially RNNs (Wang et al. 2022b) and their variants. For example, Wang et al. (2020) estimated quantiles using a QR long short-term memory network (LSTM) and constructed forecasting intervals from the upper and lower quantile estimation results. However, few studies have focused on deep-QR to estimate the VaR. This may be because the financial sequence is more volatile, resulting in complex neural networks that are difficult to train and need a large number of hyperparameters to be set appropriately.

ES is a standard risk measure in the recent Basel Accord, while its related modeling work is less. Mainly because ES is not an elicitable measure, constructing a loss function for its estimation process is challenging (Cai and Wang 2008; Grabchak and Christou 2021). Scholars such as Du and Escanciano (2017) and Patton et al. (2019) have explored ES estimation approaches. Recently, a popular method has been proposed to jointly estimate VaR and ES based on the research theories of Fissler et al. (2015). For example, Meng and Taylor (2020) and Merlo et al. (2021) jointly estimated VaR and ES using CAViaR-type models based on the AL scoring function. In addition, the current literature on risk measurement modeling using generative models focuses only on VaR estimation without providing ideas for ES estimation. For example, Zhu et al. (2021) and Fatouros et al. (2022) employed GANs to generate price/return scenarios for future assets and obtained VaR by calculating the quantiles of all the scenarios. These methods are the same as scenario generation in the field of renewable energy (Li et al. 2020) and generally follow two steps: (1) obtaining the probability distribution of the sequence itself or the forecasting error, and (2) sampling the scenarios from the statistical distribution.

In a broad strand of the financial literature, the hypothesis of a heterogeneous market mainly guides the forecasting of asset volatility, such as the heterogeneous autoregressive (HAR) model (Corsi 2009). Decomposition-aggregation forecasting is a method that captures multi-timescale behaviors for the prediction of financial asset returns. The application scenarios of decomposition-aggregation forecasting in finance include, but are not limited to futures (Jiang et al. 2021; Guo et al. 2022), stock (Deng et al. 2022; Wang et al. 2022a), and cryptocurrency (Parvini et al. 2022) markets. The research results in the above studies show that forecasts based on decomposition-aggregation learning can improve accuracy. Moreover, the main idea of decomposition-aggregation forecasting is compatible with the heterogeneous market hypothesis.

Based on the above, the main contribution of the current study is fourfold: First, we propose a novel probabilistic framework for VaR estimation based on QRMogLSTM and QRMogGRU. This framework performs better in terms of both VaR 95% and VaR 99% than prevalent VaR estimation methods. Second, we developed an ES estimation approach based on GANs, providing future risk scenarios based on the generation technology of tail distribution, and solving the problem of machine learning methods having difficulties in producing ES forecasts. Various evaluation indices and statistical tests prove the validity of the ES forecasting approach. Third, we consider investors' different time horizons and dealing frequencies based on the heterogeneous market hypothesis. It captures the investment behaviors of short-, medium-, and long-term investors and does not significantly increase the computational burden. Finally, we explore a relatively complete QR model space using CAViaR-type models, QRNN, LASSO-QR, QR-SVM, QR-tree models, and QR-deep learning as benchmark models. The backtesting results of all the models were compared against four main stock indices, exploring questions that contribute to our understanding of the accurate estimation of risk measures.

VaR and ES estimation framework

In this study, we extend the decomposition-aggregation strategy, state-of-the-art deep learning methods, and Bayesian optimization technology to the field of risk measures estimation. These technologies are compatible with the properties of financial assets. The corresponding mathematical principles and financial theories are presented below.

The hypothesis of a heterogeneous market

Müller et al. (1993) proposed the hypothesis of a heterogeneous market, which states that different participants in a heterogeneous market have different time horizons and dealing frequencies. Therefore, the time horizon of participants has a "fractal" structure, which consists of short-, medium- and long-term components. Each component has its own reaction time to events or news, and different degrees of risk aversion, institutional constraints, and transaction costs. We argue that the risk measure of financial assets is likewise affected by the unique dealing frequency of heterogeneous participants. Therefore, we incorporate the heterogeneous market hypothesis into the design of our estimation framework. We use the decomposition-aggregation strategy ("Decomposition and aggregation based on investor heterogeneity" section) to distinguish participants with different time horizons and dealing frequencies and employ MogRNNs ("Quantile regression Mogrifier RNNs (QRMogRNNs)" section) to generate risk estimation specifically.

Decomposition and aggregation based on investor heterogeneity

Forecasting technology based on decomposition-aggregation learning has been applied in many research fields, such as finance, energy (Wang et al. 2022c; Neshat et al. 2022), and the environment (Kim et al. 2022; Wang et al. 2022d). Research findings in these fields show that forecasts based on decomposition-aggregation learning can improve accuracy. In risk measure estimation, we argue that this mechanism can identify and extract asset sequences at different frequencies (P H and Rishad 2020) for analysis and forecasts. This mechanism is compatible with the heterogeneous market hypothesis that short-term investors affect signals with higher frequencies, whereas the trading behaviors of medium- and long-term investors affect those with lower frequencies. Therefore, this study extends decomposition-aggregation learning to the field of risk measure estimation. Specifically, we propose a real-time decomposition-aggregation approach that uses the available information to learn decomposition and aggregation rules. Whenever new data are known, they are added to the available information set and then decomposed and aggregated according to the rules.

Variational mode decomposition

The decomposition method used in this study is the variational mode decomposition (VMD). VMD (Dragomiretskiy and Zosso 2014) is a powerful signal decomposition algorithm that decomposes a complex signal into several intrinsic mode functions (IMFs) with a specific center frequency and bandwidth that are completely nonrecursive. In the decomposition mode of the VMD, we can redefine the IMF as

$$u_{k} \left( t \right) = A_{K} \left( t \right){\text{cos}}\left( {\varphi_{k} \left( t \right)} \right),$$

(1)

where $A_{K} \left( t \right)$ and $\varphi_{k} \left( t \right)$ indicate the instantaneous amplitude and phase of $u_{k} \left( t \right)$, respectively; $w_{k} \left( t \right)$ is the instantaneous angular frequency. From ${{w_{k} \left( t \right){\text{ = d}}\varphi_{k} \left( t \right)} \mathord{\left/ {\vphantom {{w_{k} \left( t \right){\text{ = d}}\varphi_{k} \left( t \right)} {dt}}} \right. \kern-0pt} {dt}} > 0$ (Dragomiretskiy and Zosso 2014), we know that $\varphi_{k} \left( t \right)$ must be differentiable at least once and $\varphi_{k}^{\prime } \left( t \right) > 0$.

The VMD algorithm is realized by solving the following constrained variable problem:

$$\left\{ {\begin{array}{*{20}c} {\mathop {{\text{argmin}}}\limits_{{\left\{ {u_{k} ,w_{k} } \right\}}} \sum\nolimits_{k = 1}^{K} {\left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) * u_{k} \left( t \right)} \right]{\text{exp}}( - jw_{k} t)} \right\|}_{2}^{2} } \\ {s.t.\sum\nolimits_{k = 1}^{K} {u_{k} \left( t \right)} = v\left( t \right)} \\ \end{array} } \right.,$$

(2)

where $v_{k} \left( t \right)$ is the input signal, $j^{2} = - 1$, and $\delta \left( t \right)$ represents the Dirac distribution function. The Lagrange multiplier is used to transform Eq. (2) into an unconstrained optimization problem:

$$\begin{aligned} & \mathop {{\text{argmin}}}\limits_{{\left\{ {u_{k} ,w_{k} } \right\}}} \alpha \sum\nolimits_{k = 1}^{K} {\left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) * u_{k} \left( t \right)} \right]{\text{exp}}( - jw_{k} t)} \right\|}_{2}^{2} \\ & \quad + \left\| {v\left( t \right) - \sum\nolimits_{k = 1}^{K} {u_{k} \left( t \right)} } \right\|_{2}^{2} + \left\langle {\lambda \left( t \right),v\left( t \right) - \sum\nolimits_{k = 1}^{K} {u_{k} \left( t \right)} } \right\rangle , \\ \end{aligned}$$

(3)

where $\lambda \left( t \right)$ is the Lagrange multiplier used to enhance the constraint and $\alpha > 0$ is the quadratic penalty factor. $\left\| {v\left( t \right) - \sum\nolimits_{k = 1}^{K} {u_{k} \left( t \right)} } \right\|_{2}^{2}$ is a quadratic penalty term that accelerates the convergence rate and ensures minimum squared error.

According to the ADMM optimization method (Bertsekas and DimitriP 1982; Dragomiretskiy and Zosso 2014), the update modes of $u_{k}$ and $w_{k}$ can be expressed by the following equations:

$$\hat{u}_{k}^{n + 1} \left( w \right) = \frac{{\hat{x}\left( w \right) - \sum\nolimits_{i < k}^{{}} {\hat{u}_{i}^{n} \left( w \right) + {{\hat{\lambda }\left( w \right)} \mathord{\left/ {\vphantom {{\hat{\lambda }\left( w \right)} 2}} \right. \kern-0pt} 2}} }}{{1 + 2\alpha \left( {w - w_{k} } \right)^{2} }},$$

(4)

$$\hat{w}_{k}^{n + 1} = \frac{{\int_{0}^{\infty } {w\left| {\hat{u}_{k}^{n + 1} \left( w \right)} \right|^{2} dw} }}{{\int_{0}^{\infty } {\left| {\hat{u}_{k}^{n + 1} \left( w \right)} \right|^{2} dw} }},$$

(5)

where $\hat{u}_{k}^{n + 1} \left( w \right)$, $\hat{x}\left( w \right)$, and $\hat{\lambda }\left( w \right)$ are the Fourier transforms of the signals $u_{k}^{n + 1} \left( t \right)$, $x\left( t \right)$, and $\lambda \left( w \right)$, respectively. Moreover, the stop condition of VMD is expressed based on the tolerance of the convergence criterion:

$$\hat{u}_{k}^{n + 1} \left( w \right) = \sum\limits_{k} {\frac{{\left\| {\hat{u}_{k}^{n + 1} \left( w \right) - \hat{u}_{k}^{n} \left( w \right)} \right\|_{2}^{2} }}{{\left\| {\hat{u}_{k}^{n} \left( w \right)} \right\|_{2}^{2} }}} < \varepsilon .$$

(6)

This study chooses the VMD algorithm instead of the empirical mode decomposition (EMD)-class (Huang et al. 1996) algorithm, because VMD can fix the number of generated IMFs, thus avoiding the inconsistency between in-sample and out-sample decomposition results.

IMFs aggregation based on fuzzy entropy

To balance forecasting accuracy and time consumption, we do not model each IMF, but the subseries aggregated by all the IMFs. However, there is no clear division between different frequencies of financial assets (IMFs), and the boundary is fuzzy. Thus, we employ the fuzzy entropy and approximation criterion (Fu et al. 2020) to aggregate all the IMFs. The approximation criterion is defined in Eqs. (7) and (8):

$$criterion = \frac{{\max \left( {FE_{i} |i = 1,...,l} \right) - \min \left( {FE_{i} |i = 1,...,l} \right)}}{{{l \mathord{\left/ {\vphantom {l 2}} \right. \kern-0pt} 2}}},$$

(7)

$$\Delta FE_{ij} = \left| {FE_{i} - FE_{j} } \right| < criterion,$$

(8)

where $l$ denotes the number of decomposed IMFs and $\left\{ {FE_{i} |i = 1,...,l} \right\}$ represents the fuzzy entropy value corresponding to the ith IMF.

Real-time decomposition and aggregation mechanism

Let $O_{t}$ be the financial asset data, and $M_{t}$ be the available information set at time t. The real-time decomposition-aggregation mechanism can be implemented as follows:

(1)
Decompose $O_{1} ,O_{2} , \ldots ,O_{t - 1}$ at t-1, into k IMFs using VMD and information of $M_{t - 1}$, such that the jth IMF can be denoted as $\left( {{\varvec{IMF}}_{j} |M_{t - 1} } \right)$.
(2)
Calculate the fuzzy entropy of all decomposed IMFs, then acquire the aggregation rule $\left( {H|M_{t - 1} } \right)$, and obtain p subseries $\overrightarrow {{F_{1} }} ,\overrightarrow {{F_{2} }} , \ldots ,\overrightarrow {{F_{p} }} ,\left( {p \le k} \right)$ according to the approximate criterion.
(3)
Calculate the jth IMF, that is, $\left( {IMF_{t,j} |M_{t} } \right)$ when new information is available at t using the VMD and information from $M_{t}$.
(4)
Calculate the fuzzy entropy $FE_{t} \left( {IMF_{t,j} |M_{t} } \right)$ of each IMF at t and apply $\left( {H|M_{t - 1} } \right)$ to $FE_{t} \left( {IMF_{t,j} |M_{t} } \right)$ to obtain the aggregated subseries $F_{t,1} ,F_{t,2} , \ldots ,F_{t,p} ,\left( {p \le k} \right)$ at t.

Figure 1 shows the decomposition and aggregation results of the SPX500 index for the training set. As shown in Fig. 1, the frequency of subseries (Sub) 1 is low; some researchers have argued that this sequence is a trend sequence (Zhang et al. 2008). The frequency of Sub 2 is relatively high, which is the medium- and long-term impact brought to the market by particular events or the behavior of medium- and long-term investors. Subs 3 and 4 are high-frequency sequences influenced by short-term shocks (such as monetary policy and the release of US macroeconomic data) or short-term investor behavior.

In the subsequent estimation, we only needed to model the subseries $F_{t,1} ,F_{t,2} , \ldots ,F_{t,p}$, which are aggregated in real time from IMFs and summed up the estimation results to obtain the estimated values of the financial asset data.

Quantile regression Mogrifier RNNs (QRMogRNNs)

Forecasting VaR at significance level $\alpha$ at a future time is equivalent to estimating the $\tau$th quantile. We called Mogrifier LSTM (MogLSTM) and Mogrifier GRU (MogGRU) as MogRNNs collectively. QRMogRNNs are realized by designing the loss functions of MogRNNs as quantile loss, which is consistent with Taylor's study (2000) of QRNN.

Mogrifier LSTM and Mogrifier GRU

We found that the Mogrifier (Mog) structure proved to have better performance owing to an interesting model improvement, that is, performing regular interactive operations on the current input and the previous hidden state. The advantages of adopting MogRNNs to estimate the risk measures of financial assets are reflected in the properties of financial assets and the heterogeneous market hypothesis.

Properties of financial asset

The interaction between the previous hidden layer which stored the historical return information and the current input enables the model to learn the "clustering" of financial asset volatility. By inputting the interaction results into the LSTM or GRU structure, the model retains or discards the historical information to learn the long-term dependencies of the financial sequence.

Heterogeneous market hypothesis

The MogRNNs model architecture coincides well with the heterogeneous market hypothesis, as reflected in two ways. First, the RNNs choose to retain or discard historical information. Information that does not contribute to risk estimation is discarded (such as the impact of short-term speculation on the long-run market), whereas information that contributes to risk estimation (such as medium- and long-term investment behavior) is retained in the hidden layer of the RNNs. Second, the Mog structure can drive retained long-term investment behavior and short-term investment behavior performs interactive operations to simulate the interaction between short-term investors and medium- or long-term investors.

In addition to the abovementioned advantages, the forget gate in the LSTM cell can discard historical information that does not contribute to the estimation, thereby avoiding serious clustered VaR violations. Therefore, we combined QR with MogLSTM and MogGRU to propose two new deep-QR models to model the quantiles of financial assets, namely, QRMogLSTM and QRMogGR.

As LSTM and GRU have similar cell structures, we introduce the structure of MogLSTM to explain the model improvement. A schematic of MogLSTM is presented in Fig. 2. Its mathematical principle is as follows:

To demonstrate the modifications of MogLSTM, the cell structure of LSTM is first presented as follows:

$$\begin{aligned} {\mathbf{F}} & = \sigma \left( {{\mathbf{W}}^{Fx} x + {\mathbf{W}}^{Fh} {\mathbf{h}}_{prev} + {\mathbf{b}}^{F} } \right), \\ {\mathbf{I}} & = \sigma \left( {{\mathbf{W}}^{Ix} x + {\mathbf{W}}^{Ih} {\mathbf{h}}_{prev} + {\mathbf{b}}^{I} } \right), \\ {\mathbf{J}} & = \tanh \left( {{\mathbf{W}}^{Jx} x + {\mathbf{W}}^{Jh} {\mathbf{h}}_{prev} + {\mathbf{b}}^{J} } \right), \\ {\mathbf{O}} & = \sigma \left( {{\mathbf{W}}^{Ox} x + {\mathbf{W}}^{Oh} {\mathbf{h}}_{prev} + {\mathbf{b}}^{O} } \right), \\ {\mathbf{C}} & = {\mathbf{F}} \odot {\mathbf{C}}_{prev} + {\mathbf{I}} \odot {\mathbf{J}}, \\ {\mathbf{h}} & = {\mathbf{O}} \odot \tanh \left( {\mathbf{C}} \right), \\ \end{aligned}$$

(9)

where $\sigma \left( \cdot \right)$ is the sigmoid function; ${\mathbf{F}}$, ${\mathbf{I}}$, ${\mathbf{J}}$, and ${\mathbf{O}}$ are gates of the LSTM cell; they decide which information will be retained or discarded. The current input to the LSTM $x$ must be related to the previous hidden state ${\mathbf{h}}_{prev}$. Thus, MogLSTM performs iterative interaction operations on ${\mathbf{x}}$ and ${\mathbf{h}}_{prev}$ in advance to obtain the modulated inputs ${\mathbf{x}}^{ \uparrow }$ and ${\mathbf{h}}_{prev}^{ \uparrow }$. These inputs,${\mathbf{x}}^{ \uparrow }$ and ${\mathbf{h}}_{prev}^{ \uparrow }$, can be defined as ${\mathbf{x}}^{i}$ and ${\mathbf{h}}_{prev}^{i}$ respectively, and expressed as follows:

$$\left\{ {\begin{array}{*{20}l} {{\mathbf{x}}^{i} = 2\sigma \left( {{\mathbf{Q}}^{i} {\mathbf{h}}_{prev}^{i - 1} } \right) \odot {\mathbf{x}}^{i - 2} ,} \hfill & {for\;{\text{odd}}i \in \left[ {1,2, \ldots ,r} \right]} \hfill \\ {{\mathbf{h}}_{prev}^{i} = 2\sigma \left( {{\mathbf{R}}^{i} {\mathbf{x}}^{i - 1} } \right) \odot {\mathbf{h}}_{prev}^{i - 2} ,} \hfill & {for\;{\text{even}}i \in \left[ {1,2, \ldots ,r} \right]} \hfill \\ \end{array} } \right.,$$

(10)

with ${\mathbf{x}}^{ - 1} = {\mathbf{x}}$ and ${\mathbf{h}}_{prev}^{0} = {\mathbf{h}}_{prev}$. The iterative round (also known as Mog step), $r \in {\mathbb{N}}$, is a hyperparameter; $r = 0$ recovers standard LSTM. ${\mathbf{Q}}^{i}$ and ${\mathbf{R}}^{i}$ are matrices resulting from random initialization To reduce the number of additional model parameters, the ${\mathbf{Q}}^{i}$ and ${\mathbf{R}}^{i}$ matrices are decomposed into the products of the low-rank matrices: ${\mathbf{Q}}^{i} = {\mathbf{Q}}_{left}^{i} {\mathbf{Q}}_{right}^{i}$ with ${\mathbf{Q}}^{i} \in {\mathbb{R}}^{m \times n}$, ${\mathbf{Q}}_{left}^{i} \in {\mathbb{R}}^{m \times k}$, ${\mathbf{Q}}_{right}^{i} \in {\mathbb{R}}^{k \times n}$, where $k < \min \left( {m,n} \right)$ is the rank.

MogGRU can be modified by the same mechanism, that is, the hidden state ${\mathbf{h}}_{prev}$ and current input ${\mathbf{x}}$ are operated interactively before they are inputted into the GRU cells.

QRMogRNNs

We assume that $f\left( {{\mathbf{X}}_{i} ,{{\varvec{\uptheta}}}} \right)$ indicates the forecasting model and ${{\varvec{\uptheta}}}$ represents all the parameters of this model. Next, we design the loss function in Eq. (11), where the parameter ${{\varvec{\uptheta}}}\left( \tau \right)$ is estimated corresponding to $\tau$, so we can estimate VaR using the model $f\left( {{\mathbf{X}}_{i} ,{\hat{\mathbf{\theta }}}\left( \tau \right)} \right)$ based on QR theory:

$$QL = \sum\limits_{i = 1}^{N} {\max \left\{ {\tau \left( {f\left( {{\mathbf{X}}_{i} ,{{\varvec{\uptheta}}}\left( \tau \right)} \right) - y_{i}^{act} } \right),\left( {\tau - 1} \right)\left( {f\left( {{\mathbf{X}}_{i} ,{{\varvec{\uptheta}}}\left( \tau \right)} \right) - y_{i}^{act} } \right)} \right\}} .$$

(11)

The condition quantile of $y_{i}^{act}$ obtained by QRMogRNNs (i.e., QRMogGRU and QRMogLSTM) can be formulated based on the aforementioned forward propagation of MogLSTM and MogGRU as follows:

$$Q\left( {\tau |X_{i} } \right) = f\left( {X_{i} ,\hat{\theta }\left( \tau \right)} \right) = \sigma \left( {{\mathbf{W}}^{Ox} \left( \tau \right)x + {\mathbf{W}}^{Oh} \left( \tau \right){\mathbf{h}}_{prev} \left( \tau \right) + {\mathbf{b}}^{O} \left( \tau \right)} \right),$$

(12)

where ${\mathbf{W}}\left( \tau \right)$ and ${\mathbf{b}}\left( \tau \right)$ indicate the weight matrix and bias corresponding to the quantile $\tau$, respectively; ${\mathbf{h}}_{prev} \left( \tau \right)$ represents the hidden state with respect to $\tau$.

Bayesian hyperparameter optimization and model selection

The importance of using Bayesian theory and Bayesian formulas to realize model selection and hyperparameter optimization is reflected in two ways: First, the deep learning model has many hyperparameters that affect the accuracy of the model, especially for MogLSTM and MogGRU structures, which both have an extra parameter, r. Second, according to the "No free lunch" theorem, while model A outperforms model B on one specific task, there must be another specific task on which A performs worse than model B. Therefore, we regard the type of neural network layer as a hyperparameter that has two choices: QRMogLSTM and QRMogGRU. We coded them as 0 and 1, respectively, to realize the model selection. Table 1 presents the hyperparameters to be optimized and their value ranges, and the Bayesian optimization (BO) can be implemented as follows:

Table 1 Value range of hyperparameter to be optimized

Full size table

BO is based on Bayes’ theorem, described as

$$P\left( {\Lambda^{UN} |\Phi } \right) = \frac{{P\left( {\Phi |\Lambda^{UN} } \right)P\left( {\Lambda^{UN} } \right)}}{P\left( \Phi \right)},$$

(13)

where $\Lambda^{UN}$ indicates the unknown information, $P\left( {\Lambda^{UN} } \right)$ refers to the prior distribution, $P\left( {\Phi |\Lambda^{UN} } \right)$ indicates the probability, and $P\left( {\Lambda^{UN} |\Phi } \right)$ denotes the posterior distribution.

The basic steps of BO are given as.

(1)
Build a surrogate probability model of the objective function.
(2)
Find the ideal hyperparameters on the surrogate.
(3)
Apply these hyperparameters to the true objective function to assess them.
(4)
Update the surrogate model incorporating the new results.
(5)
Repeat steps (2–4) until max iterations are evaluated.

After considering the layer type as a hyperparameter and introducing the extra parameter $r$, we have four alternative models to select:

(1)
When $r = 0$ and $LT = 0$, the QRLSTM model is selected.
(2)
When $r = 0$ and $LT = 1$, the QRGRU model is selected.
(3)
When $r > 0$ and $LT = 0$, the QRMogLSTM model is selected, and the hidden layer state and the current input are operated interactively $r$ times.
(4)
When $r > 0$ and $LT = 1$, the QRMogGRU model is selected.

Other parameters of our model are set by default: learning rate = 0.01, optimizer = "Adam" and dropout ratio = 0.2.

Estimate ES based on scenario generation

Although ES has replaced VaR as the standard risk measure in the recent Basel Accord, there are few ES estimation methods because they cannot construct a loss function. This study employs the generative model GANs to estimate ES. Inspired by the direct quantile modeling of the well-known CAViaR model proposed by Engle and Manganelli (2004), we employed GANs as a powerful tool to generate asset-tail scenarios. The principle is that the estimated quantile values are inputted into the GANs to generate many future risk scenarios at the tail of the assets, then the ES can be obtained by averaging scenarios lower than the estimated VaR value.

Least squares generative adversarial network

Least squares generative adversarial network (LSGAN) is a variant of the GANs proposed by Mao et al. (2017). This model maintains the core structure of the GAN, which consists of a generator G and discriminator D. The goal of G is to generate new data that D cannot determine as true or fake by learning the underlying distribution of the training-set data. D is a binary classifier that determines whether the data come from the training set or G by classifying the input data as true or false. During the iterative process, G and D alternately update their weights to improve their discriminative or generative power and generate an adversarial model to reach the Nash equilibrium when D cannot discriminate between true and false.

The input to the generator is a latent variable Z, which can be a simple random distribution such as a Gaussian distribution. The goal of G is to map Z to $G\left( Z \right)$ such that it approximates the true distribution as closely as possible. The inputs to D are the fake data generated by G and the training set $\overline{{\overline{{{\varvec{\Lambda}}}} }}_{1}^{train}$. Because neural networks can fit arbitrary functions, generators and discriminators are generally designed as multilayer neural networks. This is achieved by designing the loss functions $L^{\left( D \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right)$ and $L^{\left( G \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right)$ for D and G. The main improvement in LSGAN is the modification of the loss functions of G and D to the form of a squared error. In LSGAN, the mathematical expressions for $L^{\left( D \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right)$ and $L^{\left( G \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right)$ are as follows:

$$L^{\left( D \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right) = \frac{1}{2}{\mathbb{E}}_{{\overline{{\overline{\Lambda }}}_{1}^{train} \sim p_{data} }} \left[ {\left( {D\left( {\overline{{\overline{\Lambda }}}_{1}^{train} } \right) - \beta } \right)^{2} } \right] + \frac{1}{2}{\mathbb{E}}_{Z} \left[ {\left( {D\left( {G\left( Z \right)} \right) - \alpha } \right)^{2} } \right],$$

(14)

$$L^{\left( G \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right) = \frac{1}{2}{\mathbb{E}}_{Z} \left[ {\left( {D\left( {G\left( Z \right)} \right) - \gamma } \right)^{2} } \right],$$

(15)

where $\alpha$, $\beta$, and $\gamma$ are predefined parameters. Minimizing the objective function minimizes the $Pearson\chi^{2}$ scatter when $\beta - \gamma = 1$ and $\beta - \alpha = 2$ are satisfied such that $\alpha = - 1$, $\beta = 1$, and $\gamma = 0$. For D, a smaller $L^{\left( D \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right)$ indicates that it is more capable of distinguishing true from fake. For G, a smaller $L^{\left( G \right)} \left( {{{\varvec{\uptheta}}}^{\left( D \right)} ,{{\varvec{\uptheta}}}^{\left( G \right)} } \right)$ indicates that it is more capable of falsifying.

Training GANs is difficult in practice because of the instability of GANs learning. The discriminator of the original GAN uses a sigmoid cross-entropy loss function, which may lead to the problem of gradient vanishing during the learning process. The LSGAN uses the least square loss function (LSF) as its discriminator. The idea is simple and efficient: the LSF can move fake samples to the decision boundary because it punishes samples that lie on the correct side of the decision boundary for a long time. Based on this characteristic, the LSGAN is comparable to the Wasserstein generative adversarial network (WGAN) in generating samples that are closer to real data; however, the training process and convergence speed of the LSGAN are faster than those of the complex WGAN.

In addition, we considered a two-time-scale update rule training strategy (Heusel et al. 2017) to optimize the learning process of the LSGAN. Specifically, this study employs the "Adam" optimizer with different learning rates for the discriminator network D and the generator G. G uses the low-speed update rule with the learning rate set at 0.0001, and D uses the relatively fast update rule with the learning rate set at 0.0005.

Although the LSGAN ensures that the punishment for the outlier sample is greater, which solves the problem of unstable (insufficient) GANs training, the following limitations remain. Excessive punishment of outliers by LSGAN may lead to a decrease in the "diversity" of sample generation, and the generator may have the problem of gradient vanishing when the discriminator is excellent enough. This study provides ideas for GANs to estimate ES and employs the LSGAN to implement the empirical study. In future research, we will explore other GANs variants with better performance to realize ES estimation.

Generating the future risk scenarios using LSGAN

Training GANs can be difficult and there are many possible setups. Therefore, we referred to the recent literature (Zhu et al. 2021; Fatouros et al. 2022) on modeling VaR using GANs to construct the structure and parameter settings of the LSGAN in this study. We implemented a conditional LSGAN in which the first step was a pretraining process. Generator G first estimates the in-sample VaR values to learn the future risk scenarios according to the critic scores given by discriminator D. The second step involves an extra training and forecasting process using the rolling window mechanism. Generator G takes b periods of the previous VaR values for additional training to generate the tail distribution of the following future f periods. The data are represented by M containing a total of W periods of VaR, split into two parts: the $M_{b}$ containing VaR for the b previous periods, and the $M_{f}$ that contains the VaR for the f following periods. Therefore, the generator generates simulations $\hat{M}_{f}$ using the previous VaR $M_{b}$ and latent vector Z. In practice, the latent vector represents the unknown future events affecting the market index. A schematic of the conditional LSGAN is shown in Fig. 3.

The generator is composed of a conditioning network and a simulator network. The conditioning network takes the historical risk trend $M_{b}$ followed by a data normalization (Norm) process and 1D convolutional layers (Conv layers), where convolution is done over the time dimension. This output is then flattened and followed by a fully connected layer (Dense). The conditioning network then takes the latent input Z and concatenates it with the input of the previous layer. This is followed by a fully connected layer (Dense) before it is reshaped and followed by 1D convolutional layers (Conv layers) to shrink it down to the desired output shape. The output of the generator $\hat{M}_{f}$ is a vector with simulated tail risk for a market index. The discriminator takes a concatenation of real data $M_{b}$ and fake data $\hat{M}_{f}$. This is processed by normalization (Norm) and several 1D convolutional layers (Conv layers), flattened, and finally there is a fully connected output layer (Dense). The discriminator gives a critic score, assigning a larger value if it believes the data comes from the true distribution, and a smaller value if it believes it comes from the distribution learned by the generator.

Forecasting ES based on risk scenarios and estimated VaR

As mentioned above, this study used risk scenarios to calculate the ES. Risk scenarios (i.e., quantile scenarios) sample approximate representations of the tail uncertainty relationship of asset indicators (e.g., log returns) on a future date. Suppose we obtain M_in observations of the in-sample VaR forecast results provided by any method: $\overline{VaR} \left( \tau \right) = \left\{ {VaR_{1} \left( \tau \right),\;VaR_{2} \left( \tau \right), \ldots ,\;VaR_{{M_{in} }} \left( \tau \right)} \right\}$. We directly model the estimated $\overline{VaR} \left( \tau \right)$ to generate future risk scenarios that follow a tail distribution. The ES value at the specified confidence level can then be calculated by averaging the scenarios lower than the corresponding VaR. This study forecasts ES based on rolling window exercises, and the model is retrained for every specific period. Assuming that the rolling window size is b, the model receives additional training for every f period according to the previous b periods estimated VaR values, and the epochs of pretraining and additional training are $E_{1}$ and $E_{2}$.

The detailed steps for estimating ES using the LSGAN scenario generation approach are as follows:

(1)
Input the estimated in-sample VaR values $\overline{VaR} \left( \tau \right) = \left\{ {VaR_{1} \left( \tau \right),\;VaR_{2} \left( \tau \right), \ldots ,\;VaR_{{M_{in} }} \left( \tau \right)} \right\}$ into the LSGAN model for pretraining. The epoch of LSGAN is $E_{1} = 1000$. The generator and discriminator game against each other, and the pretrained LSGAN $LSGAN_{trained}^{{}} = LSGAN\left( {\overline{VaR} \left( \tau \right)} \right)$ can generate future risk scenarios when they reach the Nash equilibrium.
(2)
Let the trained LSGAN generate N risk scenarios: $\widetilde{{S_{G} }} = \left\{ {S_{G}^{1} ,\;S_{G}^{2} , \ldots ,\;S_{G}^{N} } \right\}$$= LSGAN_{trained}^{{}} \left( Z \right)$, where Z is the latent vector. The generated risk scenarios contain the uncertain information at the tail of the asset.
(3)
Forecast the out-of-sample VaR value, ${\text{VaR}}_{QRMogRNNs}^{{}}$, at a given confidence level (95% in this study) for time t from the QRMogRNNs and select scenarios with $\left( {\widetilde{{S_{G} }}|\widetilde{{S_{G} }} < {\text{VaR}}_{QRMogRNNs}^{{}} } \right)$ lower than ${\text{VaR}}_{QRMogRNNs}^{{}}$.
(4)
Average the selected scenarios to obtain the future f periods ES values: $ES = E\left( {\widetilde{{S_{G} }}|\widetilde{{S_{G} }} < {\text{VaR}}_{QRMogRNNs}^{{}} } \right)$.
(5)
Train the LSGAN every f period according to the previous b periods VaR and set the epoch as $E_{2} = 500$.
(6)
Repeat (2–5) steps to estimate all the out-of-sample ES values.

Since ES is not elicitable (Gneiting 2011), we chose to first estimate the tail risk of assets (i.e., VaR) and then model the tail risk scenarios based on the historical VaR values to estimate ES. Another potential modeling method is to use GANs directly to generate return scenarios and then calculate the arithmetic average of return scenarios below the VaR value as the ES value. We also verified this method but found the following problems: if the asset returns are modeled directly, the available tail risk scenarios will be relatively few, especially during crisis periods. Furthermore, the tail scenarios of returns may be higher than the VaR values produced by the VaR estimator, which can lead to inaccurate or even ineffective risk forecast models. The method proposed in this study can be flexibly combined with any VaR estimator to jointly forecast ES without the abovementioned problems.

Evaluation methodology

VaR backtesting is typically based on coverage, which measures the percentage of times returns exceed the estimated VaR at a probability level of $\alpha$ (Emenogu et al. 2020). This study implements five commonly used VaR backtesting tests and one loss function: the unconditional coverage (UC) test (also known as Kupiec’s POF test) (Kupiec 1995), independence (IND) test, conditional coverage (CC) test (Christoffersen 1998), dynamic quantile (DQ) test (Engle and Manganelli 2004), and Lopez’s magnitude loss function (M-Loss) (Lopez 1999).

(1)
$LR_{UC}$
$$LR_{UC} = - 2\ln \left( {\frac{{\alpha^{{N_{1} }} \left( {1 - \alpha } \right)^{{N_{0} }} }}{{\tilde{\Lambda }^{{N_{1} }} \left( {1 - \tilde{\Lambda }} \right)^{{N_{0} }} }}} \right)\sim \chi_{1}^{2} ,$$
(16)
where $N_{1}$ and $N_{0}$ are the number of times the VaR estimate is and is not exceeded, respectively, $\alpha$ indicates the desired level, and $\tilde{\Lambda }$ is the Hit Ratio (HR), which can be calculated using $N_{1} /\left( {N_{1} + N_{0} } \right)$.
(2)
$LR_{IND}$
$$LR_{IND} = - 2\ln \left( {\frac{{\left( {1 - \tilde{\Lambda }_{2} } \right)^{{\left( {n_{00} + n_{10} } \right)}} \tilde{\Lambda }_{2}^{{\left( {n_{01} + n_{11} } \right)}} }}{{\left( {1 - \tilde{\Lambda }_{01} } \right)^{{n_{00} }} \tilde{\Lambda }_{01}^{{n_{01} }} \left( {1 - \tilde{\Lambda }_{11} } \right)^{{n_{10} }} \tilde{\Lambda }_{11}^{{n_{11} }} }}} \right)\sim \chi_{1}^{2} ,$$
(17)
where $n_{jk}$ is the number of $j$ values followed by a k value in the sequence (with $j,k = \left[ {0,0} \right],\;\left[ {0,1} \right]$ or $\left[ {1,1} \right]$, and 0 indicating that the actual return does not exceed the VaR estimate; otherwise, 1), and $\tilde{\Lambda }s$ are defined as
$$\tilde{\Lambda }_{01} = \frac{{n_{01} }}{{n_{00} + n_{01} }},\;\tilde{\Lambda }_{11} = \frac{{n_{11} }}{{n_{10} + n_{11} }}\;and\;\tilde{\Lambda }_{2} = \frac{{n_{01} + n_{11} }}{{n_{00} + n_{01} + n_{10} + n_{11} }}.$$
(18)
(3)
$LR_{CC}$
$$LR_{CC} = - 2\ln \left( {\frac{{\alpha^{{n_{1} }} \left( {1 - \alpha } \right)^{{n_{0} }} }}{{\left( {1 - \tilde{\Lambda }_{01} } \right)^{{n_{00} }} \tilde{\Lambda }_{01}^{{n_{01} }} \left( {1 - \tilde{\Lambda }_{11} } \right)^{{n_{10} }} \tilde{\Lambda }_{11}^{{n_{11} }} }}} \right)\sim \chi_{2}^{2} .$$
(19)
(4)
DQ test

Engle and Manganelli (2004) proposed the DQ test, and the statistical $Hit$ considered in this test is very similar to $\tilde{\Lambda }$ with $Hit = \tilde{\Lambda } - \alpha$. Constructing a regression model of this variable with its lagged term as well as the estimated VaR series and other variables that should be considered, the regression form, considering the 4th order lagged term, is expressed as

$$Hit_{t} = \beta_{0} + \beta_{1} Hit_{t - 1} + \beta_{2} Hit_{t - 2} + \beta_{3} Hit_{t - 3} + \beta_{4} Hit_{t - 4} + \beta_{5} \widehat{{VaR_{t} }} + u_{t} ,$$

(20)

where $u_{t}$ takes the value of $- \alpha$ with probability $1 - \alpha$, and $1 - \alpha$ with probability $\alpha$.

The matrix representation of Eq. (1) is ${\mathbf{Hit}} = {\mathbf{\beta Hit}}_{lag} + u$. The null hypothesis is constructed as in $H_{0} :{{\varvec{\upbeta}}} = 0$. According to the least squares method, ${\hat{\mathbf{\beta }}} = \left( {{\mathbf{Hit}}_{lag}^{\prime } {\mathbf{Hit}}_{lag} } \right)^{ - 1} {\mathbf{Hit}}_{lag}^{\prime } {\mathbf{Hit}} \sim N\left( {0,\alpha \left( {1 - \alpha } \right)\left( {{\mathbf{Hit}}_{lag}^{\prime } {\mathbf{Hit}}_{lag} } \right)^{ - 1} } \right)$. In the case of the null hypothesis $H_{0}$, the DQ test statistic can be expressed as

$$DQ = \frac{{{\hat{\mathbf{\beta }Hit}}_{{{\mathbf{lag}}}}^{\prime } {\mathbf{Hit}}_{{{\mathbf{lag}}}} {\hat{\mathbf{\beta }}}}}{{\alpha \left( {1 - \alpha } \right)}} \sim \chi_{6}^{2} .$$

(21)

The DQ test statistic asymptotically follows a chi-square distribution with six degrees of freedom.

(5) Lopez’s magnitude loss function

The magnitude loss (M-Loss) function not only considers the number of losses but also the magnitude of extreme tail events; thus, the M-Loss function is more in line with economic significance. The M-Loss function can be defined as

$$M - Loss = I\left( {Y_{t} < \hat{Y}_{t}^{\alpha } } \right)\left( {1 + \left( {Y_{t} - \hat{Y}_{t}^{\alpha } } \right)^{2} } \right),$$

(22)

where $Y_{t}$ is the observed return, $\hat{Y}_{t}^{\alpha }$ is the estimated VaR, and $I\left( \cdot \right)$ is an indicator function.

(6) Assessment of ES forecasts

To evaluate ES forecasts, this study implemented six ES backtesting tests: conditional ($ES_{C}$), unconditional ($ES_{UC}$) (Du and Escanciano 2017), minimally biased absolute (MBA), minimally biased relative (MBR) (Acerbi and Szekely 2014, 2017), N&Z (Nolde and Ziegel 2017), and M&F tests (McNeil and Frey 2000).

The hypothesis test form of $ES_{UC}$ is $H_{0} :P_{t}^{\left[ \alpha \right]} = F_{t}^{\left[ \alpha \right]} ,\forall t$; $H_{1} :ES_{\alpha ,t}^{F} \ge ES_{\alpha ,t}^{{}}$, for all t and $ES_{\alpha ,t}^{F} > ES_{\alpha ,t}^{{}}$ for some t; $VaR_{\alpha ,t}^{F} = VaR_{\alpha ,t}^{{}}$, for all t, where $P_{t}^{\left[ \alpha \right]}$ is the distribution of the forecasting model, $F_{t}^{\left[ \alpha \right]}$ is a real (unknowable) distribution of assets, and $ES_{\alpha ,t}^{F}$ and $VaR_{\alpha ,t}^{F}$ indicate the risk measure value along a F distribution. The hypothesis test form of $ES_{C}$ is $H_{0} :P_{t}^{\left[ \alpha \right]} = F_{t}^{\left[ \alpha \right]} ,\forall t$; $H_{1} :ES_{\alpha ,t}^{F} \ge ES_{\alpha ,t}^{{}}$, for all t and $ES_{\alpha ,t}^{F} > ES_{\alpha ,t}^{{}}$ for some t; $VaR_{\alpha ,t}^{F} \ge VaR_{\alpha ,t}^{{}}$, for all t. The other four hypothesis test forms were adopted unaltered from the original literature.

Flow of the proposed model

The primary goal of this study was to devise a compound model for VaR and ES forecasts. QRMogLSTM and QRMogGRU were implemented on PyTorch, whereas LSGAN was built on TensorFlow. The sequence flow of the proposed model is presented in Algorithm 1 and the overall process is as follows:

Step 1: Divide the stock market index datasets into training, validation, and testing sets. The training and validation sets are used as in-sample data, whereas the testing set is used as out-of-sample data. A more detailed division of the data is presented in "Data" section.

Step 2: Using the subseries generation mechanism introduced in "The hypothesis of a heterogeneous market" section, learn the decomposition and aggregation rules on the training set and generate subseries according to the rules for every time data outside the training set are obtained.

Step 3: Estimate the quantiles of the subseries using the QRMogRNNs based on the Bayesian hyperparameter optimization and model selection in "Quantile regression Mogrifier RNNs (QRMogRNNs)" section. After that, sum up the subseries estimation values to obtain the final VaR estimation results for the stock market indices.

Step 4: Model the estimated quantiles directly using the LSGAN such that the generator and discriminator are trained by a zero-sum game, and the ES is estimated according to the rolling estimation mechanism.

Step 5: Backtest the VaR and ES estimation results, including the five VaR tests, risk scenario generation evaluation, six ES tests, and two joint scoring functions.

It should be noted that since ES can only be jointly elicitable with VaR, the ES estimation method in this study also relies on VaR estimation; thus, there is an inherent inconvenience in the training of the model. Other ES estimation studies have encountered the same challenge. For example, Taylor (2019) estimated VaR values based on the CAViaR-type model and then used the VaR values within a rolling window to estimate the parameters of the formula capable of deriving ES values. This method also requires obtaining the next-period VaR estimated value $VaR_{t + 1}$ and then using a rolling window to select the VaR values as the training data to estimate the parameters of the jointly elicitable formula to obtain the next-period ES estimated value. In this study, the rolling window method was used to train the LSGAN and forecast the VaR. The $VaR_{t + 1}$ is estimated first, and then the rolling window is employed to select the VaR sequence to be inputted into the LSGAN as the training data. The ES value in the next period is estimated based on $VaR_{t + 1}$ and the risk scenarios generated by the trained LSGAN.

Empirical study

Data

We took a sample of 1,683 weekly log-return from RESSET for the FTSE100, N225, SPX500, and DAX stock market indexes. The samples ranged from January 5, 1990, to April 1, 2022. In the VaR estimation, we used the first 1,262 data points as in-sample data to learn the decomposition-aggregation rules, train QRMogLSTM and QRMogGRU, and divide the validation set for Bayesian hyperparameter optimization and model selection. The final 421 data points were used as out-of-sample data for the backtesting procedure. In the ES estimation, we used the estimated 1,262 in-sample VaR values to pre-train LSGAN and the last 421 data points as out-of-sample data for the backtesting procedure. In this study, the pretraining epoch $E_{1}$ was 1000, the rolling window size was $b = 100$, and the model was extra-trained with additional $E_{2} = 500$ epochs for every $f = 10$ period. Figure 4a–d show the division of the dataset and the purpose of each part schematically from four aspects. The experiment was implemented on a personal computer with an AMD Ryzen 5 5600H six-core processor, with Radeon Graphics 3.30 GHz, 16 GB RAM, and a single NVIDIA GeForce GTX 1650 GPU.

Table 2 presents the summary statistics, and the results confirm the prevalence of financial asset characteristics such as high kurtosis and fat tails. In addition, the log-return of all indices has negative skewness; the null hypothesis of the normal distribution of the Jarque–Bera test is rejected, and the null hypothesis of the existence of the unit root of the Augmented Dickey-Fuller test is also rejected. These results drove us to employ QR combined with deep learning to predict the quantiles and LSGAN to capture the tail uncertainty information of the assets.

Table 2 Summary statistics of the weekly log-return of the four indexes

Full size table

Out-of-sample VaR forecasting

In this section, the proposed model is compared with key benchmarks. The statistics in "Evaluation methodology" section are used to backtest the VaR estimation results at two probability levels, namely, $\tau = 0.05$ and $\tau = 0.01$. This study includes 14 benchmarks: (1) Historical Simulation (Hist); (2) Normal distribution method (Normal); (3) CAViaR series models: CAViaR-SAV, CAViaR-AS, CAViaR-IGARCH, CAViaR-Adaptive; (4) QRNN; (5) LASSO-QR; (6) QR-random forest (QR-RF) and QR-gradient boosting decision trees (QR-GBDT); (7) QR-SVM; (8) QR-convolutional neural networks (QR-CNN); (9) QR-LSTM and QRGRU. The CAViaR-type models and QR-SVM were implemented in R, and the other models were implemented in Python.

Before the backtesting procedure, we analyzed the results of the hyperparameter determination and model selection, and the results are presented in Table 3.

Table 3 Hyperparameter values determined by Bayesian optimization

Full size table

(1) From the perspective of model selection, r is always greater than 0, which indicates that QRMogLSTM or QRMogGRU is superior to the naive QRLSTM or QRGRU; On the other hand, the optimal models are different on a specific subset of a specific dataset, which is consistent with the "No free lunch" theorem, which shows the importance of selecting models based on different datasets.

The data frequency in this study was weekly, between daily and monthly, and the data volume was relatively moderate. Therefore, the number of times the LSTM was selected was slightly higher than that of the GRU. Suppose that we model the data at a higher frequency (daily). In this case, the dataset would be larger, the LSTM with more parameters would be fully optimized, and the performance of the LSTM would be expected to improve. In contrast, because the GRU has fewer parameters to train, if we model lower-frequency (monthly) data, the learning process of the GRU converges more easily than that of the LSTM, and its performance may be better.

(2) There are many cases where the number of optimal model layers is three or four, and the number of hidden layer units is more than 10, which shows that QR combined with a complicated deep neural network can improve the prediction performance.

(3) From S1 to S4, the number of optimal feature inputs to the proposed model decreased. S1 is a low-frequency sequence and the proposed model requires additional lag features. As a high-frequency sequence, S4 always takes one as its optimal feature, indicating that it is more susceptible to short-term impacts. Furthermore, when $\tau = 0.01$ is used, the optimal feature number is always between one and two, which means that more attention should be paid to the impact of short-term shocks when measuring risk with a higher confidence level.

The next section analyzes the performance of the proposed model and key benchmarks in the real market; the evaluation results of the out-of-sample VaR forecasts of the four tests are presented in Tables 4, 5, 6 and 7. The backtesting results for the FTSE100 index are presented in Table 4, the N225 index in Table 5, the SPX500 index in Table 6 and the DAX index in Table 7.

Table 4 Out-of-sample VaR forecast evaluation of FTSE100

Full size table

Table 5 Out-of-sample VaR forecast evaluation of N225

Full size table

Table 6 Out-of-sample VaR forecast evaluation of SPX500

Full size table

Table 7 Out-of-sample VaR forecast evaluation of DAX

Full size table

The first objective was to compare our model with competing benchmarks using five VaR evaluation measures. To facilitate a comparison of the backtesting results of the different models, we report the number of test rejections and average M-Loss values in Table 8.

Table 8 Summary of the backtesting results for each probability level for the stock indices. Number of test rejections for the four indexes

Full size table

Looking at the evaluation results for all four indices at both quantile levels, we find that the proposed model is successfully backtested at the 1% significance level and 5% confidence level, except for the SPX500 index at the $\tau = 0.01$ level. Our model obtained the minimum value of the average M-Loss, meaning that the magnitude of extreme events beyond the forecast was small. Other benchmarks have significantly fewer successful backtesting scenarios, mainly because they cannot pass the DQ test at $\tau = 0.01$, and there are cases of risk overestimation or underestimation. The advantage of our model is also reflected in the VaR backtesting results at $\tau = 0.01$, which can be used to assess risk more accurately. Therefore, our model does not significantly underestimate extreme losses. In addition, it is worth noting that the CAViaR-AS model also outperforms other CAViaR models in terms of forecast performance, which is consistent with the findings of Merlo et al. (2021).

The above analysis, based on typical tests, shows that the proposed model can forecast risk accurately in more scenarios than the other 14 benchmarks. We also found evidence of this in the sequence diagrams. To obtain more explicit images of how different models differ in their risk forecasts, we plot the series for only the three models that performed better in the above tests: CAViaR-AS, QRNN, and our proposed model. The circles in the plot represent the actual returns, with yellow and green representing positive and negative returns, respectively; the red circles represent the actual returns, where the losses exceed the forecasts of the proposed model. The analysis in Fig. 5 reveals that the risk forecast curve of CAViaR-AS is smoother, whereas those of the proposed model and QRNN are volatile. Moreover, the difference in performance between the models is not significant during periods of stable stock prices, whereas it is more significant in the aftermath of major financial crises such as the Chinese stock market crash, Brexit, and the breakout of COVID-19. In March 2020 (enlarged area in the figure), the global stock market showed violent fluctuations owing to the COVID-19 pandemic, and stock prices fell sharply. The excess red circles also appear more intensively during this period, which indicates that the forecasting model does not accurately forecast sudden risks. Nevertheless, the difference between the actual and forecasted values of the proposed model was not significant, whereas the other models were more likely to overestimate the risk at $\tau = 0.01$ and underestimate it at $\tau = 0.05$, suggesting that our model is still significantly more accurate.

In addition to estimation accuracy, we also need to consider the time required to generate a VaR forecasting result, as our forecasting framework includes time-consuming tasks such as hyperparameter optimization, model selection, and deep learning training. Among these, BO is the most time-consuming process. We performed a BO process with 50 iterations for each of the four subseries, each subseries took 196 s, and the time consumption for calculating the four components was 784 s. Nonetheless, BO determines the structure of our VaR estimation model and the corresponding parameter values over a period; therefore, we do not need to perform BO for every forecast. After the BO, the training time of our model was relatively short. It only took 1.84 s to obtain the VaR value of a subseries; therefore, a total of 7.36 s were required to obtain a forecasted VaR value. This time consumption is relatively short; therefore, our model can also be used for daily VaR estimations.

Finally, ablation studies were performed to evaluate the effects of the decomposition and aggregation technique (DA), BO, and Mogrifier structure (Mog) on the proposed model. We report the number of test rejections and average M-Loss values for the four indices. The results are summarized in Table 9. It can be observed that DA, BO, and Mog significantly improve the proposed model. In particular, DA preprocessing techniques and the Mog structure are more important than the BO. The proposed model, without hyperparameter optimization and model selection, can also achieve better risk-forecasting results than the CAViaR-type models and QRNN.

Table 9 Summary of the backtesting results for ablation studies

Full size table

Out-of-sample ES forecasting

This section evaluates the out-of-sample ES forecasting results. The risk scenarios generated by the LSGAN are assessed in "Evaluation of risk scenario generation results" section. Furthermore, the out-of-sample ES forecasting results are evaluated using statistical tests and scoring functions, as presented in "Backtesting ES with statistical tests and joint scoring functions" section.

Evaluation of risk scenario generation results

After pretraining, the LSGAN performs additional training based on the latest 100 VaR historical data at the 95% probability level and generates 500 quantile scenarios for out-of-sample ES estimation. The model architecture and parameter settings of the LSGAN designed in this study are listed in Table 10.

Table 10 List of LSGAN parameter

Full size table

The evaluation of scenario generation techniques commonly encompasses three primary categories (Li et al. 2020): (1) output-based evaluation, which leverages error metrics such as the mean squared error to gauge performance; (2) distribution-based evaluation, entailing assessment via computations of the energy score; and (3) event-based evaluation, encompassing metrics such as coverage rate and correlation coefficients. We carefully selected the coverage rate (CR) and correlation coefficients (CC) of the event-based evaluation method for the initial evaluation of the scenario generation results. This is because using other evaluation methods requires obtaining actual observations, which do not exist for risk measures (i.e., VaR and ES). To calculate the ES value at a certain significance level based on the risk scenarios, it is necessary to ensure that the ES value is lower than the corresponding VaR value at that significance level. Therefore, the CR defined in this study represents the probability that the VaR values are present within a set of generated scenarios. A higher CR value indicates that the generated scenarios are more likely to represent tail risk, indicating that the generated scenarios are more reliable. The formula for the CR (Wang et al. 2017) is

$$C\left( \tau \right) = \frac{1}{T}\sum\limits_{{{\text{t}} = 1}}^{T} {I\left( {P_{t}^{\min } \le VaR_{t} \left( \tau \right)} \right)}$$

(23)

$$P_{t}^{\min } = \min \left( {P_{s,t} } \right),{\kern 1pt} \;s = 1,\,2, \ldots ,\,S$$

(24)

where $C\left( \tau \right)$ represents the CR with respect to $VaR_{t} \left( \tau \right)$ and $I\left( \cdot \right)$ represents an indicator function. $P_{s,t}$ represents the s-th scenario value at t time and S represents the number of scenarios.

According to previous studies (Ma et al. 2016; Garatti et al. 2019), we can also use correlation coefficients to measure the similarity in the dynamics between VaR and the generated risk scenarios. We compared the LSGAN model with three commonly used scenario generation methods: the Gaussian distribution (GD), kernel density estimation (KDE), and Markov chain Monte Carlo (MCMC) methods. The evaluation results of the risk scenarios generated using the four methods are listed in Table 11.

Table 11 Comparision of models using the coverage rate and correlation coefficients

Full size table

Based on the evaluation results presented in Table 11, it is evident that the risk scenarios produced by the LSGAN exhibit a higher probability of encompassing the VaR values and demonstrate a stronger correlation with the VaR dynamics at both significance levels. This observation suggests that utilizing risk scenarios generated by the LSGAN for calculating ES is in line with market risk theory and yields more accurate results than other benchmarks.

Furthermore, as depicted in Fig. 6, the risk scenarios for the four indices were generated using the LSGAN. The 95% probability-level VaR forecasts from QRMogRNNs are denoted by a dashed black line. It can be seen that LSGAN achieves accurate sampling of the tail return distribution beyond VaR, and ensures the diversity and dynamism of the generated risk scenarios. The probability density curves in Fig. 7 depict the distribution of the in-sample 95% VaR data versus the 500 scenarios generated by the pretrained LSGAN. By observing the distribution fit, it was found that the LSGAN could learn the characteristics of historical data well and generate reliable synthetic data. The goodness-of-fit between the probability density function (PDF) of the risk scenarios generated by the LSGAN and the PDF of the in-sample VaR values was 0.8882 for FTSE, 0.9298 for N225, 0.8874 for SPX, and 0.9018 for DAX.

Backtesting ES with statistical tests and joint scoring functions

In this section, we verify the performance of the model in terms of ES forecasts, which is a backtesting procedure with six statistical tests and two joint scoring functions. This proved to be effective if the proposed model was successfully tested in more situations. Table 12 presents the p values of the resulting test statistics; the null hypotheses are not rejected in all situations for all desired levels. This shows that the ES estimation approach based on LSGAN scenario generation is effective and provides a new tool for estimating the ES in tail risk management.

Table 12 Out-of-sample ES forecast evaluation by using statistical tests

Full size table

For comparison with ES forecasts produced by other commonly used scenario generation methods, Table 13 reports the values of the joint scoring functions S_FZN (Nolde and Ziegel 2017) and S_FZ0 (Patton et al. 2019) averaged over the out-of-sample period. A lower scoring function value represents a more accurate ES estimation. The ES estimation process based on the scenario generation of the GD, KDE, and MCMC was the same as that of the LSGAN-based method. The evaluation results of the two joint scoring functions in Table 13 show that the ES estimation results based on the LSGAN scenario generation are more accurate than those based on other scenario generation methods (i.e., GD, KDE, and MCMC).

Table 13 Out-of-sample ES forecast evaluation using joint scoring functions

Full size table

To offer graphical intuition to support the results, Fig. 8 presents the results of ES out-of-sample estimation graphically. By analyzing Fig. 8, we find that the estimated ES is relatively stable during calm periods and larger during periods of turbulent markets, with a better ability to capture extreme risks. The high volatility of ES is obvious during recessions and major economic and financial crises, such as the Chinese stock market crash in 2016, Brexit in 2018, and the COVID-19 outbreak in 2020. Moreover, during the COVID-19 outbreak, the ES values estimated using the proposed model were greater than those after other major adverse events. This shows that the proposed model can accurately estimate the ES and provide efficient guidance to financial institutions and investors regarding asset allocation.

The training and forecasting of our ES estimation framework do not consume much time. Before out-of-sample forecasting, the LSGAN requires a pretraining process that takes 150 s. In out-of-sample forecasting, we conducted extra training on the LSGAN every ten periods, and the time required to obtain the ES values of the ten periods was 510 s. Therefore, our estimation model can also be used to forecast daily ES.

Summary and conclusions

Finance and economics scholars have long explored better methods for estimating VaR and ES. The structure of the proposed models is often related to the properties of the financial assets. This study follows this research direction and proposes an estimation framework that combines decomposition-aggregation learning with MogRNNs. The estimation framework is consistent with market heterogeneity theory and the properties of asset volatility. MogLSTM and MogGRU can better capture the "long memory" and "clustering" of financial assets by their particular cell structure and the interactive operation between the previous hidden state and the current input. However, these two models have not been extended to predict financial uncertainty. This study proposes combining the above two models with QR to estimate VaR and adds BO to improve practicability. The backtesting results indicate that the model produces reliable VaR estimates. Furthermore, to implement ES estimation, this study proposes a new estimation method using LSGAN to model the distribution of quantiles and generate possible future scenarios of downside risk at a specific probability level. This study also provides a compromise between direct and decomposition forecasts, which balances forecasting accuracy and computational burden. Taking four crucial stock market indices as research objects, five VaR backtesting tests, six ES backtesting tests, and two scoring functions show that the proposed model can forecast risks more accurately than 14 popular benchmarks. We conclude that the proposed model is a promising modeling framework for forecasting risk and is worthy of further study to expand its application.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

CAViaR:: Conditional autoregressive value at risk
DQ:: Dynamic quantile
ES:: Expected shortfall
EMD:: Empirical mode decomposition
GANs:: Generative adversarial networks
GRU:: Gated recurrent unit
HR:: Hit ratio
HS:: Historical simulation
IMFs:: Intrinsic mode functions
LASSO:: Least absolute shrinkage and selection operator
LSGAN:: Least squares GAN
LSTM:: Long short-term memory network
Mog:: Mogrifier
MBA:: Minimally biased absolute
MBR:: Minimally biased relative
QR:: Quantile regression
QR-GBDT:: Quantile regression gradient boosting decision trees
QRNN:: Quantile regression neural network
QR-RF:: Quantile regression random forest
QR-SVM:: Quantile regression support vector machine
QRMogRNNs:: Quantile regression Mogrifier recurrent neural networks
RNNs:: Recurrent neural networks
VaR:: Value at risk
VMD:: Variational mode decomposition

References

Acerbi C, Szekely B (2014) Backtesting expected shortfall. Risk Mag 27:1–6
Google Scholar
Acerbi C, Szekely B (2017) General properties of backtestable statistics. SSRN Electron J. https://doi.org/10.2139/ssrn.2905109
Article Google Scholar
Acerbi C, Tasche D (2002) On the coherence of expected shortfall. J Bank Finance. https://doi.org/10.1016/S0378-4266(02)00283-2
Article Google Scholar
Artzner P, Delbaen F, Eber JM, Heath D (1999) Coherent measures of risk. Math Finance. https://doi.org/10.1111/1467-9965.00068
Article Google Scholar
Bertsekas DP (1982) Constrained optimization and Lagrange multiplier methods. Academic Press, New York
Google Scholar
Cai Z, Wang X (2008) Nonparametric estimation of conditional VaR and expected shortfall. J Econom. https://doi.org/10.1016/j.jeconom.2008.09.005
Article Google Scholar
Chang YP, Hung MC, Wu YF (2003) Nonparametric estimation for risk in value-at-risk estimator. Commun Stat Part B Simul Comput. https://doi.org/10.1081/SAC-120023877
Article Google Scholar
Christoffersen PF (1998) Evaluating interval forecasts. Int Econ Rev (philadelphia). https://doi.org/10.2307/2527341
Article Google Scholar
Corsi F (2009) A simple approximate long-memory model of realized volatility. J Financ Econom. https://doi.org/10.1093/jjfinec/nbp001
Article Google Scholar
Deng C, Huang Y, Hasan N, Bao Y (2022) Multi-step-ahead stock price index forecasting using long short-term memory model with multivariate empirical mode decomposition. Inf Sci (NY) 607:297–321. https://doi.org/10.1016/J.INS.2022.05.088
Article Google Scholar
Dragomiretskiy K, Zosso D (2014) Variational mode decomposition. IEEE Trans Signal Process. https://doi.org/10.1109/TSP.2013.2288675
Article Google Scholar
Du Z, Escanciano JC (2017) Backtesting expected shortfall: accounting for tail risk. Manag Sci. https://doi.org/10.1287/mnsc.2015.2342
Article Google Scholar
Emenogu NG, Adenomon MO, Nweze NO (2020) On the volatility of daily stock returns of Total Nigeria Plc: evidence from GARCH models, value-at-risk and backtesting. Financ Innov. https://doi.org/10.1186/s40854-020-00178-1
Article Google Scholar
Ener E, Baronyan S, Ali Mengütürk L (2012) Ranking the predictive performances of value-at-risk estimation methods. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2011.10.002
Article Google Scholar
Engle RF, Manganelli S (2004) CAViaR: conditional autoregressive value at risk by regression quantiles. J Bus Econ Stat 22:367–381. https://doi.org/10.1198/073500104000000370
Article Google Scholar
Fatouros G, Makridis G, Kotios D et al (2022) DeepVaR: a framework for portfolio risk assessment leveraging probabilistic deep neural networks. Digit Finance. https://doi.org/10.1007/s42521-022-00050-0
Article Google Scholar
Fissler T, Ziegel JF, Gneiting T (2015) Expected shortfall is jointly elicitable with value at risk—implications for backtesting, pp 1–7
Fu W, Wang K, Tan J, Zhang K (2020) A composite framework coupling multiple feature selection, compound prediction models and novel hybrid swarm optimizer-based synchronization optimization strategy for multi-step ahead short-term wind speed forecasting. Energy Convers Manag. https://doi.org/10.1016/j.enconman.2019.112461
Article Google Scholar
Garatti S, Ming H, Xie L et al (2019) Scenario-based economic dispatch with uncertain demand response. IEEE Trans Smart Grid. https://doi.org/10.1109/TSG.2017.2778688
Article Google Scholar
Gneiting T (2011) Making and evaluating point forecasts. J Am Stat Assoc. https://doi.org/10.1198/jasa.2011.r10138
Article Google Scholar
Goodfellow IJ, Pouget-Abadie J, Mirza M, et al (2014) Generative adversarial nets. In: Advances in neural information processing systems
Grabchak M, Christou E (2021) A note on calculating expected shortfall for discrete time stochastic volatility models. Financ Innov. https://doi.org/10.1186/s40854-021-00254-0
Article Google Scholar
Guo J, Zhao Z, Sun J, Sun S (2022) Multi-perspective crude oil price forecasting with a new decomposition-ensemble framework. Resour Policy 77:102737. https://doi.org/10.1016/J.RESOURPOL.2022.102737
Article Google Scholar
Heusel M, Ramsauer H, Unterthiner T, et al (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in neural information processing systems
Huang AYH (2013) Value at risk estimation by quantile regression and kernel estimator. Rev Quant Finance Account 41:225–251. https://doi.org/10.1007/s11156-012-0308-x
Article Google Scholar
Huang NE, Shen Z, Long SR et al (1996) The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond Ser A Math Phys Eng Sci 454:903–995
Article Google Scholar
Jiang P, Liu Z, Wang J, Zhang L (2021) Decomposition-selection-ensemble forecasting system for energy futures price forecasting based on multi-objective version of chaos game optimization algorithm. Resour Policy 73:102234. https://doi.org/10.1016/j.resourpol.2021.102234
Article Google Scholar
Kim J, Yu J, Kang C et al (2022) A novel hybrid water quality forecast model based on real-time data decomposition and error correction. Process Saf Environ Prot 162:553–565. https://doi.org/10.1016/J.PSEP.2022.04.020
Article Google Scholar
Koenker R, Bassett G (1978) Regression quantiles. Econometrica. https://doi.org/10.2307/1913643
Article Google Scholar
Kupiec PH (1995) Techniques for verifying the accuracy of risk measurement models. J Deriv. https://doi.org/10.3905/jod.1995.407942
Article Google Scholar
Kwon JH (2021) On the factors of Bitcoin’s value at risk. Financ Innov. https://doi.org/10.1186/s40854-021-00297-3
Article Google Scholar
Li Y, Liu Y, Zhu J (2007) Quantile regression in reproducing kernel Hilbert spaces. J Am Stat Assoc 102:255–268. https://doi.org/10.1198/016214506000000979
Article Google Scholar
Li J, Zhou J, Chen B (2020) Review of wind power scenario generation methods for optimal operation of renewable energy systems. Appl Energy. https://doi.org/10.1016/j.apenergy.2020.115992
Article Google Scholar
Liang J, Tang W (2020) Sequence generative adversarial networks for wind power scenario generation. IEEE J Sel Areas Commun. https://doi.org/10.1109/JSAC.2019.2952182
Article Google Scholar
Lopez JA (1999) Methods for evaluating value-at-risk estimates. Econ Rev Fed Reserv Bank San Fran 2:3–17
Google Scholar
Ma XY, Sun YZ, Fang HL (2013) Scenario generation of wind power based on statistical uncertainty and variability. IEEE Trans Sustain Energy. https://doi.org/10.1109/TSTE.2013.2256807
Article Google Scholar
Ma R, Xu W, Liu S et al (2016) Asymptotic mean and variance of Gini correlation under contaminated Gaussian model. IEEE Access. https://doi.org/10.1109/ACCESS.2016.2622358
Article Google Scholar
Mao X, Li Q, Xie H, et al (2017) Least squares generative adversarial networks. In: Proceedings of the IEEE international conference on computer vision, 2017-October, pp 2813–2821. https://doi.org/10.1109/ICCV.2017.304
McNeil AJ, Frey R (2000) Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. J Empir Finance 7:271–300. https://doi.org/10.1016/S0927-5398(00)00012-8
Article Google Scholar
Melis G, Kočiský T, Blunsom P (2019) Mogrifier LSTM, pp 1–13
Meng X, Taylor JW (2020) Estimating value-at-risk and expected shortfall using the intraday low and range data. Eur J Oper Res. https://doi.org/10.1016/j.ejor.2019.07.011
Article Google Scholar
Merlo L, Petrella L, Raponi V (2021) Forecasting VaR and ES using a joint quantile regression and its implications in portfolio allocation. J Bank Financ 133:106248. https://doi.org/10.1016/j.jbankfin.2021.106248
Article Google Scholar
Müller UA, Dacorogna MM, Dave R et al (1993) Fractals and intrinsic time—a challenge to econometricians. Social Science Electronic Publishing, New York
Google Scholar
Neshat M, Nezhad MM, Sergiienko NY et al (2022) Wave power forecasting using an effective decomposition-based convolutional bi-directional model with equilibrium Nelder–Mead optimiser. Energy 256:124623. https://doi.org/10.1016/J.ENERGY.2022.124623
Article Google Scholar
Nguyen LH, Chevapatrakul T, Yao K (2020) Investigating tail-risk dependence in the cryptocurrency markets: a LASSO quantile regression approach. J Empir Finance 58:333–355. https://doi.org/10.1016/j.jempfin.2020.06.006
Article Google Scholar
Nolde N, Ziegel JF (2017) Elicitability and backtesting: perspectives for banking regulation. Ann Appl Stat. https://doi.org/10.1214/17-AOAS1041
Article Google Scholar
Parvini N, Abdollahi M, Seifollahi S, Ahmadian D (2022) Forecasting Bitcoin returns with long short-term memory networks and wavelet decomposition: a comparison of several market determinants. Appl Soft Comput 121:108707. https://doi.org/10.1016/J.ASOC.2022.108707
Article Google Scholar
Patton AJ, Ziegel JF, Chen R (2019) Dynamic semiparametric models for expected shortfall (and value-at-risk). J Econom. https://doi.org/10.1016/j.jeconom.2018.10.008
Article Google Scholar
PH H, Rishad A (2020) An empirical examination of investor sentiment and stock market volatility: evidence from India. Financ Innov. https://doi.org/10.1186/s40854-020-00198-x
Article Google Scholar
Qin T (2020) Stock movement classification from twitter via Mogrifier based memory cells with attention mechanism. In: ACM international conference proceeding series
Qiu M, Song Y (2016) Predicting the direction of stock market index movement using an optimized artificial neural network model. PLoS ONE. https://doi.org/10.1371/journal.pone.0155133
Article Google Scholar
Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J Risk. https://doi.org/10.21314/jor.2000.038
Article Google Scholar
Takeuchi I, Le QV, Sears TD, Smola AJ (2006) Nonparametric quantile estimation. J Mach Learn Res 7:1231–1264
Google Scholar
Taylor JW (2000) A quantile regression neural network approach to estimating the conditional density of multiperiod returns. J Forecast. https://doi.org/10.1002/1099-131x(200007)19:4%3c299::aid-for775%3e3.3.co;2-m
Article Google Scholar
Taylor JW (2019) Forecasting value at risk and expected shortfall using a semiparametric approach based on the asymmetric Laplace distribution. J Bus Econ Stat 37:121–133. https://doi.org/10.1080/07350015.2017.1281815
Article Google Scholar
Wang X, Hu Z, Zhang M, Hu M (2017) Research on establishment of quality evaluation framework of short-term wind power scenarios. Dianwang Jishu/power Syst Technol. https://doi.org/10.13335/j.1000-3673.pst.2016.1985
Article Google Scholar
Wang K, Fu W, Chen T et al (2020) A compound framework for wind speed forecasting based on comprehensive feature selection, quantile regression incorporated into convolutional simplified long short-term memory network and residual error correction. Energy Convers Manag 222:113234. https://doi.org/10.1016/j.enconman.2020.113234
Article Google Scholar
Wang J, Cui Q, Sun X, He M (2022a) Asian stock markets closing index forecast based on secondary decomposition, multi-factor analysis and attention-based LSTM model. Eng Appl Artif Intell 113:104908. https://doi.org/10.1016/J.ENGAPPAI.2022.104908
Article Google Scholar
Wang J, Wang S, Zeng B, Lu H (2022b) A novel ensemble probabilistic forecasting system for uncertainty in wind speed. Appl Energy 313:118796. https://doi.org/10.1016/j.apenergy.2022.118796
Article Google Scholar
Wang J, Zhang L, Liu Z, Niu X (2022c) A novel decomposition-ensemble forecasting system for dynamic dispatching of smart grid with sub-model selection and intelligent optimization. Expert Syst Appl 201:117201. https://doi.org/10.1016/J.ESWA.2022.117201
Article Google Scholar
Wang Z, Li H, Chen H et al (2022d) Linear and nonlinear framework for interval-valued PM2.5 concentration forecasting based on multi-factor interval division strategy and bivariate empirical mode decomposition. Expert Syst Appl 205:117707. https://doi.org/10.1016/J.ESWA.2022.117707
Article Google Scholar
White H (1992) Nonparametric estimation of conditional quantiles using neural networks. In: Computing science and statistics
Xu Q, Jiang C, He Y (2016) An exponentially weighted quantile regression via SVM with application to estimating multiperiod VaR. Stat Methods Appl 25:285–320. https://doi.org/10.1007/s10260-015-0332-9
Article Google Scholar
Yuan R, Wang B, Mao Z, Watada J (2021) Multi-objective wind power scenario forecasting based on PG-GAN. Energy. https://doi.org/10.1016/j.energy.2021.120379
Article Google Scholar
Zhang X, Lai KK, Wang SY (2008) A new approach for crude oil price analysis based on empirical mode decomposition. Energy Econ. https://doi.org/10.1016/j.eneco.2007.02.012
Article Google Scholar
Zhu Y, Mariani G, Li J (2021) Pagan: portfolio analysis with generative adversarial networks. SSRN Electron J. https://doi.org/10.2139/ssrn.3755355
Article Google Scholar
Žiković S, Filer RK (2013) Ranking of VaR and ES models: performance in developed and emerging markets. Financ a Uver - Czech J Econ Financ. https://doi.org/10.2139/ssrn.2171673
Article Google Scholar

Download references

Acknowledgements

We would like to thank the Managing Editor-in-Chief, Professor Gang Kou and the anonymous referees for their constructive comments and suggestions which helped us to improve the manuscript.

Funding

This research was supported by the Jiangxi Provincial Natural Science Foundation (20212ACB211003) and the National Natural Science Foundation of China (No. 71671029).

Author information

Authors and Affiliations

School of Statistics, Dongbei University of Finance and Economics, Dalian, China
Jianzhou Wang, Shuai Wang & Mengzheng Lv
School of Statistics, Jiangxi University of Finance and Economics, Nanchang, 330013, China
He Jiang

Authors

Jianzhou Wang
View author publications
You can also search for this author in PubMed Google Scholar
Shuai Wang
View author publications
You can also search for this author in PubMed Google Scholar
Mengzheng Lv
View author publications
You can also search for this author in PubMed Google Scholar
He Jiang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

JW: conceptualization, supervision. SW: writing-review and editing, software, methodology. ML: validation, visualization. HJ: formal analysis, validation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shuai Wang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wang, J., Wang, S., Lv, M. et al. Forecasting VaR and ES by using deep quantile regression, GANs-based scenario generation, and heterogeneous market hypothesis. Financ Innov 10, 36 (2024). https://doi.org/10.1186/s40854-023-00564-5

Download citation

Received: 26 August 2022
Accepted: 13 November 2023
Published: 07 January 2024
DOI: https://doi.org/10.1186/s40854-023-00564-5

Forecasting VaR and ES by using deep quantile regression, GANs-based scenario generation, and heterogeneous market hypothesis

Abstract

Introduction

Related work

VaR and ES estimation framework

The hypothesis of a heterogeneous market

Decomposition and aggregation based on investor heterogeneity

Variational mode decomposition

IMFs aggregation based on fuzzy entropy

Real-time decomposition and aggregation mechanism

Quantile regression Mogrifier RNNs (QRMogRNNs)

Mogrifier LSTM and Mogrifier GRU

Properties of financial asset

Heterogeneous market hypothesis

QRMogRNNs

Bayesian hyperparameter optimization and model selection

Estimate ES based on scenario generation

Least squares generative adversarial network

Generating the future risk scenarios using LSGAN

Forecasting ES based on risk scenarios and estimated VaR

Evaluation methodology

Flow of the proposed model

Empirical study

Data

Out-of-sample VaR forecasting

Out-of-sample ES forecasting

Evaluation of risk scenario generation results

Backtesting ES with statistical tests and joint scoring functions

Summary and conclusions

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords