Forecasting VaR and ES by using deep quantile regression, GANs-based scenario generation, and heterogeneous market hypothesis

Value at risk (VaR) and expected shortfall (ES) have emerged as standard measures for detecting the market risk of financial assets and play essential roles in investment decisions, external regulations, and risk capital allocation. However, existing VaR estimation approaches fail to accurately reflect downside risks, and the ES estimation technique is quite limited owing to its challenging implementation. This causes financial institutions to overestimate or underestimate investment risk and finally leads to the inefficient allocation of financial resources. The main purpose of this study is to use machine learning to improve the accuracy of VaR estimation and provide an effective tool for ES estimation. Specifically, this study proposes a VaR estimator by combining quantile regression with “Mogrifier” recurrent neural networks to capture the “long memory” and “clustering” properties of financial assets; while for estimating ES, this study directly models the quantile of assets and employs generative adversarial networks to generate future tail risk scenarios. In addition to the typical properties of financial assets, the model design is also consistent with heterogeneous market theory. An empirical application to four major global stock indices shows that our model is superior to other existing models.


Introduction
In the ongoing credit and financial crises, it is essential to manage risk using appropriate measurement tools. Value at risk (VaR) is a widely used risk measure in financial institutions owing to its easy calculation and clear definition. However, the drawbacks of VaR (Artzner et al. 1999; Kwon 2021) are apparent: (1) VaR does not measure the left-tail risk beyond the quantile at the desired level and (2) VaR is not a coherent risk measure, because it does not satisfy desirable properties such as subadditivity and convexity. These drawbacks cause investors and risk managers to overestimate or underestimate risk. In the recent Basel Accords, expected shortfall (ES) replaced VaR as the standard measure of market risk, making it the most popular risk measure for financial institutions and investors (Acerbi and Tasche 2002). ES, in addition to having many other desirable properties, is a coherent risk measure defined as the conditional mean of the loss beyond VaR at a given confidence level (Rockafellar and Uryasev 2000). Although the ES employed in Basel III and IV provides more information about the left tail of asset returns, its estimation is inherently challenging because ES is not elicitable: no scoring function exists whose expectation is minimized by the true ES (Gneiting 2011).
Classical approaches for forecasting VaR can be divided into three major categories: nonparametric, parametric, and semi-parametric. The nonparametric approach does not require assumptions regarding the distribution of returns. Historical simulation is the primary representative of this category, where the empirical distribution of historical returns is used to calculate VaR. Although the computational complexity of this approach is relatively low, it cannot capture fluctuations that do not exist in the historical window used (Chang et al. 2003). To estimate VaR using a parametric approach, it is necessary to define an effective model of the return distribution. Some well-known methods in this category are the variance-covariance model and many GARCH-type models. However, the distributional assumptions of these methods (such as Gaussian and Student's t distributions) do not hold for most financial time series. Semi-parametric approaches to VaR forecasting include those that use extreme value theory (Ener et al. 2012) and those that directly model the conditional quantile for a chosen probability level using quantile regression (QR), such as conditional autoregressive VaR (CAViaR) modeling (Engle and Manganelli 2004). In empirical studies on VaR forecast accuracy, the CAViaR models have performed well (Ener et al. 2012). Given the performance of QR in VaR estimation, scholars have combined it with machine learning methods, such as quantile regression neural networks (QRNN) and LASSO-QR, to further improve forecast accuracy. Moreover, deep learning offers methods that are inherently suited to forecasting financial time series, such as recurrent neural networks (RNNs) and their variants. In particular, the MogLSTM proposed by Melis et al. (2019) has received widespread attention because it first lets the previous hidden state and the current input interact before they enter the cell. We argue that through this interaction operation, the stored historical return information can be sufficiently combined with the currently available information to mimic long memory and nonlinear dependencies as well as the volatility clustering of returns. In addition, the forget gate in the MogLSTM cell can discard historical information that does not contribute to risk estimation, thereby avoiding serious clustered VaR violations. Therefore, we combined QR with a Mogrifier long short-term memory (MogLSTM) and a Mogrifier gated recurrent unit (MogGRU) (Qin 2020) to propose two new deep learning QR (deep-QR) models: QRMogLSTM and QRMogGRU. For ease of expression, we call MogLSTM and MogGRU the Mogrifier recurrent neural networks (MogRNNs).
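As a point of reference for the nonparametric category described above, historical simulation can be sketched in a few lines. This is only an illustrative sketch, not the benchmark implementation used in this study: the function names `historical_var` and `violation_rate` and the toy return window are assumptions introduced here.

```python
import numpy as np

def historical_var(returns, alpha=0.05):
    """Historical-simulation VaR: the empirical alpha-quantile of past returns."""
    return float(np.quantile(np.asarray(returns, dtype=float), alpha))

def violation_rate(returns, var_forecasts):
    """Share of periods in which the realized return fell below the VaR forecast."""
    returns = np.asarray(returns, dtype=float)
    var_forecasts = np.asarray(var_forecasts, dtype=float)
    return float(np.mean(returns < var_forecasts))

# Hypothetical window: 100 evenly spaced returns from -0.50 to 0.49.
window = np.arange(-50, 50) / 100.0
var_95 = historical_var(window, alpha=0.05)
rate = violation_rate(window, np.full(window.size, var_95))
```

Because the forecast is a quantile of the window itself, it cannot anticipate losses larger than any observed in that window, which is the limitation noted above.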
ES forecasts can be produced as byproducts of many VaR forecasting methods. Historical simulation and kernel density estimation in the nonparametric category can estimate ES by generating density forecasts. This is also the case for parametric approaches that involve a model for conditional variance, such as the GARCH model, and a distributional assumption. ES on its own is not elicitable, but Fissler et al. (2015) show that VaR and ES are jointly elicitable. Therefore, another recently popular method is to jointly estimate VaR and ES based on the asymmetric Laplace (AL) scoring function (Taylor 2019). Although machine learning methods can achieve accurate VaR estimations by combining QR, they provide no apparent means of producing ES forecasts: because no scoring function exists whose expectation is minimized by the true ES (Gneiting 2011), a supervised learning objective for ES cannot be constructed directly. In recent years, many data-driven methods based on generative adversarial networks (GANs) (Goodfellow et al. 2014) have been utilized to generate renewable resource scenarios, particularly for wind power (see Ma et al. 2013; Liang and Tang 2020; Yuan et al. 2021). Although the potential benefits of wind power generation are obvious, one of the main obstacles to its implementation is the uncertainty of predicting meteorological variables. Therefore, researchers have developed GAN-based data-driven methods that generate scenarios from historical data on meteorological variables and achieve satisfactory results for uncertainty prediction. In the field of risk management, a few studies have applied GANs to capture the uncertainty of financial asset returns and prices and obtain VaR estimations (see Zhu et al. 2021; Fatouros et al. 2022). Although these generative models accurately portray variable uncertainty without the need to fit probabilistic models of stochastic variables, they do not provide an efficient way to estimate ES. Inspired by the above generative model-based research, this study employs GANs to specifically model uncertainty scenarios in the left tail of asset returns, and accordingly proposes an ES estimation method based on scenario generation. Specifically, after generating tail risk scenarios, ES can be estimated by calculating the arithmetic average of the risk scenarios below the VaR. To our knowledge, this promising ES estimation method based on GAN risk scenario generation has not been explored previously.
In addition to the aforementioned VaR and ES modeling methods, the hypothesis of a heterogeneous market (Müller et al. 1993) potentially contributes to the estimation of risk measures. Different participants in a heterogeneous market have different time horizons and dealing frequencies, and can have different degrees of risk aversion, institutional constraints, and transaction costs. The risk measures in a heterogeneous market can be affected by participants with different dealing frequencies. In the VaR and ES estimation processes, we consider the heterogeneous market hypothesis as a guiding theory in the design of our estimation framework.
According to the pertinent literature, most existing VaR and ES methods face various challenges. First, traditional methods and simple machine learning models have difficulty capturing long memory and nonlinear dependencies in financial time series, which leads to low accuracy in the estimation of VaR and ES. The second challenge is severe VaR violations, in which the asset realizes a loss exceeding the VaR value; these arise owing to dependencies between VaR forecasts, especially at the 99% confidence level. This is often the case when sharp market plunges occur, especially in developed markets (Žiković and Filer 2013). Third, although machine learning methods can achieve accurate VaR estimations by combining QR, they provide no apparent means of producing ES forecasts. Finally, financial market risk can be affected by participants with different time horizons and dealing frequencies, and existing risk-forecasting methods often ignore the importance of these factors. These challenges provide additional motivation for the framework proposed in this study.
In summary, this study introduces a data-driven framework that forecasts the VaR and ES of assets, addressing the aforementioned challenges with the following key innovations:
(1) Two VaR estimation methods based on deep learning: QRMogLSTM and QRMogGRU. These two methods not only mimic long memory and nonlinear dependencies to capture rare market events, but also discard unimportant historical information to avoid clustered VaR violations.
(2) An ES estimation model based on GANs, which solves the problem that QR-based machine learning methods have difficulty generating ES estimates.
(3) The use of a decomposition-aggregation mechanism in risk measure forecasting to capture the behavior of market participants with multiple time scales.
Additionally, this study considers the heterogeneous market hypothesis as theoretical guidance in the design of the estimation model.

Related work
In finance, there is a growing interest in QR, with a particular focus on VaR models. There are two reasons for this: (1) QR provides a complete characterization of the random relationship between variables and (2) QR offers a more robust and thus more effective estimation in some non-Gaussian settings. The majority of the QR literature has focused on statistical models that generate future VaR in fixed model forms. For example, Koenker and Bassett (1978) propose a linear estimation process for conditional quantiles, extending the ordinary regression model by setting the loss function to the quantile loss. Engle and Manganelli (2004) propose another quantile estimation method based on the linear QR technique to model the quantile directly, namely CAViaR. However, the return data of real stock markets are usually nonlinear and do not follow a normal distribution, and it is difficult to find suitable functional forms for the linear QR and CAViaR models (Huang 2013). To solve these estimation problems, Taylor (2000) applies the QRNN proposed by White (1992) to estimate conditional quantiles. The empirical results show that the QRNN outperforms traditional GARCH-class models in terms of forecasting performance. The motivation for using QRNN to estimate VaR is clear: to find a suitable nonlinear functional form for the QR process with the powerful nonlinear mapping capability of artificial neural networks. Unfortunately, a QRNN is essentially a feedback-type neural network, and thus suffers from overfitting, underfitting, and a tendency to fall into local optima (Qiu and Song 2016). Following Taylor, a natural line of research has been to extend better-performing point forecasting models to the field of quantile estimation. For example, Takeuchi et al. (2006) and Li et al. (2007) proposed QR-SVM, and Xu et al. (2016) applied this method to VaR estimation and concluded that QR-SVM outperformed traditional GARCH-like and linear QR models. Nguyen et al. (2020) also propose the use of LASSO-QR to study tail risk in the cryptocurrency market.
With the development of deep learning, scholars have again turned their attention to deep neural networks, especially RNNs (Wang et al. 2022b) and their variants. For example, Wang et al. (2020) estimated quantiles using a QR long short-term memory network (LSTM) and constructed forecasting intervals from the upper and lower quantile estimation results. However, few studies have focused on deep-QR to estimate the VaR. This may be because financial sequences are more volatile, so that complex neural networks are difficult to train and require a large number of hyperparameters to be set appropriately.
ES is a standard risk measure in the recent Basel Accord, yet comparatively little work has addressed its modeling, mainly because ES is not an elicitable measure, which makes constructing a loss function for its estimation challenging (Cai and Wang 2008; Grabchak and Christou 2021). Scholars such as Du and Escanciano (2017) and Patton et al. (2019) have explored ES estimation approaches. Recently, a popular approach has been to jointly estimate VaR and ES based on the theory of Fissler et al. (2015). For example, Meng and Taylor (2020) and Merlo et al. (2021) jointly estimated VaR and ES using CAViaR-type models based on the AL scoring function. In addition, the current literature on risk measurement modeling using generative models focuses only on VaR estimation without providing ideas for ES estimation. For example, Zhu et al. (2021) and Fatouros et al. (2022) employed GANs to generate future price/return scenarios for assets and obtained VaR by calculating the quantiles of all the scenarios. These methods mirror scenario generation in the field of renewable energy (Li et al. 2020) and generally follow two steps: (1) obtaining the probability distribution of the sequence itself or the forecasting error, and (2) sampling the scenarios from the statistical distribution.
In a broad strand of the financial literature, the hypothesis of a heterogeneous market mainly guides the forecasting of asset volatility, such as the heterogeneous autoregressive (HAR) model (Corsi 2009). Decomposition-aggregation forecasting is a method that captures multi-timescale behaviors for the prediction of financial asset returns. The application scenarios of decomposition-aggregation forecasting in finance include, but are not limited to, futures (Jiang et al. 2021; Guo et al. 2022), stock (Deng et al. 2022; Wang et al. 2022a), and cryptocurrency (Parvini et al. 2022) markets. The results of these studies show that forecasts based on decomposition-aggregation learning can improve accuracy. Moreover, the main idea of decomposition-aggregation forecasting is compatible with the heterogeneous market hypothesis.
Based on the above, the main contribution of the current study is fourfold. First, we propose a novel probabilistic framework for VaR estimation based on QRMogLSTM and QRMogGRU. This framework performs better in terms of both VaR 95% and VaR 99% than prevalent VaR estimation methods. Second, we develop an ES estimation approach based on GANs, which provides future risk scenarios by generating the tail distribution and solves the problem that machine learning methods have difficulty producing ES forecasts. Various evaluation indices and statistical tests support the validity of the ES forecasting approach. Third, we consider investors' different time horizons and dealing frequencies based on the heterogeneous market hypothesis. The resulting design captures the investment behaviors of short-, medium-, and long-term investors and does not significantly increase the computational burden. Finally, we explore a relatively complete QR model space using CAViaR-type models, QRNN, LASSO-QR, QR-SVM, QR-tree models, and QR-deep learning as benchmark models. The backtesting results of all the models are compared on four main stock indices, exploring questions that contribute to our understanding of the accurate estimation of risk measures.

VaR and ES estimation framework
In this study, we extend the decomposition-aggregation strategy, state-of-the-art deep learning methods, and Bayesian optimization technology to the field of risk measure estimation. These technologies are compatible with the properties of financial assets. The corresponding mathematical principles and financial theories are presented below.

The hypothesis of a heterogeneous market
Müller et al. (1993) proposed the hypothesis of a heterogeneous market, which states that different participants in a heterogeneous market have different time horizons and dealing frequencies. Therefore, the time horizon of participants has a "fractal" structure, which consists of short-, medium-, and long-term components. Each component has its own reaction time to events or news, and different degrees of risk aversion, institutional constraints, and transaction costs. We argue that the risk measure of financial assets is likewise affected by the unique dealing frequency of heterogeneous participants. Therefore, we incorporate the heterogeneous market hypothesis into the design of our estimation framework. We use the decomposition-aggregation strategy ("Decomposition and aggregation based on investor heterogeneity" section) to distinguish participants with different time horizons and dealing frequencies and employ MogRNNs ("Quantile regression Mogrifier RNNs (QRMogRNNs)" section) to generate the risk estimates.

Decomposition and aggregation based on investor heterogeneity
Forecasting technology based on decomposition-aggregation learning has been applied in many research fields, such as finance, energy (Wang et al. 2022c; Neshat et al. 2022), and the environment (Kim et al. 2022; Wang et al. 2022d). Research findings in these fields show that forecasts based on decomposition-aggregation learning can improve accuracy. In risk measure estimation, we argue that this mechanism can identify and extract asset sequences at different frequencies (P H and Rishad 2020) for analysis and forecasts. This mechanism is compatible with the heterogeneous market hypothesis that short-term investors affect signals with higher frequencies, whereas the trading behaviors of medium- and long-term investors affect those with lower frequencies. Therefore, this study extends decomposition-aggregation learning to the field of risk measure estimation. Specifically, we propose a real-time decomposition-aggregation approach that uses the available information to learn decomposition and aggregation rules. Whenever new data become known, they are added to the available information set and then decomposed and aggregated according to the rules.

Variational mode decomposition
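The decomposition method used in this study is variational mode decomposition (VMD). VMD (Dragomiretskiy and Zosso 2014) is a powerful, completely nonrecursive signal decomposition algorithm that decomposes a complex signal into several intrinsic mode functions (IMFs), each with a specific center frequency and bandwidth. In the decomposition mode of the VMD, we can redefine the IMF as

$$u_k(t) = A_k(t)\cos(\phi_k(t)), \tag{1}$$

where $A_k(t)$ and $\phi_k(t)$ indicate the instantaneous amplitude and phase of $u_k(t)$, respectively, and $w_k(t)$ is the instantaneous angular frequency. From $w_k(t) = d\phi_k(t)/dt > 0$ (Dragomiretskiy and Zosso 2014), we know that $\phi_k(t)$ must be differentiable at least once and $\phi_k'(t) > 0$. The VMD algorithm is realized by solving the following constrained variational problem:

$$\min_{\{u_k\},\{w_k\}} \sum_k \left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j w_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_k u_k(t) = v(t), \tag{2}$$

where $v(t)$ is the input signal, $j^2 = -1$, $\delta(t)$ represents the Dirac distribution function, and $*$ denotes convolution. The Lagrange multiplier is used to transform Eq. (2) into an unconstrained optimization problem:

$$L(\{u_k\},\{w_k\},\lambda) = \alpha \sum_k \left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j w_k t} \right\|_2^2 + \left\| v(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, v(t) - \sum_k u_k(t) \right\rangle, \tag{3}$$

where $\lambda(t)$ is the Lagrange multiplier used to enforce the constraint and $\alpha > 0$ is the quadratic penalty factor. The term $\| v(t) - \sum_k u_k(t) \|_2^2$ is a quadratic penalty that accelerates the convergence rate and ensures minimum squared error.

According to the ADMM optimization method (Bertsekas 1982; Dragomiretskiy and Zosso 2014), the updates of $u_k$ and $w_k$ can be expressed by the following equations:

$$\hat{u}_k^{n+1}(w) = \frac{\hat{v}(w) - \sum_{i \neq k} \hat{u}_i(w) + \hat{\lambda}(w)/2}{1 + 2\alpha (w - w_k)^2}, \tag{4}$$

$$w_k^{n+1} = \frac{\int_0^{\infty} w\, |\hat{u}_k^{n+1}(w)|^2\, dw}{\int_0^{\infty} |\hat{u}_k^{n+1}(w)|^2\, dw}, \tag{5}$$

where $\hat{u}_k^{n+1}(w)$, $\hat{v}(w)$, and $\hat{\lambda}(w)$ are the Fourier transforms of the signals $u_k^{n+1}(t)$, $v(t)$, and $\lambda(t)$, respectively. Moreover, the stop condition of VMD is expressed based on the tolerance $\varepsilon$ of the convergence criterion:

$$\sum_k \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon. \tag{6}$$

This study chooses the VMD algorithm instead of the empirical mode decomposition (EMD)-class (Huang et al. 1996) algorithms because VMD can fix the number of generated IMFs, thus avoiding inconsistency between the in-sample and out-of-sample decomposition results.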
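The alternating updates of the mode spectra and center frequencies described above can be illustrated with a deliberately simplified NumPy sketch. It is not the implementation used in this study: it omits boundary mirroring and the Lagrange-multiplier update, works on the one-sided FFT spectrum with frequencies in cycles per sample, and the function name `vmd_sketch`, its default parameters, and the two-tone test signal are all assumptions made for illustration.

```python
import numpy as np

def vmd_sketch(signal, K=2, alpha=2000.0, tol=1e-9, max_iter=500):
    """Simplified VMD iteration; returns (center_freqs, one-sided mode spectra)."""
    x = np.asarray(signal, dtype=float)
    T = x.size
    freqs = np.fft.fftfreq(T)          # cycles per sample, in [-0.5, 0.5)
    pos = freqs >= 0
    x_hat = np.fft.fft(x)
    x_hat[~pos] = 0.0                  # keep only the one-sided spectrum
    u_hat = np.zeros((K, T), dtype=complex)
    omega = 0.5 * np.arange(K) / K     # evenly spaced initial center frequencies
    eps = 1e-12                        # guards against division by zero
    for _ in range(max_iter):
        u_old = u_hat.copy()
        for k in range(K):
            # Wiener-filter style update of mode k around its center frequency.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (x_hat - others) / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = power-weighted mean of the positive frequencies.
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + eps)
        # Relative change of all modes, used as the stopping criterion.
        change = sum(
            np.sum(np.abs(u_hat[k] - u_old[k]) ** 2)
            / (np.sum(np.abs(u_old[k]) ** 2) + eps)
            for k in range(K)
        )
        if change < tol:
            break
    return omega, u_hat

# Hypothetical two-tone signal with components at 0.05 and 0.2 cycles per sample.
t = np.arange(400)
demo_signal = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
center_freqs, mode_spectra = vmd_sketch(demo_signal, K=2)
```

Note that the number of modes `K` is fixed in advance, which is the property that keeps in-sample and out-of-sample decompositions structurally consistent.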

IMFs aggregation based on fuzzy entropy
To balance forecasting accuracy and time consumption, we do not model each IMF individually but instead model subseries aggregated from the IMFs. However, there is no clear division between the different frequencies of financial assets (IMFs), and the boundary is fuzzy. Thus, we employ the fuzzy entropy and approximation criterion (Fu et al. 2020) to aggregate all the IMFs. The approximation criterion is defined in Eqs. (7) and (8), where $l$ denotes the number of decomposed IMFs and $\{FE_i \mid i = 1, \ldots, l\}$ represents the fuzzy entropy value corresponding to the $i$th IMF.
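To make the fuzzy entropy values concrete, the sketch below computes a commonly used form of fuzzy entropy for a one-dimensional series. It illustrates only the entropy computation; the approximation criterion that groups IMFs is not reproduced here. The embedding dimension `m`, tolerance factor `r`, exponent `n`, and the exponential membership function are assumptions, since the settings used in this study are not stated.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D series (baseline-removed templates, Chebyshev distance)."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()                  # tolerance scaled by the series' spread
    n_templates = x.size - m           # same template count for dims m and m+1

    def phi(dim):
        emb = np.array([x[i:i + dim] for i in range(n_templates)])
        emb = emb - emb.mean(axis=1, keepdims=True)            # remove local baseline
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        sim = np.exp(-(dist ** n) / tol)                       # assumed membership function
        np.fill_diagonal(sim, 0.0)                             # exclude self-matches
        return sim.sum() / (n_templates * (n_templates - 1))

    return float(np.log(phi(m)) - np.log(phi(m + 1)))

# Hypothetical comparison: a smooth sine versus white noise.
rng = np.random.default_rng(0)
fe_sine = fuzzy_entropy(np.sin(2 * np.pi * np.arange(300) / 50))
fe_noise = fuzzy_entropy(rng.standard_normal(300))
```

A regular, low-frequency series yields a lower value than an irregular one, which is why IMFs with similar fuzzy entropy can reasonably be treated as belonging to the same frequency group.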

Real-time decomposition and aggregation mechanism
Let $O_t$ be the financial asset data and $M_t$ be the available information set at time $t$. The real-time decomposition-aggregation mechanism can be implemented as follows:
(1) At $t-1$, decompose $O_1, O_2, \ldots, O_{t-1}$ into $k$ IMFs using VMD and the information in $M_{t-1}$, such that the $j$th IMF can be denoted as $IMF_j \mid M_{t-1}$.
(2) Calculate the fuzzy entropy of all decomposed IMFs, then acquire the aggregation rule $(H \mid M_{t-1})$ and obtain $p$ subseries according to the approximation criterion.
(3) When new information is available at $t$, calculate the $j$th IMF, that is, $IMF_{t,j} \mid M_t$, using VMD and the information in $M_t$.
(4) Calculate the fuzzy entropy $FE_t(IMF_{t,j} \mid M_t)$ of each IMF at $t$ and apply $(H \mid M_{t-1})$ to $FE_t(IMF_{t,j} \mid M_t)$ to obtain the aggregated subseries $F_{t,1}, F_{t,2}, \ldots, F_{t,p}$ $(p \le k)$ at $t$.
Figure 1 shows the decomposition and aggregation results of the SPX500 index for the training set. As shown in Fig. 1, the frequency of subseries (Sub) 1 is low; some researchers have argued that this sequence is a trend sequence (Zhang et al. 2008). The frequency of Sub 2 is relatively high, which reflects the medium- and long-term impact brought to the market by particular events or the behavior of medium- and long-term investors. Subs 3 and 4 are high-frequency sequences influenced by short-term shocks (such as monetary policy and the release of US macroeconomic data) or short-term investor behavior.
In the subsequent estimation, we only need to model the subseries $F_{t,1}, F_{t,2}, \ldots, F_{t,p}$, which are aggregated in real time from the IMFs, and sum the estimation results to obtain the estimated values of the financial asset data.

Quantile regression Mogrifier RNNs (QRMogRNNs)
Forecasting VaR at significance level α at a future time is equivalent to estimating the τth conditional quantile of the asset return at that time. We refer to Mogrifier LSTM (MogLSTM) and Mogrifier GRU (MogGRU) collectively as MogRNNs. QRMogRNNs are realized by setting the loss function of the MogRNNs to the quantile loss, which is consistent with Taylor's (2000) study of the QRNN.

Mogrifier LSTM and Mogrifier GRU
We found that the Mogrifier (Mog) structure performs better owing to one model improvement: it repeatedly lets the current input and the previous hidden state interact before they enter the cell. The advantages of adopting MogRNNs to estimate the risk measures of financial assets are reflected in the properties of financial assets and the heterogeneous market hypothesis.

Properties of financial assets
The interaction between the previous hidden state, which stores the historical return information, and the current input enables the model to learn the "clustering" of financial asset volatility. By feeding the interaction results into the LSTM or GRU structure, the model retains or discards the historical information to learn the long-term dependencies of the financial sequence.

Heterogeneous market hypothesis
The MogRNNs model architecture coincides well with the heterogeneous market hypothesis, as reflected in two ways. First, the RNNs choose to retain or discard historical information. Information that does not contribute to risk estimation is discarded (such as the impact of short-term speculation on the long-run market), whereas information that contributes to risk estimation (such as medium- and long-term investment behavior) is retained in the hidden layer of the RNNs. Second, the Mog structure lets the retained long-term investment behavior interact with short-term investment behavior, which simulates the interaction between short-term investors and medium- or long-term investors.
In addition to the abovementioned advantages, the forget gate in the LSTM cell can discard historical information that does not contribute to the estimation, thereby avoiding serious clustered VaR violations. Therefore, we combined QR with MogLSTM and MogGRU to propose two new deep-QR models to model the quantiles of financial assets, namely, QRMogLSTM and QRMogGRU.
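As LSTM and GRU have similar cell structures, we introduce the structure of MogLSTM to explain the model improvement. A schematic of MogLSTM is presented in Fig. 2. To demonstrate the modifications of MogLSTM, the cell structure of LSTM is first presented as follows:

$$\begin{aligned}
F &= \sigma(W^{fx} x + W^{fh} h_{prev} + b^{f}),\\
I &= \sigma(W^{ix} x + W^{ih} h_{prev} + b^{i}),\\
J &= \tanh(W^{jx} x + W^{jh} h_{prev} + b^{j}),\\
O &= \sigma(W^{ox} x + W^{oh} h_{prev} + b^{o}),\\
c &= F \odot c_{prev} + I \odot J,\\
h &= O \odot \tanh(c),
\end{aligned} \tag{9}$$

where $\sigma(\cdot)$ is the sigmoid function, $\odot$ denotes the elementwise product, $W^{**}$ and $b^{*}$ are weight matrices and biases, and $c$ is the cell state; $F$, $I$, $J$, and $O$ are the gates of the LSTM cell, which decide which information will be retained or discarded. The current input to the LSTM, $x$, must be related to the previous hidden state $h_{prev}$. Thus, MogLSTM performs iterative interaction operations on $x$ and $h_{prev}$ in advance to obtain the modulated inputs $x^{\uparrow}$ and $h^{\uparrow}_{prev}$. These inputs are defined as the highest-indexed $x^{i}$ and $h^{i}_{prev}$, respectively, from the following recursion:

$$\begin{aligned}
x^{i} &= 2\sigma\!\left(Q^{i} h^{i-1}_{prev}\right) \odot x^{i-2}, && \text{for odd } i \in \{1, \ldots, r\},\\
h^{i}_{prev} &= 2\sigma\!\left(R^{i} x^{i-1}\right) \odot h^{i-2}_{prev}, && \text{for even } i \in \{1, \ldots, r\},
\end{aligned} \tag{10}$$

with $x^{-1} = x$ and $h^{0}_{prev} = h_{prev}$. The number of iterative rounds (also known as Mog steps), $r \in \mathbb{N}$, is a hyperparameter; $r = 0$ recovers the standard LSTM. $Q^{i}$ and $R^{i}$ are randomly initialized matrices. To reduce the number of additional model parameters, the $Q^{i}$ and $R^{i}$ matrices are decomposed into products of low-rank matrices: $Q^{i} = Q^{i}_{left} Q^{i}_{right}$ and $R^{i} = R^{i}_{left} R^{i}_{right}$. MogGRU is obtained by the same mechanism, that is, the hidden state $h_{prev}$ and the current input $x$ interact before they are fed into the GRU cell.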
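The interaction step that precedes the LSTM or GRU cell can be sketched as follows, following the alternating gating scheme of Melis et al. (2019): in odd rounds the input is rescaled by a gate computed from the hidden state, and in even rounds the hidden state is rescaled by a gate computed from the input. The function name `mogrify`, the use of full rather than low-rank matrices, and the toy dimensions are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h_prev, Q, R, rounds):
    """Alternately gate x by h_prev (odd rounds) and h_prev by x (even rounds).

    Q: list of (dim_x, dim_h) matrices, one per odd round.
    R: list of (dim_h, dim_x) matrices, one per even round.
    rounds = 0 returns the inputs unchanged (standard LSTM/GRU behaviour).
    """
    x = np.asarray(x, dtype=float).copy()
    h = np.asarray(h_prev, dtype=float).copy()
    q_idx = r_idx = 0
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            x = 2.0 * sigmoid(Q[q_idx] @ h) * x   # gate values lie in (0, 2)
            q_idx += 1
        else:
            h = 2.0 * sigmoid(R[r_idx] @ x) * h
            r_idx += 1
    return x, h

# Hypothetical dimensions: 3 input features, 2 hidden units, r = 3 rounds.
rng = np.random.default_rng(1)
x_in = np.array([1.0, -2.0, 0.5])
h_in = np.array([0.3, -0.1])
Q_demo = [rng.standard_normal((3, 2)) for _ in range(2)]
R_demo = [rng.standard_normal((2, 3)) for _ in range(1)]
x_up, h_up = mogrify(x_in, h_in, Q_demo, R_demo, rounds=3)
```

The factor 2 keeps the expected scale of the inputs unchanged when the gate pre-activations are near zero, since $2\sigma(0) = 1$.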

QRMogRNNs
We assume that $f(X_i, \theta)$ indicates the forecasting model and $\theta$ represents all the parameters of this model. Next, we design the loss function in Eq. (11), where the parameter $\theta(\tau)$ is estimated corresponding to $\tau$, so we can estimate VaR using the model $f(X_i, \theta(\tau))$ based on QR theory:

$$\hat{\theta}(\tau) = \arg\min_{\theta} \sum_{i} \rho_{\tau}\!\left(y^{act}_i - f(X_i, \theta)\right), \qquad \rho_{\tau}(u) = u\left(\tau - \mathbb{1}\{u < 0\}\right). \tag{11}$$

The conditional quantile of $y^{act}_i$ obtained by QRMogRNNs (i.e., QRMogGRU and QRMogLSTM) then follows from the aforementioned forward propagation of MogLSTM and MogGRU, in which $W(\tau)$ and $b(\tau)$ indicate the weight matrix and bias corresponding to the quantile $\tau$, respectively, and $h_{prev}(\tau)$ represents the hidden state with respect to $\tau$.
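The quantile (pinball) loss that turns a point forecaster into a quantile estimator can be written compactly. The sketch below is a generic NumPy version, not the training code of this study; the function name `pinball_loss` and the toy sample are assumptions. It also checks numerically that the constant minimizing the loss on a sample sits at the sample's τ-quantile.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean quantile loss: tau * u if u >= 0, (tau - 1) * u otherwise, with u = y - q."""
    u = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

# Hypothetical sample 1, 2, ..., 100; search for the best constant 5% quantile forecast.
sample = np.arange(1, 101, dtype=float)
candidates = sample.copy()
losses = [pinball_loss(sample, np.full(sample.size, q), tau=0.05) for q in candidates]
best_constant = float(candidates[int(np.argmin(losses))])
```

For a small τ such as 0.05, overestimating the quantile is penalized only lightly per observation while underestimating is penalized heavily for the few observations below it, which pulls the fitted value toward the left tail.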

Bayesian hyperparameter optimization and model selection
The importance of using Bayesian theory and Bayesian formulas to realize model selection and hyperparameter optimization is reflected in two ways. First, the deep learning model has many hyperparameters that affect the accuracy of the model, especially the MogLSTM and MogGRU structures, which both have an extra parameter, r. Second, according to the "No free lunch" theorem, if model A outperforms model B on one specific task, there must be another specific task on which A performs worse than B. Therefore, we regard the type of neural network layer (LT) as a hyperparameter that has two choices: QRMogLSTM and QRMogGRU. We coded them as 0 and 1, respectively, to realize the model selection. Table 1 presents the hyperparameters to be optimized and their value ranges. The basic steps of Bayesian optimization (BO) are as follows:
(1) Build a surrogate probability model of the objective function.
(2) Find the ideal hyperparameters on the surrogate.
(3) Apply these hyperparameters to the true objective function to assess them.
(4) Update the surrogate model incorporating the new results.
After considering the layer type as a hyperparameter and introducing the extra parameter r, we have four alternative models to select:
(1) When r = 0 and LT = 0, the QRLSTM model is selected.
(2) When r = 0 and LT = 1, the QRGRU model is selected.
(3) When r > 0 and LT = 0, the QRMogLSTM model is selected, and the hidden layer state and the current input interact r times.
(4) When r > 0 and LT = 1, the QRMogGRU model is selected.
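The mapping from the two hyperparameters to a concrete model can be expressed as a small lookup, which is how an optimizer's sampled values would be decoded. This is an illustrative sketch: the function name `select_model` is hypothetical, and the case r = 0 with LT = 1 is assumed, by symmetry with the LSTM case, to yield a plain QRGRU.

```python
def select_model(r: int, layer_type: int) -> str:
    """Decode (r, LT) into a model name; LT = 0 means LSTM, LT = 1 means GRU."""
    if r < 0 or layer_type not in (0, 1):
        raise ValueError("r must be non-negative and layer_type must be 0 or 1")
    cell = "LSTM" if layer_type == 0 else "GRU"
    # r = 0 disables the Mogrifier interaction, leaving the standard cell.
    return f"QR{cell}" if r == 0 else f"QRMog{cell}"

# Enumerate the four alternatives for a hypothetical search space.
alternatives = {(r, lt): select_model(r, lt) for r in (0, 3) for lt in (0, 1)}
```

Treating the layer type as one more coordinate of the search space lets a single optimization run cover both hyperparameter tuning and model selection.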
Other parameters of our model are set by default: learning rate = 0.01, optimizer = "Adam" and dropout ratio = 0.2.

Estimate ES based on scenario generation
Although ES has replaced VaR as the standard risk measure in the recent Basel Accord, few ES estimation methods exist because, ES not being elicitable, a loss function for it cannot be constructed directly. This study employs GANs as the generative model to estimate ES. Inspired by the direct quantile modeling of the well-known CAViaR model proposed by Engle and Manganelli (2004), we employ GANs as a powerful tool to generate asset-tail scenarios. The principle is that the estimated quantile values are fed into the GANs to generate many future risk scenarios in the tail of the asset distribution; the ES is then obtained by averaging the scenarios that are lower than the estimated VaR value.
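The final averaging step is simple enough to state in code. The sketch below assumes the scenarios are already available as an array of simulated returns; the function name `es_from_scenarios` and the fallback for an empty tail are assumptions, not part of the method described above.

```python
import numpy as np

def es_from_scenarios(scenarios, var_estimate):
    """ES as the arithmetic mean of the generated scenarios that lie below the VaR."""
    s = np.asarray(scenarios, dtype=float)
    tail = s[s < var_estimate]
    if tail.size == 0:
        # Assumed fallback: with no scenario beyond the VaR, report the VaR itself.
        return float(var_estimate)
    return float(tail.mean())

# Hypothetical generated scenarios and a hypothetical VaR estimate of -2.5.
demo_scenarios = [-5.0, -4.0, -3.0, -1.0, 0.0]
demo_es = es_from_scenarios(demo_scenarios, var_estimate=-2.5)
```

By construction the resulting ES is never less extreme than the VaR it is conditioned on, so the two forecasts remain mutually consistent.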

Least squares generative adversarial network
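Least squares generative adversarial network (LSGAN) is a variant of GANs proposed by Mao et al. (2017). This model maintains the core structure of the GAN, which consists of a generator G and a discriminator D. The goal of G is to learn the underlying distribution of the training-set data and to generate new data that D cannot classify as true or fake. D is a binary classifier that determines whether the data come from the training set or from G. Because neural networks can fit arbitrary functions, generators and discriminators are generally designed as multilayer neural networks. The adversarial game is implemented by designing the loss functions $L^{(D)}(\theta^{(D)}, \theta^{(G)})$ and $L^{(G)}(\theta^{(D)}, \theta^{(G)})$ for D and G. The main improvement in LSGAN is the modification of the loss functions of G and D to the form of a squared error. In LSGAN, the mathematical expressions for $L^{(D)}(\theta^{(D)}, \theta^{(G)})$ and $L^{(G)}(\theta^{(D)}, \theta^{(G)})$ are as follows:

$$L^{(D)}(\theta^{(D)}, \theta^{(G)}) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{data}}\!\left[(D(x) - \beta)^2\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z}\!\left[(D(G(z)) - \alpha)^2\right], \tag{13}$$

$$L^{(G)}(\theta^{(D)}, \theta^{(G)}) = \frac{1}{2}\,\mathbb{E}_{z \sim p_z}\!\left[(D(G(z)) - \gamma)^2\right], \tag{14}$$

where $\alpha$, $\beta$, and $\gamma$ are predefined parameters: $\alpha$ and $\beta$ are the labels for fake and real data, and $\gamma$ is the value that G wants D to assign to fake data. Minimizing the objective function minimizes the Pearson $\chi^2$ divergence when $\beta - \gamma = 1$ and $\beta - \alpha = 2$ are satisfied, for example with $\alpha = -1$, $\beta = 1$, and $\gamma = 0$. For D, a smaller $L^{(D)}(\theta^{(D)}, \theta^{(G)})$ indicates that it is more capable of distinguishing true from fake. For G, a smaller $L^{(G)}(\theta^{(D)}, \theta^{(G)})$ indicates that it is more capable of producing convincing fakes.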
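The two least-squares objectives, in the form given by Mao et al. (2017), can be evaluated on batches of discriminator outputs as in the following sketch. Here α is the label for fake data, β the label for real data, and γ the value the generator wants the discriminator to output for fake data; the function names and the toy batches are assumptions for illustration.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake, alpha=-1.0, beta=1.0):
    """Discriminator loss: push outputs on real data to beta and on fake data to alpha."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(0.5 * np.mean((d_real - beta) ** 2) + 0.5 * np.mean((d_fake - alpha) ** 2))

def lsgan_g_loss(d_fake, gamma=0.0):
    """Generator loss: push the discriminator's outputs on fake data toward gamma."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(0.5 * np.mean((d_fake - gamma) ** 2))

# Hypothetical discriminator outputs for a batch of two real and two fake samples.
perfect_d = lsgan_d_loss(d_real=[1.0, 1.0], d_fake=[-1.0, -1.0])
g_when_caught = lsgan_g_loss(d_fake=[-1.0, -1.0])
g_when_fooling = lsgan_g_loss(d_fake=[0.0, 0.0])
```

Unlike a sigmoid cross-entropy loss, the squared error keeps growing with the distance from the target value, so samples far from the decision boundary still produce a usable gradient.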
Training GANs is difficult in practice because of the instability of GAN learning. The discriminator of the original GAN uses a sigmoid cross-entropy loss function, which may lead to vanishing gradients during the learning process. The LSGAN uses the least squares loss function (LSF) for its discriminator. The idea is simple and efficient: the LSF can move fake samples toward the decision boundary because it penalizes samples that lie far from the decision boundary, even when they are on the correct side of it. Based on this characteristic, the LSGAN is comparable to the Wasserstein generative adversarial network (WGAN) in generating samples that are closer to real data; however, the LSGAN trains and converges faster than the more complex WGAN.
In addition, we considered a two-time-scale update rule training strategy (Heusel et al. 2017) to optimize the learning process of the LSGAN.Specifically, this study employs the "Adam" optimizer with different learning rates for the discriminator network D and the generator G. G uses the low-speed update rule with the learning rate set at 0.0001, and D uses the relatively fast update rule with the learning rate set at 0.0005.
Although the LSGAN ensures that the punishment for the outlier sample is greater, which solves the problem of unstable (insufficient) GANs training, the following ( 14) limitations remain.Excessive punishment of outliers by LSGAN may lead to a decrease in the "diversity" of sample generation, and the generator may have the problem of gradient vanishing when the discriminator is excellent enough.This study provides ideas for GANs to estimate ES and employs the LSGAN to implement the empirical study.In future research, we will explore other GANs variants with better performance to realize ES estimation.

Generating the future risk scenarios using LSGAN
Training GANs can be difficult, and there are many possible setups. The generator produces simulations $\hat{M}_f$ of the VaR for the following f periods using $M_b$, the VaR for the previous b periods, and a latent vector Z. In practice, the latent vector represents the unknown future events affecting the market index. A schematic of the conditional LSGAN is shown in Fig. 3. The generator is composed of a conditioning network and a simulator network. The conditioning network takes the historical risk trend $M_b$ and applies a data normalization (Norm) step followed by 1D convolutional layers (Conv layers), where the convolution is done over the time dimension. The output is then flattened and passed to a fully connected layer (Dense). The simulator network then takes the latent input Z and concatenates it with the output of the conditioning network. This is followed by a fully connected layer (Dense), after which the result is reshaped and passed through 1D convolutional layers (Conv layers) that shrink it down to the desired output shape. The output of the generator, $\hat{M}_f$, is a vector of simulated tail risk for a market index. The discriminator takes a concatenation of the real data $M_b$ and the fake data $\hat{M}_f$. This input is processed by normalization (Norm) and several 1D convolutional layers (Conv layers), flattened, and finally passed to a fully connected output layer (Dense). The discriminator gives a critic score, assigning a larger value if it believes the data come from the true distribution and a smaller value if it believes they come from the distribution learned by the generator.

Forecasting ES based on risk scenarios and estimated VaR
As mentioned above, this study uses risk scenarios to calculate the ES. Risk scenarios (i.e., quantile scenarios) are sampled approximate representations of the tail uncertainty of asset indicators (e.g., log returns) on a future date. Suppose we obtain $M_{in}$ in-sample VaR forecasts from any method: $VaR(\tau) = \{VaR_1(\tau), VaR_2(\tau), \ldots, VaR_{M_{in}}(\tau)\}$. We directly model the estimated $VaR(\tau)$ to generate future risk scenarios that follow a tail distribution. The ES value at the specified confidence level can then be calculated by averaging the scenarios lower than the corresponding VaR. This study forecasts ES using rolling window exercises, in which the model is retrained at regular intervals. With a rolling window of size b, the model receives additional training every f periods on the VaR values estimated for the previous b periods; the numbers of epochs for pretraining and additional training are $E_1$ and $E_2$, respectively.
The detailed steps for estimating ES using the LSGAN scenario generation approach are listed as steps (1) to (6) after the caption of Fig. 3.
Since ES is not elicitable (Gneiting 2011), we chose to first estimate the tail risk of assets (i.e., VaR) and then model the tail risk scenarios based on the historical VaR values to estimate ES. Another potential modeling method is to use GANs directly to generate return scenarios and then calculate the arithmetic average of the return scenarios below the VaR value as the ES value. We also verified this method but found the following problems: if the asset returns are modeled directly, the available tail risk scenarios will be relatively few, especially during crisis periods. Furthermore, the tail scenarios of returns may be higher than the VaR values produced by the VaR estimator, which can lead to inaccurate or even ineffective risk forecast models. The method proposed in this study can be flexibly combined with any VaR estimator to jointly forecast ES without the abovementioned problems.
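The averaging step of the procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scenarios are assumed to be already generated (here they are hard-coded toy numbers rather than LSGAN output), the function name `es_from_scenarios` is hypothetical, and the fallback used when no scenario lies below the VaR forecast is an assumption that the source does not specify.

```python
from statistics import fmean


def es_from_scenarios(scenarios, var_forecast):
    """Average the generated scenarios that lie below the VaR forecast.

    scenarios: iterable of simulated tail-risk values (e.g. from a generator)
    var_forecast: VaR forecast for the same period and probability level
    """
    tail = [s for s in scenarios if s < var_forecast]
    if not tail:
        # Assumption (not from the source): with no scenario beyond VaR,
        # fall back to the VaR forecast itself.
        return var_forecast
    return fmean(tail)


if __name__ == "__main__":
    toy_scenarios = [-0.08, -0.05, -0.03, -0.01]  # hypothetical scenario values
    print(es_from_scenarios(toy_scenarios, var_forecast=-0.04))
```

Because only scenarios below the VaR forecast enter the average, the resulting ES can never exceed the VaR it is paired with, which is the consistency property the text asks for.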

Evaluation methodology
VaR backtesting is typically based on coverage, that is, the percentage of times that returns exceed the estimated VaR at a probability level of α (Emenogu et al. 2020). This study implements four commonly used VaR backtesting tests and one loss function: the unconditional coverage (UC) test (also known as Kupiec's POF test) (Kupiec 1995), the independence (IND) test and the conditional coverage (CC) test (Christoffersen 1998), the dynamic quantile (DQ) test (Engle and Manganelli 2004), and Lopez's magnitude loss function (M-Loss) (Lopez 1999).
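(1) UC test. The likelihood ratio statistic is

$$LR_{UC} = -2\ln\left[(1-\alpha)^{N_0}\alpha^{N_1}\right] + 2\ln\left[(1-\tilde{\pi})^{N_0}\tilde{\pi}^{N_1}\right],$$

where $N_1$ and $N_0$ are the numbers of times the VaR estimate is and is not exceeded, respectively, $\alpha$ indicates the desired level, and $\tilde{\pi}$ is the hit ratio (HR), which can be calculated using $\tilde{\pi} = N_1/(N_1 + N_0)$.
(2) IND test and (3) CC test. The likelihood ratio statistics are

$$LR_{IND} = -2\ln\left[(1-\tilde{\pi}_2)^{n_{00}+n_{10}}\,\tilde{\pi}_2^{\,n_{01}+n_{11}}\right] + 2\ln\left[(1-\tilde{\pi}_{01})^{n_{00}}\,\tilde{\pi}_{01}^{\,n_{01}}\,(1-\tilde{\pi}_{11})^{n_{10}}\,\tilde{\pi}_{11}^{\,n_{11}}\right], \qquad LR_{CC} = LR_{UC} + LR_{IND},$$

where $n_{jk}$ is the number of j values followed by a k value in the hit sequence (with $j, k \in \{0, 1\}$, 0 indicating that the actual return does not exceed the VaR estimate, and 1 otherwise), and the $\tilde{\pi}$s are defined as $\tilde{\pi}_{01} = n_{01}/(n_{00}+n_{01})$, $\tilde{\pi}_{11} = n_{11}/(n_{10}+n_{11})$, and $\tilde{\pi}_2 = (n_{01}+n_{11})/(n_{00}+n_{01}+n_{10}+n_{11})$.
(4) DQ test. Engle and Manganelli (2004) proposed the DQ test. The statistic Hit considered in this test is closely related to the hit sequence, with $Hit_t = I_t - \alpha$, where $I_t$ equals 1 if the VaR estimate is exceeded at time t and 0 otherwise. A regression of this variable on its lagged terms, the estimated VaR series, and any other variables that should be considered is constructed; with lags up to the fourth order, the regression is expressed as

$$Hit_t = \beta_0 + \sum_{i=1}^{4}\beta_i\,Hit_{t-i} + \beta_5\,\hat{Y}^{\alpha}_t + u_t,$$

where $\hat{Y}^{\alpha}_t$ is the estimated VaR and $u_t$ takes the value of $-\alpha$ with probability $1-\alpha$, and $1-\alpha$ with probability $\alpha$.
The matrix representation of this regression is $Hit = Hit_{lag}\,\beta + u$. The null hypothesis is $H_0: \beta = 0$. According to the least squares method, $\hat{\beta} = (Hit_{lag}^{\top}Hit_{lag})^{-1}Hit_{lag}^{\top}Hit$. Under the null hypothesis $H_0$, the DQ test statistic can be expressed as

$$DQ = \frac{\hat{\beta}^{\top}Hit_{lag}^{\top}Hit_{lag}\,\hat{\beta}}{\alpha(1-\alpha)}.$$

The DQ test statistic asymptotically follows a chi-square distribution with six degrees of freedom.
(5) Lopez's magnitude loss function. The magnitude loss (M-Loss) function considers not only the number of losses but also the magnitude of extreme tail events; thus, the M-Loss function is more in line with economic significance. The M-Loss function can be defined as

$$\text{M-Loss}_t = \left[1 + (Y_t - \hat{Y}^{\alpha}_t)^2\right] I(Y_t < \hat{Y}^{\alpha}_t),$$

where $Y_t$ is the observed return, $\hat{Y}^{\alpha}_t$ is the estimated VaR, and $I(\cdot)$ is an indicator function.
… $H_1$: $ES^F_{\alpha,t} \ge ES_{\alpha,t}$ for all t and $ES^F_{\alpha,t} > ES_{\alpha,t}$ for some t; $VaR^F_{\alpha,t} \ge VaR_{\alpha,t}$ for all t. The other four hypothesis test forms were adopted unaltered from the original literature.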
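As a rough illustration of the coverage-based measures above, the following Python sketch computes the hit sequence, Kupiec's $LR_{UC}$ statistic with its chi-square(1) p value, and a per-period magnitude loss. The function names are hypothetical, the M-Loss uses the quadratic form of Lopez (1999) as an assumption about the exact variant used in the study, and the p value relies on the identity that the chi-square(1) survival function equals erfc(sqrt(x/2)). It is a sketch, not the authors' code.

```python
import math


def hit_sequence(returns, var_forecasts):
    """1 when the return falls below the VaR forecast, else 0."""
    return [1 if y < v else 0 for y, v in zip(returns, var_forecasts)]


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_uc(hits, alpha):
    """Unconditional coverage LR statistic and its chi-square(1) p value."""
    n1 = sum(hits)
    n0 = len(hits) - n1
    pi = n1 / (n0 + n1)  # hit ratio (HR)
    loglik_null = _xlogy(n0, 1.0 - alpha) + _xlogy(n1, alpha)
    loglik_alt = _xlogy(n0, 1.0 - pi) + _xlogy(n1, pi)
    lr = max(-2.0 * (loglik_null - loglik_alt), 0.0)  # guard rounding noise
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


def lopez_m_loss(returns, var_forecasts):
    """Per-period magnitude loss: 1 + squared exceedance on a hit, else 0."""
    return [1.0 + (y - v) ** 2 if y < v else 0.0
            for y, v in zip(returns, var_forecasts)]


if __name__ == "__main__":
    toy_returns = [-0.05, 0.01, -0.02, 0.03]   # hypothetical weekly returns
    toy_var = [-0.03] * 4                      # hypothetical VaR forecasts
    hits = hit_sequence(toy_returns, toy_var)
    print(hits, kupiec_uc(hits, alpha=0.05), lopez_m_loss(toy_returns, toy_var))
```

When the observed hit ratio equals the nominal level, the statistic is zero and the null hypothesis of correct coverage is not rejected; the further the hit ratio drifts from α, the larger the statistic becomes.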

Flow of the proposed model
The primary goal of this study was to devise a compound model for VaR and ES forecasts. QRMogLSTM and QRMogGRU were implemented in PyTorch, whereas the LSGAN was built in TensorFlow. The sequence flow of the proposed model is presented in Algorithm 1, and the overall process is as follows:
Step 1: Divide the stock market index datasets into training, validation, and testing sets. The training and validation sets are used as in-sample data, whereas the testing set is used as out-of-sample data. A more detailed division of the data is presented in "Data" section.
Step 2: Using the subseries generation mechanism introduced in "The hypothesis of a heterogeneous market" section, learn the decomposition and aggregation rules on the training set and generate subseries according to these rules each time data outside the training set are obtained.
Step 3: Estimate the quantiles of the subseries using the QRMogRNNs based on the Bayesian hyperparameter optimization and model selection in "Quantile regression Mogrifier RNNs (QRMogRNNs)" section. After that, sum up the subseries estimation values to obtain the final VaR estimation results for the stock market indices.
Step 4: Model the estimated quantiles directly using the LSGAN such that the generator and discriminator are trained by a zero-sum game, and the ES is estimated according to the rolling estimation mechanism.
Step 5: Backtest the VaR and ES estimation results, including the five VaR tests, risk scenario generation evaluation, six ES tests, and two joint scoring functions.
It should be noted that since ES is only jointly elicitable with VaR, the ES estimation method in this study also relies on VaR estimation; thus, there is an inherent inconvenience in training the model. Other ES estimation studies have encountered the same challenge. For example, Taylor (2019) estimated VaR values based on a CAViaR-type model and then used the VaR values within a rolling window to estimate the parameters of a formula from which ES values are derived. This method also requires obtaining the next-period VaR estimate $VaR_{t+1}$ and then using a rolling window to select the VaR values as training data to estimate the parameters of the jointly elicitable formula and obtain the next-period ES estimate. In this study, the rolling window method was used to train the LSGAN and forecast the VaR. $VaR_{t+1}$ is estimated first, and then the rolling window is employed to select the VaR sequence that is fed into the LSGAN as training data. The ES value in the next period is estimated based on $VaR_{t+1}$ and the risk scenarios generated by the trained LSGAN.

Data
We took a sample of 1,683 weekly log-returns from RESSET for the FTSE100, N225, SPX500, and DAX stock market indices. The sample ranges from January 5, 1990, to April 1, 2022. In the VaR estimation, we used the first 1,262 data points as in-sample data to learn the decomposition-aggregation rules, train QRMogLSTM and QRMogGRU, and set aside the validation set for Bayesian hyperparameter optimization and model selection. The final 421 data points were used as out-of-sample data for the backtesting procedure. In the ES estimation, we used the estimated 1,262 in-sample VaR values to pretrain the LSGAN and the last 421 data points as out-of-sample data for the backtesting procedure. In this study, the number of pretraining epochs $E_1$ was 1000, the rolling window size was b = 100, and the model received additional training of $E_2 = 500$ epochs every f = 10 periods. Figure 4a-d show the division of the dataset and the purpose of each part schematically from four aspects. The experiment was implemented on a personal computer with an AMD Ryzen 5 5600H six-core processor with Radeon Graphics (3.30 GHz), 16 GB RAM, and a single NVIDIA GeForce GTX 1650 GPU.
Table 2 presents the summary statistics, and the results confirm the prevalence of financial asset characteristics such as high kurtosis and fat tails. In addition, the log-returns of all indices have negative skewness; the null hypothesis of a normal distribution in the Jarque-Bera test is rejected, and the null hypothesis of a unit root in the Augmented Dickey-Fuller test is also rejected. These results drove us to employ QR combined with deep learning to predict the quantiles and the LSGAN to capture the tail uncertainty information of the assets.
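The rolling scheme implied by these settings can be made explicit with a short Python sketch. It enumerates, for each retraining point, the window of the previous b VaR values and the block of up to f periods that is forecast next. The function name `rolling_schedule` and the half-open index convention are assumptions made for illustration; only the sizes (1,262 in-sample points, 421 out-of-sample points, b = 100, f = 10) come from the text.

```python
def rolling_schedule(n_in, n_out, b, f):
    """List (train_start, train_end, fc_start, fc_end) tuples, ends exclusive.

    n_in:  number of in-sample observations
    n_out: number of out-of-sample observations
    b:     rolling window size used for additional training
    f:     number of periods forecast between two retraining points
    """
    total = n_in + n_out
    schedule = []
    for start in range(n_in, total, f):
        # Retrain on the b most recent VaR values, then forecast up to f periods.
        schedule.append((start - b, start, start, min(start + f, total)))
    return schedule


if __name__ == "__main__":
    plan = rolling_schedule(n_in=1262, n_out=421, b=100, f=10)
    print(len(plan), plan[0], plan[-1])
```

Under these assumptions the 421 out-of-sample weeks are covered by 43 retraining rounds, the last of which forecasts a single remaining period.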

Out-of-sample VaR forecasting
In this section, the proposed model is compared with key benchmarks. The statistics in "Evaluation methodology" section are used to backtest the VaR estimation results at two probability levels, namely, τ = 0.05 and τ = 0.01. This study includes 14 benchmarks: (1) Historical Simulation (Hist); (2) Normal distribution method (Normal); (3) CAViaR series models: CAViaR-SAV, CAViaR-AS, CAViaR-IGARCH, …; (9) QRLSTM and QRGRU. The CAViaR-type models and QR-SVM were implemented in R, and the other models were implemented in Python.
Before the backtesting procedure, we analyzed the results of the hyperparameter determination and model selection, and the results are presented in Table 3.
(1) From the perspective of model selection, r is always greater than 0, which indicates that QRMogLSTM or QRMogGRU is superior to the naive QRLSTM or QRGRU. On the other hand, the optimal models differ across subseries and datasets, which is consistent with the "No free lunch" theorem and shows the importance of selecting models for each dataset.
The data frequency in this study was weekly, between daily and monthly, and the data volume was relatively moderate. Therefore, the number of times the LSTM was selected was slightly higher than that of the GRU. Suppose that we model the data at a higher frequency (daily). In this case, the dataset would be larger, the LSTM with more parameters would be fully optimized, and the performance of the LSTM would be expected to improve. In contrast, because the GRU has fewer parameters to train, if we model lower-frequency (monthly) data, the learning process of the GRU converges more easily than that of the LSTM, and its performance may be better.
(2) There are many cases where the number of optimal model layers is three or four, and the number of hidden layer units is more than 10, which shows that QR combined with a complicated deep neural network can improve the prediction performance.
(3) From S1 to S4, the number of optimal feature inputs to the proposed model decreased. S1 is a low-frequency sequence, and the proposed model requires additional lag features for it. As a high-frequency sequence, S4 always has an optimal feature number of one, indicating that it is more susceptible to short-term impacts. Furthermore, when τ = 0.01 is used, the optimal feature number is always between one and two, which means that more attention should be paid to the impact of short-term shocks when measuring risk at a higher confidence level.
The next section analyzes the performance of the proposed model and key benchmarks in the real market; the evaluation results of the out-of-sample VaR forecasts for the four indices are presented in Tables 4, 5, 6 and 7. The backtesting results for the FTSE100 index are presented in Table 4, the N225 index in Table 5, the SPX500 index in Table 6, and the DAX index in Table 7.
The first objective was to compare our model with competing benchmarks using five VaR evaluation measures. To facilitate a comparison of the backtesting results of the different models, we report the number of test rejections and average M-Loss values in Table 8.
Looking at the evaluation results for all four indices at both quantile levels, we find that the proposed model is successfully backtested at both the 1% and 5% levels, except for the SPX500 index at the τ = 0.01 level. Our model obtained the minimum value of the average M-Loss, meaning that the magnitude of extreme events beyond the forecast was small. Other benchmarks have significantly fewer successful backtesting scenarios, mainly because they cannot pass the DQ test at τ = 0.01, and there are cases of risk overestimation or underestimation. The advantage of our model is also reflected in the VaR backtesting results at τ = 0.01, which can be used to assess risk more accurately. Therefore, our model does not significantly underestimate extreme losses. In addition, it is worth noting that the CAViaR-AS model also outperforms other CAViaR models in terms of forecast performance, which is consistent with the findings of Merlo et al. (2021).

Table 5 Out-of-sample VaR forecast evaluation of N225. In brackets are the p values corresponding to the statistics. HR = N_1/(N_1 + N_0), where N_1 and N_0 are the numbers of times that the VaR estimate is and is not exceeded, respectively. Bold values mean that the null hypothesis is not rejected.
Table 6 Out-of-sample VaR forecast evaluation of SPX500. In brackets are the p values corresponding to the statistics. HR = N_1/(N_1 + N_0), where N_1 and N_0 are the numbers of times that the VaR estimate is and is not exceeded, respectively. Bold values mean that the null hypothesis is not rejected.

The above analysis, based on typical tests, shows that the proposed model can forecast risk accurately in more scenarios than the 14 benchmarks. We also found evidence of this in the sequence diagrams. To show more clearly how the models differ in their risk forecasts, we plot the series for only the three models that performed better in the above tests: CAViaR-AS, QRNN, and our proposed model. The circles in the plot represent the actual returns, with yellow and green representing positive and negative returns, respectively; the red circles mark the actual returns whose losses exceed the forecasts of the proposed model. The analysis in Fig. 5 reveals that the risk forecast curve of CAViaR-AS is smoother, whereas those of the proposed model and QRNN are more volatile. Moreover, the difference in performance between the models is not significant during periods of stable stock prices, whereas it is more significant in the aftermath of major financial crises such as the Chinese stock market crash, Brexit, and the outbreak of COVID-19. In March 2020 (enlarged area in the figure), the global stock market fluctuated violently owing to the COVID-19 pandemic, and stock prices fell sharply. The red circles are also more concentrated in this period, which indicates that the forecasting model does not accurately forecast sudden risks. Nevertheless, the difference between the actual and forecasted values of the proposed model was not significant, whereas the other models were more likely to overestimate the risk at τ = 0.01 and underestimate it at τ = 0.05, suggesting that our model is still significantly more accurate.
In addition to estimation accuracy, we also need to consider the time required to generate a VaR forecasting result, as our forecasting framework includes time-consuming tasks such as hyperparameter optimization, model selection, and deep learning training. Among these, BO is the most time-consuming process. We performed a BO process with 50 iterations for each of the four subseries; each subseries took 196 s, and the time consumption for calculating the four components was 784 s. Nonetheless, BO determines the structure of our VaR estimation model and the corresponding parameter values over a period; therefore, we do not need to perform BO for every forecast. After the BO, the training time of our model was relatively short. It took only 1.84 s to obtain the VaR value of a subseries; therefore, a total of 7.36 s was required to obtain a forecasted VaR value. This time consumption is relatively short; therefore, our model can also be used for daily VaR estimations.
Finally, ablation studies were performed to evaluate the effects of the decomposition and aggregation technique (DA), BO, and Mogrifier structure (Mog) on the proposed model. We report the number of test rejections and average M-Loss values for the four indices. The results are summarized in Table 9. It can be observed that DA, BO, and Mog significantly improve the proposed model. In particular, the DA preprocessing technique and the Mog structure are more important than the BO. The proposed model, without hyperparameter optimization and model selection, can also achieve better risk-forecasting results than the CAViaR-type models and QRNN.

Out-of-sample ES forecasting
This section evaluates the out-of-sample ES forecasting results. The risk scenarios generated by the LSGAN are assessed in "Evaluation of risk scenario generation results" section. Furthermore, the out-of-sample ES forecasting results are evaluated using statistical tests and scoring functions, as presented in "Backtesting ES with statistical tests and joint scoring functions" section.

Evaluation of risk scenario generation results
After pretraining, the LSGAN performs additional training based on the latest 100 historical VaR values at the 95% probability level and generates 500 quantile scenarios for out-of-sample ES estimation. The model architecture and parameter settings of the LSGAN designed in this study are listed in Table 10.
The evaluation of scenario generation techniques commonly encompasses three primary categories (Li et al. 2020): (1) output-based evaluation, which leverages error metrics such as the mean squared error to gauge performance; (2) distribution-based evaluation, entailing assessment via computations of the energy score; and (3) event-based evaluation, encompassing metrics such as the coverage rate and correlation coefficients. We selected the coverage rate (CR) and correlation coefficients (CC) of the event-based evaluation. According to previous studies (Ma et al. 2016; Garatti et al. 2019), we can also use correlation coefficients to measure the similarity in the dynamics between VaR and the generated risk scenarios. We compared the LSGAN model with three commonly used scenario generation methods: the Gaussian distribution (GD), kernel density estimation (KDE), and Markov chain Monte Carlo (MCMC) methods. The evaluation results of the risk scenarios generated using the four methods are listed in Table 11.
Based on the evaluation results presented in Table 11, it is evident that the risk scenarios produced by the LSGAN exhibit a higher probability of encompassing the VaR values and demonstrate a stronger correlation with the VaR dynamics at both significance levels. This observation suggests that utilizing risk scenarios generated by the LSGAN for calculating ES is in line with market risk theory and yields more accurate results than other benchmarks.
Furthermore, as depicted in Fig. 6, the risk scenarios for the four indices were generated using the LSGAN. The 95% probability-level VaR forecasts from QRMogRNNs are denoted by a dashed black line. It can be seen that the LSGAN achieves accurate sampling of the tail return distribution beyond VaR and ensures the diversity and dynamism of the generated risk scenarios. The probability density curves in Fig. 7 depict the distribution of the in-sample 95% VaR data versus the 500 scenarios generated by the pretrained LSGAN. By observing the distribution fit, it was found that the LSGAN could learn the characteristics of historical data well and generate reliable synthetic data. The goodness-of-fit between the probability density function (PDF) of the risk scenarios generated by the LSGAN and the PDF of the in-sample VaR values was 0.8882 for FTSE, 0.9298 for N225, 0.8874 for SPX, and 0.9018 for DAX.
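One way to read the coverage rate is as the share of periods in which the VaR value falls inside the range spanned by the scenarios generated for that period. The Python sketch below implements that reading. It is an assumption about the exact formula, which is not recoverable from the text here, and the function name `coverage_rate` and the toy numbers are hypothetical.

```python
def coverage_rate(var_series, scenarios):
    """Fraction of periods whose VaR lies within the scenario range.

    var_series: VaR values, one per period t
    scenarios:  scenarios[t] is the list of S scenario values for period t
    Assumed definition: VaR_t is covered when
    min_s P_{s,t} <= VaR_t <= max_s P_{s,t}.
    """
    covered = 0
    for var_t, scen_t in zip(var_series, scenarios):
        if min(scen_t) <= var_t <= max(scen_t):
            covered += 1
    return covered / len(var_series)


if __name__ == "__main__":
    toy_var = [-0.03, -0.05, -0.10]                              # hypothetical
    toy_scen = [[-0.04, -0.02], [-0.04, -0.03], [-0.12, -0.08]]  # hypothetical
    print(coverage_rate(toy_var, toy_scen))
```

A value close to 1 means the generated scenarios almost always bracket the VaR path, which is the sense in which a higher CR signals more reliable tail scenarios.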

Backtesting ES with statistical tests and joint scoring functions
In this section, we verify the performance of the model in terms of ES forecasts through a backtesting procedure with six statistical tests and two joint scoring functions. The proposed model is considered effective if it passes the tests in more situations. Table 12 presents the p values of the resulting test statistics; the null hypothesis is not rejected in any situation at any of the desired levels. This shows that the ES estimation approach based on LSGAN scenario generation is effective and provides a new tool for estimating the ES in tail risk management.

Fig. 6 Risk scenarios generated by LSGAN for the four stock market indexes. Gray bands correspond to the Chinese stock market crash (2015-07 to 2016-09), the 2018 recession (2018-01 to 2018-06), Brexit (2018-08 to 2019-03), and the COVID-19 outbreak (2020-02 to 2020-03)
Fig. 7 The distribution of the in-sample 95% VaR data versus 500 scenarios generated by the pretrained LSGAN

For comparison with ES forecasts produced by other commonly used scenario generation methods, Table 13 reports the values of the joint scoring functions $S_{FZN}$ (Nolde and Ziegel 2017) and $S_{FZ0}$ (Patton et al. 2019) averaged over the out-of-sample period. A lower scoring function value represents a more accurate ES estimation. The ES estimation process based on the scenario generation of the GD, KDE, and MCMC was the same as that of the LSGAN-based method. The evaluation results of the two joint scoring functions in Table 13 show that the ES estimation results based on the LSGAN scenario generation are more accurate than those based on other scenario generation methods (i.e., GD, KDE, and MCMC).
To offer graphical intuition to support the results, Fig. 8 presents the out-of-sample ES estimates. By analyzing Fig. 8, we find that the estimated ES is relatively stable during calm periods and larger during periods of turbulent markets, with a better ability to capture extreme risks. The high volatility of ES is obvious during recessions and major economic and financial crises, such as the Chinese stock market crash in 2016, Brexit in 2018, and the COVID-19 outbreak in 2020. Moreover, during the COVID-19 outbreak, the ES values estimated using the proposed model were greater than those after other major adverse events. This shows that the proposed model can accurately estimate the ES and provide efficient guidance to financial institutions and investors regarding asset allocation.
The training and forecasting of our ES estimation framework do not consume much time. Before out-of-sample forecasting, the LSGAN requires a pretraining process that takes 150 s. In out-of-sample forecasting, we conducted extra training on the LSGAN every ten periods, and the time required to obtain the ES values of the ten periods was 510 s. Therefore, our estimation model can also be used to forecast daily ES.
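For readers unfamiliar with joint scoring functions, the following Python sketch evaluates the FZ0 loss of Patton et al. (2019) for a single period and averages it over a sample. The formula is written from the published definition as understood here and should be checked against the original reference; the function names and the toy inputs are hypothetical, and the sketch says nothing about how the study's own scores were computed.

```python
import math
from statistics import fmean


def fz0_score(y, var, es, alpha):
    """FZ0 joint loss for one period (lower is better).

    y: realized return; var, es: forecasts at level alpha, with es < 0.
    Assumed form: -1/(alpha*es) * 1{y <= var} * (var - y)
                  + var/es + log(-es) - 1
    """
    if es >= 0:
        raise ValueError("FZ0 requires a strictly negative ES forecast")
    hit = 1.0 if y <= var else 0.0
    return -hit * (var - y) / (alpha * es) + var / es + math.log(-es) - 1.0


def mean_fz0(returns, var_forecasts, es_forecasts, alpha):
    """Average FZ0 loss over the evaluation sample."""
    return fmean([fz0_score(y, v, e, alpha)
                  for y, v, e in zip(returns, var_forecasts, es_forecasts)])


if __name__ == "__main__":
    # Hypothetical returns and forecasts at the 5% level.
    print(mean_fz0([0.01, -0.06], [-0.03, -0.03], [-0.05, -0.05], alpha=0.05))
```

The loss is penalized sharply whenever the return breaches the VaR forecast, scaled by the ES forecast, so a model that places ES too close to zero pays more on violation days.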

Summary and conclusions
Finance and economics scholars have long explored better methods for estimating VaR and ES. The structure of the proposed models is often related to the properties of the financial assets. This study follows this research direction and proposes an estimation framework that combines decomposition-aggregation learning with MogRNNs. The estimation framework is consistent with market heterogeneity theory and the properties of asset volatility. MogLSTM and MogGRU can better capture the "long memory" and "clustering" of financial assets through their particular cell structure and the interactive operation between the previous hidden state and the current input. However, these two models have not previously been extended to predict financial uncertainty. This study proposes combining the above two models with QR to estimate VaR and adds BO to improve practicability. The backtesting results indicate that the model produces reliable VaR estimates. Furthermore, to implement ES estimation, this study proposes a new estimation method using the LSGAN to model the distribution of quantiles and generate possible future scenarios of downside risk at a specific probability level. This study also provides a compromise between direct and decomposition forecasts, which balances forecasting accuracy and computational burden. Taking four crucial stock market indices as research objects, five VaR backtesting tests, six ES backtesting tests, and two scoring functions show that the proposed model can forecast risks more accurately than 14 popular benchmarks. We conclude that the proposed model is a promising modeling framework for forecasting risk and is worthy of further study to expand its application.

Fig. 1 Decomposition and aggregation results of SPX500

Fig. 2 Schematic of MogLSTM with r = 5

$$P(UN \mid \Xi) = \frac{P(\Xi \mid UN)\,P(UN)}{P(\Xi)} \qquad (13)$$

D classifies the input data as true or false according to whether they come from the training set or from G. During the iterative process, G and D alternately update their weights to improve their generative or discriminative power, and the adversarial model reaches the Nash equilibrium when D cannot discriminate between true and false. The input to the generator is a latent variable Z, which can follow a simple random distribution such as a Gaussian distribution. The goal of G is to map Z to G(Z) such that it approximates the true distribution as closely as possible. The inputs to D are the fake data generated by G and the training set.

We referred to the recent literature (Zhu et al. 2021; Fatouros et al. 2022) on modeling VaR using GANs to construct the structure and parameter settings of the LSGAN in this study. We implemented a conditional LSGAN in which the first step was a pretraining process. Generator G first estimates the in-sample VaR values to learn the future risk scenarios according to the critic scores given by discriminator D. The second step involves an extra training and forecasting process using the rolling window mechanism. Generator G takes b periods of the previous VaR values for additional training to generate the tail distribution of the following f periods. The data are represented by M, containing a total of W periods of VaR, split into two parts: $M_b$, containing the VaR for the b previous periods, and $M_f$, containing the VaR for the f following periods.

Fig. 3 Schematic of LSGAN

(1) Input the in-sample VaR estimates $VaR(\tau) = \{VaR_1(\tau), VaR_2(\tau), \ldots, VaR_{M_{in}}(\tau)\}$ into the LSGAN model for pretraining. The number of pretraining epochs is $E_1 = 1000$. The generator and discriminator play against each other, and when they reach the Nash equilibrium, the pretrained LSGAN, $LSGAN_{trained} = LSGAN(VaR(\tau))$, can generate future risk scenarios.
(2) Let the trained LSGAN generate N risk scenarios: $S_G = \{S_G^1, S_G^2, \ldots, S_G^N\} = LSGAN_{trained}(Z)$, where Z is the latent vector. The generated risk scenarios contain the uncertain information at the tail of the asset.
(3) Forecast the out-of-sample VaR value, $VaR_{QRMogRNNs}$, at a given confidence level (95% in this study) for time t from the QRMogRNNs and select the scenarios lower than it, $\{S_G \mid S_G < VaR_{QRMogRNNs}\}$.
(4) Average the selected scenarios to obtain the ES values for the following f periods: $ES = E[S_G \mid S_G < VaR_{QRMogRNNs}]$.
(5) Retrain the LSGAN every f periods on the VaR of the previous b periods, with the number of epochs set to $E_2 = 500$.
(6) Repeat steps (2) to (5) to estimate all the out-of-sample ES values.

Fig. 4 Schematic of dataset division

Table 1 Value range of hyperparameters to be optimized

BO is based on Bayes' theorem (Eq. 13), where UN indicates the unknown information, $\Xi$ the observed information, $P(UN)$ refers to the prior distribution, $P(\Xi \mid UN)$ indicates the likelihood, and $P(UN \mid \Xi)$ denotes the posterior distribution.

Table 2 Summary statistics of the weekly log-returns of the four indices

Table 3 Hyperparameter values determined by Bayesian optimization. In this paper, the model selection is realized by different values of LT and r. S refers to a subseries, which is aggregated from the decomposed IMFs according to fuzzy entropy and the approximation criterion.

Table 4 Out-of-sample VaR forecast evaluation of FTSE100. In brackets are the p values corresponding to the statistics. HR = N_1/(N_1 + N_0), where N_1 and N_0 are the numbers of times that the VaR estimate is and is not exceeded, respectively. Bold values mean that the null hypothesis is not rejected.

Table 7 Out-of-sample VaR forecast evaluation of DAX. In brackets are the p values corresponding to the statistics. HR = N_1/(N_1 + N_0), where N_1 and N_0 are the numbers of times that the VaR estimate is and is not exceeded, respectively. Bold values mean that the null hypothesis is not rejected.

Table 8 Summary of the backtesting results for each probability level for the stock indices. Number of test rejections for the four indices.

Table 9 Backtesting results for ablation studies. -(DA), -(BO) and -(Mog) denote the removal of DA, BO and Mog from the proposed framework, respectively.

The CR and CC were chosen because using other evaluation methods requires obtaining actual observations, which do not exist for risk measures (i.e., VaR and ES). To calculate the ES value at a certain significance level based on the risk scenarios, it is necessary to ensure that the ES value is lower than the corresponding VaR value at that significance level. Therefore, the CR defined in this study represents the probability that the VaR values are present within a set of generated scenarios. A higher CR value indicates that the generated scenarios are more likely to represent tail risk and are therefore more reliable. In the formula for the CR (Wang et al. 2017), $C(\tau)$ represents the CR with respect to $VaR_t(\tau)$, $I(\cdot)$ represents an indicator function, $P_{s,t}$ represents the s-th scenario value at time t, and S represents the number of scenarios.

Table 10 List of LSGAN parameters

Table 11 Comparison of models using the coverage rate and correlation coefficients

Table 12 Out-of-sample ES forecast evaluation by using statistical tests. The bold result indicates that the null hypothesis is not rejected. The hypothesis test form of $ES_{UC}$ is: $H_0$: $P_t^{[\alpha]} = F_t^{[\alpha]}$, $\forall t$; $H_1$: $ES^F_{\alpha,t} \ge ES_{\alpha,t}$ for all t and $>$ for some t, $VaR^F_{\alpha,t} = VaR_{\alpha,t}$ for all t. The hypothesis test form of $ES_C$ is: $H_0$: $P_t^{[\alpha]} = F_t^{[\alpha]}$, $\forall t$; $H_1$: $ES^F_{\alpha,t} \ge ES_{\alpha,t}$ for all t and $>$ for some t, $VaR^F_{\alpha,t} \ge VaR_{\alpha,t}$ for all t.

Table 13 Out-of-sample ES forecast evaluation using joint scoring functions