An evaluation of the adequacy of Lévy and extreme value tail risk estimates

This study investigates the simplicity and adequacy of tail-based risk measures, namely value-at-risk (VaR) and expected shortfall (ES), when applied to tail targeting of the extreme value (EV) model. We implement Lévy–VaR and ES risk measures as full density-based alternatives to the generalized Pareto VaR and the generalized Pareto ES of the tail-targeting EV model. Using data on futures contracts of S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 during the Global Financial Crisis of 2007–2008, we find that the simplicity of tail-based risk management with a tail-targeting EV model is more attractive. However, the performance of EV risk estimates is not necessarily superior to that of the relatively complex, full density-based Lévy risk estimates, which may not always give us more robust VaR and ES results, making the model inadequate from a practical perspective. There is randomness in the estimation performances under both approaches for different data ranges and coverage levels. Such mixed results imply that banks, financial institutions, and policymakers should find a way to compromise or trade off between “simplicity” and user-defined “adequacy”.


Introduction
Value-at-risk (VaR) is an intuitively simple tail-based risk measure that is popular among practitioners and academics. Recent applications of VaR were highlighted by Perignon and Smith (2010a), Frésard et al. (2011), and Perignon and Smith (2010b). However, the VaR measure has some limitations. It fails to satisfy the requirement of sub-additivity, implying that it does not fulfill the requirement of coherence. Further, VaR fixes the tail events corresponding to a specific confidence level. Although it considers the likelihood of conditional tail events, it ignores the magnitude of the catastrophe after the occurrence of a tail event. In a nutshell, VaR provides a snapshot of unsystematic losses while failing to consider the actual size of unsystematic losses that exceed the cut-off points. To offset such limitations and ensure that the coherence (sub-additivity) requirements are met, the expected shortfall (ES) measure has been introduced. ES estimates the unsystematic loss by weighing all the possible losses in the tail of the distribution, thus circumventing the limitation of VaR.
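The difference between the two measures can be illustrated with a minimal empirical sketch. It assumes that losses are recorded as positive numbers and uses a simple order-statistic quantile; the function name `empirical_var_es` is hypothetical and is not part of the study.

```python
def empirical_var_es(losses, coverage):
    """Return (VaR, ES) of a loss sample at the given coverage level.

    VaR is an order statistic of the sorted losses; ES is the average of
    the losses at or beyond that order statistic.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    ordered = sorted(losses)
    n = len(ordered)
    # Position of the empirical quantile (a simple convention, one of several).
    k = min(int(coverage * n), n - 1)
    var = ordered[k]
    tail = ordered[k:]            # losses at or beyond VaR
    es = sum(tail) / len(tail)    # ES averages the whole tail, VaR ignores it
    return var, es


# Toy usage: ES is never smaller than VaR because it averages the tail.
var_half, es_half = empirical_var_es([1.0, 2.0, 3.0, 4.0], 0.5)
```

Two samples with the same VaR can have very different ES values if one has a few much larger losses beyond the cut-off, which is the limitation of VaR described above.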
Many studies (e.g., Longin 1996; McNeil and Frey 2000; Jondeau and Rockinger 2003; Gençay and Selçuk 2004; Tolikas and Gettinby 2009; So and Wong 2012; Cheng et al. 2015; Du and Escanciano 2017; Bayer and Dimitriadis 2022; Otto and Breitung 2022) have used simple to adequately complex methodologies and assumed different distributional properties in the data-generating process for estimating and backtesting VaR and ES models. Some recent studies have also identified VaR forecasting breakdown due to structural change and a break in the data-generating process of returns (Chavez-Demoulin et al. 2014; Quintos et al. 2001). Through joint modeling of time-varying conditional VaR and ES, Taylor (2019) produced forecasts with generalized autoregressive conditional heteroskedasticity (GARCH) (1,1) and Glosten-Jagannathan-Runkle GARCH (1,1) models using maximum likelihood based on a Student t-distribution, and the asymmetric Laplace likelihood was used to evaluate post-sample VaR and ES forecasts. Patton et al. (2019) also jointly modeled VaR and ES in a new dynamic framework, which is semiparametric and agnostic about the conditional distribution of returns, and confirmed via simulation that the proposed new ES-VaR models outperform forecasts based on GARCH or rolling window models. The forecasting process of VaR and ES requires sophisticated and complicated models. Lazar and Zhang (2019) examined whether the inadequacy of modeling leads to the model risk of such risk measures and found that ES estimates using GARCH models require smaller corrections for model risk than VaR estimates.
However, there is some trade-off between simplicity and adequacy when deciding on models and the underlying data-generating processes. Under the Basel framework, the Bank for International Settlements (BIS 2019) requires banks to establish "an adequate system for monitoring and reporting risk exposures" to assess risk profiles. With this "adequacy" in mind, Hoga and Demetrescu (2022) developed a sequential procedure that can directly and continuously determine risk assessments based on VaR and ES forecasts with controlled size based on the t-GARCH model. Kourouma et al. (2010) found an underestimation of the risk of loss for unconditional VaR models (the historical and extreme value theory (EVT) VaR models) compared with conditional models. The conditional EVT model is more accurate for predicting risk losses during the 2008 Global Financial Crisis. Despite the accuracy of conditional EVT models, banks are reluctant to use them as the Basel II agreement penalizes banks for using such models.
Therefore, a relevant question arises as to why banks or financial institutions are reluctant to use some models that are adequate to satisfy the BIS framework. Why is the simplicity of tail-targeting EV models, which are much easier to implement, not attractive and adequate compared with models chosen by banks and financial institutions, or do they define "adequacy" from a different perspective? For our study, the heuristic adequacy of the measure is the naive "closest to the empirical estimate." The EV models are based on the distribution of extreme returns instead of all returns (Bali 2007). The simplicity of the tail, characterized by the EV model, leads to the analytic formulation of the tail-focused risk measure of VaR and the tail-aggregate risk measure of ES.
On the other hand, the Lévy heavy-tailed generalized hyperbolic (GH) family models are based on full-density distributions and are not easy to implement. We consider the Lévy family models and purely extreme tail-based EV models from an adequacy perspective to quantify investors' risks. It is the adequacy perspective, not simplicity, that should guide us in the choice of models in risk measures. However, if simplicity is advocated, then how should the adequacy of models be determined? Thus, whether the performance of EV risk estimates is superior to that of Lévy risk estimates is an empirical question.
In this study, we conduct a heuristic study to determine the adequacy of VaR and ES estimates of investment risks in leading indices during a period when markets were falling and recovering from the global financial or subprime crisis (2007–2008). The standard VaR measures provide inaccurate estimates of losses during highly volatile periods as they require an explicit functional form (normal or lognormal) on the distribution. EV models have seen many applications in modeling extremities of weather, reserves, and financial extremities (Pidgeon 2012; Monier and Gao 2015; Cheng et al. 2014). However, unlike the spectral risk measure, VaR and ES are purely tail-based risk measures. As EV is also purely a tail-based method, it is tempting to think that VaR and ES with the EV method might be a good alternative.
Lévy-based heavy-tailed models belong to another class of models that has also been investigated in modeling extreme fluctuations. Lévy models use the full data to estimate the parameters. In contrast, an EV model uses only the partial data remaining in the extreme tail of a distribution beyond a certain cut-off point.
EV models use only a few extremely large returns on the tail in calibration when one believes that the extreme tail data follow a generalized EV distribution. Under this presumption, one need not worry about the true distribution of returns smaller than the threshold, which are not in the tail. This idea may be sufficient to get reasonable numbers for the risk measures of VaR and ES defined on the tail. Such an approach uses fewer data points than models calibrated to the full historical data. It is commonly accepted that the relevance of systematic fluctuations (small return values) not on the tail can be ignored when modeling. Nevertheless, we do not possess an axiomatic justification to assume this will necessarily be true.
This prompts us to investigate the risk of investment in world markets falling and recovering during the Global Financial Crisis of 2007–2008. We adopt a range of models with moderate time-varying volatility and incorporate a stochastic diffusive perturbation of the markets with time as our benchmark. We consider models of the GH family that include stochastic volatility through stochastic time changing without an explicit dynamic for volatility. We assess the comparative performance of tail modeling using both Lévy models (both systematic and unexpected returns) and EV models (only unexpected returns), followed by a comparison of the performances of the respective tail-based risk measures, VaR and ES, under both approaches. Thus, we follow a procedure of fixing the tail as applied in standard EV calibration, which uses only unexpected returns. We then obtain Lévy tails with calibrations that use both systematic and unexpected returns.
The mathematically elegant Lévy approach has a significant limitation, i.e., except for a few trivial cases, there is no analytic formula for the risk measure VaR, without even mentioning the case of ES. Therefore, the VaR calculation is relatively difficult to implement. Complications and huge time requirements in implementation have deterred practitioners from using Lévy models to forecast VaR and ES, as observed in a VaR backtesting study. To the best of our knowledge, this is the first study to compare the performance of tail-based VaR and ES estimated for the EV and Lévy models by contrasting the adequacy of tail-based risk measures while considering simplicity in estimating with this tail-targeting model.
The superior method is unclear at the outset. The presumably advantageous use of the returns in the EV model might not be advantageous in practice when the evaluator is obliged to consider small return values that influence the fits, which must determine the shape of the tail. Moreover, there are concerns about whether applying extreme return observations on the tail can be sufficient to model extreme fluctuations, even when market movements are not extreme (such as the decade following the 2007–2008 Global Financial Crisis). In this study, we seek to shed empirical light by estimating the tail-based VaR and ES following both approaches in the existing theoretical framework using data from major indices. The sample period is when markets suffered and recovered from the Global Financial Crisis.
The contribution of this study is that we assess the relative importance of the adequacy and simplicity of EV and Lévy models in estimating VaR and ES. We try to answer the following questions: "Is the simplicity of EV models adequate to guarantee that they would perform robustly in describing extremely unexpected return phenomena?" "Should the adequacy of the Lévy models be more important in the backdrop of the simplicity of EV models?" We find that the simplicity of the tail-characterized EV model leads to the analytic formulation of the tail-focused risk measure of VaR and the tail-aggregate risk measure of ES. However, the performance of EV risk estimates is not necessarily superior to that of Lévy risk estimates. On the other hand, VaR estimates based on Lévy models are more stable than those based on the EV model. However, it is not possible to establish an indisputable rule for a particular Lévy model. The performances of modeling with only a few unexpected extreme returns (EV model) and of modeling with both numerous smaller expected and a few extreme unexpected returns (Lévy models) are mixed.
Our model testing with a heuristic adequacy measure reinforces the theoretical fundamental that using only extreme observations (the EV model, which discards smaller systematic returns) and using both smaller and extreme observations (Lévy models) are different approaches with the common goal of a meaningful simplification of reality. Their relative performance for a particular time window may fail to constitute any guarantee. There is some randomness in estimation performance under both approaches. The explanatory power of the approaches is hardly distinguishable across our data.
In this study, we examine whether a model being solely tail-targeting or not has much effect on VaR and ES forecasts, at least for the most crucial periods, comprising both the sharp market downturns and the smooth recoveries of the 2007–08 Global Financial Crisis.
Here, the choice of model can be based on a compromise or trade-off between simplicity and user-defined adequacy. As many banks and financial institutions do not follow adequacy requirements in risk measures as stipulated in the Basel agreement, our findings shed empirical light on such complexity. Our results imply that when the results are mixed, banks and financial institutions should find a way to compromise between simplicity and adequacy.
The remaining parts of the paper are structured as follows. Sections "Characterization in Lévy framework" and "Initial data analysis" discuss the characterization of a Lévy framework and the initial data analysis. Sections "Estimation of risk measures" and "GOF: EV versus Lévy" discuss the estimation of VaR and ES under the Lévy approach and its contender, the EV approach, and the goodness of fit (GOF) under the contending approaches. Section "Comparison of risk measures" compares the risk measures. Section "VaR and ES backtesting" compares the forecasts of VaR and ES for the approaches. Section "Discussion" provides a discussion, and we conclude in Section "Conclusion".

Characterization in Lévy framework
Lévy models have recently been applied in modeling extreme behavior (German 2002; Fajardo 2015; Fajardo and Mordecki 2006, 2014; Kim et al. 2008; Fuse and Meucci 2008; Wong and Guan 2011; De Oliver et al. 2018; Farkas and Mathys 2022). The characteristic function of a stochastically continuous process that starts at zero and possesses stationary independent increments has the following general form (see Sato (1999) and Schoutens (2003)):

$$E\left[e^{isX_t}\right] = \exp\left\{ t\left( ias - \tfrac{1}{2} b^2 s^2 + \int_{\Re\setminus\{0\}} \left(e^{isx} - 1 - isx\,\mathbf{1}_{\{|x|<1\}}\right)\nu(dx) \right) \right\} \qquad (1)$$

for s ∈ ℜ, t ≥ 0 and constants a ∈ ℜ, b ∈ ℜ+, where ν is the so-called Lévy measure defined on ℜ\{0} that satisfies square integrability of tiny (< 1) jumps:

$$\int_{\Re\setminus\{0\}} \min\{1, x^2\}\,\nu(dx) < \infty$$

Equation (1) is the so-called Lévy–Khintchine representation of a Lévy process, which is closely aligned with the concept of an infinitely divisible distribution:

$$E\left[e^{isX_t}\right] = \left(E\left[e^{isX_1}\right]\right)^t$$

Thus, the inverse Fourier transform can be applied to obtain the numerical transition density from the characteristic function (1) with the Lévy measure ν of a particular Lévy process, which always exists. The numerical transition densities can then be used to estimate the risk measure VaR under different model assumptions. However, in this study, our interest is mainly in the primitive members of Lévy processes belonging to the GH class, which have been widely used in financial modeling (Barndorff-Nielsen 1977, 1978, 1995; Eberlein and Prause 1998; Prause 1999; Eberlein and Keller 1995; Bingham and Kiesel 2001; Eberlein and Hammerstein 2002) due to the availability of closed-form densities.
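The inversion step described above can be sketched numerically. The example below is an illustration only: it inverts the characteristic function of a Brownian motion with drift (a Lévy process with zero Lévy measure, chosen here because its density is known), using a plain trapezoidal rule. The names `density_from_cf` and `brownian_cf`, the truncation point `s_max`, and the grid size are assumptions made for this sketch, not the procedure used in the study.

```python
import cmath
import math


def density_from_cf(cf, x, s_max=40.0, steps=4000):
    """Approximate f(x) = (1/pi) * integral_0^inf Re[exp(-i s x) cf(s)] ds."""
    h = s_max / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        value = (cmath.exp(-1j * s * x) * cf(s)).real
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * value
    return total * h / math.pi


def brownian_cf(s, t=1.0, drift=0.0, variance=1.0):
    """Characteristic function at time t of a Brownian motion with drift."""
    return cmath.exp(t * (1j * drift * s - 0.5 * variance * s * s))


# Density of the unit-time increment at zero, recovered from the
# characteristic function alone.
f_at_zero = density_from_cf(brownian_cf, 0.0)
```

For heavier-tailed characteristic functions the integrand decays more slowly, so `s_max` and `steps` would need to be chosen with more care than in this toy case.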
We focus on the GH subclass of Lévy processes (variance gamma (VG), normal-inverse Gaussian (NIG), hyperbolic distribution (HYP), and GH) and estimate the risk measures, VaR and ES, to investigate the relative adequacy of a purely tail-based simple analytic EV risk model compared with full density-based Lévy risk models. Among others, these measures have been investigated by Cotter and Dowd (2006) in the context of futures contracts and by Sorwar and Dowd (2010) in the context of options contracts.
Let $X_1 = \log(S_{t+1}/S_t)$ for nonnegative integer t, characterized by Eq. (1) (the Lévy–Khintchine formula). For the models we consider, the equivalent processes are given more effectively by their densities (see Schoutens 2003): where $K_I$ is the modified Bessel function of the third kind with index I; θ is the skewness parameter; and ν is the percentage excess kurtosis in the distribution for the VG model.
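As an illustration of what such a closed-form density looks like in code, the sketch below evaluates an NIG density in one common parameterization (tail parameter `alpha`, skewness `beta`, scale `delta`, location fixed at zero). This parameterization is an assumption and may differ from the one used in the study. Because the Python standard library has no Bessel functions, the helper `bessel_k1` is a hypothetical stand-in that evaluates the integral representation of the modified Bessel function numerically; it is adequate only for moderate arguments.

```python
import math


def bessel_k1(z, t_max=12.0, steps=400):
    """K_1(z) = integral_0^inf exp(-z cosh t) cosh t dt (trapezoidal rule)."""
    if z <= 0.0:
        raise ValueError("z must be positive")
    h = t_max / steps
    total = 0.5 * math.exp(-z)  # integrand at t = 0 with its end weight
    for k in range(1, steps + 1):
        c = math.cosh(k * h)
        total += math.exp(-z * c) * c  # vanishes rapidly as t grows
    return total * h


def nig_density(x, alpha, beta, delta):
    """NIG density with location 0 (assumed parameterization)."""
    if not (alpha > abs(beta) and delta > 0.0):
        raise ValueError("need alpha > |beta| and delta > 0")
    gamma = math.sqrt(alpha * alpha - beta * beta)
    r = math.sqrt(delta * delta + x * x)
    scale = alpha * delta / math.pi
    return scale * math.exp(delta * gamma + beta * x) * bessel_k1(alpha * r) / r


# Density of a mildly right-skewed NIG law at the origin.
f0 = nig_density(0.0, alpha=2.0, beta=0.5, delta=1.0)
```

A quick sanity check for any such implementation is that the density integrates to one over a wide grid, which is how the sketch can be validated.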
Due to these closed-form densities, the standard errors (SEs) of each parameter are easy to obtain by computing Fisher's information matrix.
In our context, the competing idea to the Lévy approach assumes that only the extreme returns characterize the performance of the risk measures VaR and ES. As in the studies by Dowd (2005), Cotter and Dowd (2006), and Mozumder et al. (2017), perhaps the most elegant tool to utilize in such a context is the peaks-over-threshold method, which relies on the fact that, as the threshold u becomes larger, the distribution of exceedances converges to the generalized Pareto (GP) distribution, having the following two-parameter characterization:

$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \xi x/\beta\right)^{-1/\xi}, & \xi \neq 0 \\ 1 - \exp(-x/\beta), & \xi = 0 \end{cases}$$

where ξ and β > 0 are the shape and scale parameters, respectively, contingent on each choice of threshold u.

Regarding the GH subclass, VG, NIG, and HYP are versions of the GH model with some parameters restricted. Thus, to observe the effect of full flexibility and selected restrictions for this family of processes, we include the GH model in addition to the restricted models.
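The peaks-over-threshold idea can be sketched in a few lines: collect the exceedances over a threshold and evaluate the GP distribution function on them. The function names `exceedances` and `gpd_cdf` are illustrative, and losses are assumed to be recorded as positive numbers.

```python
import math


def exceedances(losses, u):
    """Amounts by which losses exceed the threshold u."""
    return [x - u for x in losses if x > u]


def gpd_cdf(y, xi, beta):
    """Generalized Pareto distribution function at an exceedance y."""
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    if y < 0.0:
        return 0.0
    if abs(xi) < 1e-12:                 # exponential limit as xi -> 0
        return 1.0 - math.exp(-y / beta)
    base = 1.0 + xi * y / beta
    if base <= 0.0:                     # beyond the finite endpoint when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


# Toy usage: two of the four losses exceed the threshold 2.
tail = exceedances([1.0, 2.0, 3.0, 4.0], 2.0)
probs = [gpd_cdf(y, xi=0.2, beta=1.0) for y in tail]
```

In an actual calibration, ξ and β would be estimated from `tail` by maximum likelihood rather than fixed by hand as in this toy example.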

Initial data analysis
We employ futures contracts return data; our empirical analysis is based on the returns of the S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 indices. We choose futures contracts because there is a lack of studies on EV and Lévy models that employ futures contracts data. The data cover futures contracts from January 1, 2007 to December 31, 2017, which expire in the following trading months. The rollover from one expiring contract to the next occurs at the start of each trading month. Datastream pads the dataset by taking the end-of-day price on a bank holiday to be the previous trading day's end-of-day price, a technique that is accepted by practitioners and ensures that we have the same number of daily returns (2762) for all indices. Our sample period comprises the period of the 2008 Global Financial Crisis and beyond. This helps us to check the robustness of the adequacy versus simplicity of the competing approaches (and methods) in terms of the tail risk measures, VaR and ES. Table 1 presents the summary statistics of the returns of all index futures. For our analysis, we identify the cut-off point in each extreme tail according to the EV theory discussed in a recent study on the VaR backtest (Mozumder et al. 2017). We note that the extents of extremity in the return series corresponding to the various indices are not similar.
Table 2 shows that the GP distribution (GPD) fits the data well for both long and short positions; the tail indices are positive except for the Nikkei 225, and the estimated scale parameters fluctuate around 1. Table 2 also provides the assumed thresholds u, the number of exceedances ($N_u$) contingent on the choice of thresholds u, and the observed exceedance probabilities (Prob) contingent on the choice of thresholds u.
Table 2 also presents the asymmetry of long and short positions of tail definition (u) choices in an EV model. The same cut-off point results in a different number of observations left on the tails for long and short positions. Parameter estimates are expected to differ with respect to different positions under EV models. In the case of full-density-based Lévy models, it results in only a sign alteration of the skewness-characterizing parameter corresponding to long and short positions. To visualize the differences in the models' fit in the tails, for all indices under consideration, we separately present the GP EV tail alongside the tails for each of our considered Lévy models. Our strategy is to obtain the EV quantiles above thresholds and the corresponding quantiles from the Lévy models. Thus, instead of fixing tail mass, we set the thresholds. Figure 1 depicts the QQ-plot of EV with each of the Lévy models for all indices. There is clear evidence of deviation between EV and Lévy quantiles at the extreme, although the EV models reveal smaller deviations in most cases. Such differences may be attributed to how different Lévy models feed information from observations outside the tails in fitting the tails.
Table 2 Parameter estimates for the generalized Pareto distribution (GPD)
Maximum likelihood estimates of the GPD parameters for long and short positions are based on daily % returns of futures contracts from January 1, 2007, to December 31, 2017. u is the threshold (selected using the threshold selection procedure with the mean residual plot), $N_u$ is the number of exceedances in excess of u, Prob is the probability of an observation in excess of u, ξ is the tail index, and β is the scale parameter. Estimated standard errors of the parameters are reported in parentheses.

Table 3 presents the maximum likelihood estimates of the parameters for all five indices and all four Lévy models. The threshold value selection is an important factor as it has the strongest effect on the results. While larger thresholds produce few EVs and lead to large variances, smaller thresholds generate a sample that approximates the models poorly. We select the smallest threshold value among those that produce EVs, following the limit exceedance model. We use the mean residual plot to determine thresholds u and the probabilities of exceedances. Both the EV and the Lévy models provide a similar data description. However, as expected, the tail-based EV parameters differ significantly between long and short positions, in sharp contrast to the Lévy parameters, which use the entire data set. Because switching between long and short positions merely reflects the density of the complete data about the y-axis, it only alters the sign of the parameter that characterizes the skewness in a Lévy model. Thus, it is sufficient to report the estimates corresponding to long positions alone, for which the risk measures VaR and ES under tail-targeting and full density-based models will be investigated.
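The mean residual (mean excess) plot used for threshold selection can be sketched as follows. For each candidate threshold the average exceedance is computed; under a GP tail this function is roughly linear in u above a suitable threshold. The function names and the toy data are illustrative assumptions.

```python
def mean_excess(losses, u):
    """Average amount by which losses exceed the threshold u."""
    excess = [x - u for x in losses if x > u]
    if not excess:
        raise ValueError("no observations exceed the threshold")
    return sum(excess) / len(excess)


def mean_residual_points(losses, thresholds):
    """(u, mean excess, number of exceedances) for each candidate threshold."""
    points = []
    for u in thresholds:
        n_u = sum(1 for x in losses if x > u)
        points.append((u, mean_excess(losses, u), n_u))
    return points


# Toy usage: the points would normally be plotted and inspected for linearity.
points = mean_residual_points([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

Reporting the number of exceedances next to each point makes the trade-off described above visible: a higher threshold leaves fewer observations and therefore noisier estimates.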

Estimation of risk measures
Except for a few specific cases, VaR is obtained by solving the following quantile-integral equation: where α is the coverage level. The VaR for different Lévy models can be obtained by solving Eq. (10) with the corresponding Lévy density.
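When no closed form is available, a quantile equation of this kind can be solved numerically. The sketch below assumes the convention that X is a loss and that VaR solves P(X > VaR) = 1 − α; it integrates a given density with the trapezoidal rule and finds the root by bisection. The standard normal density is used only as a stand-in for a Lévy density, and all names and integration limits are assumptions.

```python
import math


def tail_probability(density, q, upper=50.0, steps=4000):
    """P(X > q), approximated by the trapezoidal rule on [q, upper]."""
    h = (upper - q) / steps
    total = 0.5 * (density(q) + density(upper))
    for k in range(1, steps):
        total += density(q + k * h)
    return total * h


def var_from_density(density, coverage, lo=-50.0, hi=50.0, tol=1e-6):
    """Solve P(X > VaR) = 1 - coverage for VaR by bisection."""
    target = 1.0 - coverage
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_probability(density, mid) > target:
            lo = mid   # too much mass above mid: the quantile lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def std_normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


var_99 = var_from_density(std_normal_density, 0.99)
```

Each bisection step requires a full numerical integration, which gives a sense of why density-based VaR is more expensive than an analytic tail formula.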
The major problem with VaR is that it indicates the magnitude of loss up to a certain level but ignores the magnitude of losses that exceed the pre-fixed confidence level. Thus, VaR identifies the tail at a given level but has no answer regarding how concerning that tail is with respect to the pre-fixed confidence level. In addition to identifying the tail at a given pre-fixed level, ES as a measure provides the average of losses belonging to the identified tail contingent on the pre-fixed level.

Table 3 Maximum likelihood estimates for Generalized Hyperbolic Lévy models
Based on daily % returns for long futures positions, from January 1, 2007, to December 31, 2017. We report the estimates of the VG, NIG, HYP, and GH parameters. Prob is the probability of an observation in excess of u (the same thresholds selected for the EV model in Table 2). Estimated standard errors of the parameters are reported in parentheses.

As in Eq. (9), the high αth quantile (i.e., VaR at a very high confidence level α) is given as follows:

$$VaR_\alpha = u + \frac{\beta}{\xi}\left[\left(\frac{n}{N_u}(1-\alpha)\right)^{-\xi} - 1\right] \qquad (11)$$

and the ES at the same confidence level α is as follows:

$$ES_\alpha = \frac{VaR_\alpha}{1-\xi} + \frac{\beta - \xi u}{1-\xi}$$

where β and ξ are the scale and shape parameters contingent on threshold u, respectively. In Eq. (11), n is the total number of observations, and $N_u$ is the number of observations that exceed threshold u. The ES follows from the fundamental equation as follows: For a VG model, ES is then obtained with the VG density: The approach to obtaining the ES for other Lévy models is similar but incorporates different densities in Eq. (14). We apply the parametric bootstrap to get the SEs and confidence intervals (CIs) of the risk measures, following Cotter and Dowd (2006). However, as Lévy models have no closed-form expressions for risk measures, this is a computational challenge, which we overcome using a high-performance machine.
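The analytic convenience of the EV approach is visible in code: given GPD estimates, VaR and ES are one-line formulas. The sketch below uses the standard peaks-over-threshold expressions for a loss tail, with hypothetical function names and made-up parameter values; the ES formula requires ξ < 1.

```python
import math


def gpd_var(u, xi, beta, n, n_u, coverage):
    """Peaks-over-threshold VaR from GPD estimates (xi = 0 handled as a limit)."""
    ratio = (n / n_u) * (1.0 - coverage)
    if abs(xi) < 1e-12:
        return u - beta * math.log(ratio)
    return u + (beta / xi) * (ratio ** (-xi) - 1.0)


def gpd_es(u, xi, beta, n, n_u, coverage):
    """Peaks-over-threshold ES; finite only when xi < 1."""
    if xi >= 1.0:
        raise ValueError("ES is not finite for xi >= 1")
    var = gpd_var(u, xi, beta, n, n_u, coverage)
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)


# Made-up inputs: threshold 2, 100 exceedances out of 2762 observations.
var_999 = gpd_var(2.0, 0.2, 1.0, 2762, 100, 0.999)
es_999 = gpd_es(2.0, 0.2, 1.0, 2762, 100, 0.999)
```

A useful check is the case where the coverage level equals the empirical non-exceedance probability of the threshold: VaR then collapses to u, and ES to u plus the GPD mean excess β/(1 − ξ).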

GOF: EV versus Lévy
Among various GOF tests, one that is particularly suitable for tail-based risk management studies is the Anderson–Darling (AD) test. It applies a weighting rule, introduced by Anderson and Darling (1952, 1954), to the Kolmogorov–Smirnov test so as to emphasize the observations in the tail. Anna et al. (2005) provided a formula for an AD test statistic when the distribution of the complete sample is unknown and observations are only available at the extreme tail, referred to as the left-truncated data adapted AD test. This adaptation suits the test of the EV model. For the AD test version applied with complete distributions, such as our Lévy models having closed-form densities, the p-values are analytically available. However, for the AD test adapted to left-truncated data, p-values need to be calculated through bootstrapping or Monte Carlo simulation. In this study, we use 1,000 resamples and calculate the p-value for the EV model using the bootstrap. We use VaR for our Lévy models and VaR for the EV model as critical values for the tests (the left-truncated version of the test of VaR for the EV model remains consistent as it is computed from a left-truncated density): where u is the truncation level; $x_j$ is the jth observed value of the order statistics $X_1 \le X_2 \le \dots \le X_n$; and n is the total number of observations in the tail. Table 4 presents the statistics for both GOF tests (AD and its left-truncated version $AD_{ev}$). By the very nature of the tests, AD and $AD_{ev}$ are differently informative about the tail fits. Based on Table 4, the EV and full-density Lévy models perform statistically almost similarly on the tail. However, as the AD and $AD_{ev}$ values also reflect observations outside the tails, such observations influence the test results even though the weights attached to them are much smaller than those attached to the tail.
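For orientation, the standard (complete-sample) AD statistic can be computed as sketched below. This is the textbook form for a fully specified distribution function, not the left-truncated adaptation used for the EV model; the function name and the uniform example are assumptions.

```python
import math


def anderson_darling(sample, cdf):
    """Standard AD statistic A^2 for a fully specified distribution function."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for j, x in enumerate(xs, start=1):
        # xs[n - j] is the (n + 1 - j)-th order statistic in 1-based terms.
        total += (2 * j - 1) * (math.log(cdf(x)) + math.log(1.0 - cdf(xs[n - j])))
    return -n - total / n


def uniform_cdf(x):
    return x


# Toy usage on points in (0, 1); the logarithms put extra weight on
# observations close to either end of the distribution.
a2 = anderson_darling([0.1, 0.35, 0.5, 0.8, 0.95], uniform_cdf)
```

The statistic is undefined when the fitted distribution function returns exactly 0 or 1 at an observation, so real data would need a guard for that case.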

Comparison of risk measures
This section analyzes the Lévy and EV estimates of VaR and ES. The estimates of VaR are based on the GPD and the four Lévy models. We report the parameters of each model calibrated under both approaches in Tables 5, 6, 7, 8 and 9. As VaR and ES are based on higher coverage levels, they account for trading losses at a very high level due to extreme (unexpected) events. Surprisingly, VaR estimates across the empirical values are approximately of the same order of magnitude for all indices, but that is not true for ES estimates. For the ES model, the estimates depend on the entire tail shape of the model and not only on a specific quantile of the tail. However, between the approaches, it is difficult to claim with certainty that, based on the estimates of VaR and ES, any particular approach is better than another. The EV model occasionally provides VaR and/or ES estimates that deviate less from their empirical counterparts. However, Lévy-VaR and/or Lévy-ES estimates have less deviation from their empirical counterparts on different occasions.
Looking into the precision of VaR estimations may help us ascertain some preference between the approaches. Overall, the SEs of the Lévy model VaR estimates are much lower than those of the EV model VaR estimates. The rise in coverage levels reinforces this observation. The coefficient of variation (the SE divided by the corresponding estimated risk measure value) helps us to double-check this observation. Thus, the VaR estimates based on the Lévy models are more stable than those based on the EV model. However, this may be partly due to the few observations used under the EV model. Tables 5, 6, 7, 8 and 9 report a 90% CI for VaR estimates obtained with bootstrapped estimates. We find that at low coverage levels, the estimated CIs for both EV and Lévy models are symmetric. However, at higher coverage levels, the CIs are asymmetric, with the upper bound moving further away from the mean of the (bootstrapped) estimates. In contrast to the GP, it is difficult to establish clear-cut results for the four Lévy models through a comparison of the SEs of VaR estimates. Unlike GP, often at ultra-high coverage (0.999), the CIs exhibit ultra-spread, which indicates unstable forecasts at ultra-high coverage, presumably because the estimations are based on a few extreme observations above the threshold alone.

Table 5 Estimates of VaR and ES risk measures for S&P500 futures position: EV versus Lévy
The estimates are based on the parameter values in Tables 2 and 3 using daily % return. Here, α is the coverage level of VaR and ES when estimated under the EV and Lévy approaches, corresponding to a holding period of 1 day. Next to each estimate, SEs are reported, and the 90% confidence intervals are immediately below (normalized by bootstrapped estimates). The most adequate ES estimate is depicted in bold, and the most adequate VaR is shown in bold and italics. Again, there is no clear pattern of preference between the approaches (EV vs. Lévy).

Overall, the ES CIs are narrower than those of VaR, indicating that ES measures are more precisely estimated than VaRs. VaR and ES bootstrapped statistics (SE and CI) are informative regarding some differences and similarities between the Lévy and EV approaches. VaR and ES bootstrapped statistics are narrower for Lévy models than for EV models, which indicates that the estimation with a Lévy approach is more stable than that with an EV approach. Moreover, the estimation performance deteriorates under both approaches with increased coverage. For NIG and GH models, estimation instability seems to propagate much faster, especially at higher coverage levels.

Table 6 Estimates of VaR and ES risk measures for FTSE100 futures position: EV versus Lévy
The estimates are based on the parameter values in Tables 2 and 3 using daily % return. Here, α is the coverage level of VaR and ES when estimated under the EV and Lévy approaches, corresponding to a holding period of 1 day. Next to each estimate, SEs are reported, and the 90% confidence intervals are immediately below (normalized by bootstrapped estimates). The most adequate ES estimate is depicted in bold, and the most adequate VaR is shown in bold and italics. Again, there is no clear pattern of preference between the approaches (EV vs. Lévy).

VaR and ES backtesting
We conduct dynamic calibration on a rolling window for backtesting. As the daily VaR(α) is estimated on daily returns, the VaR for a one-day holding of an asset should be violated only 100α% of the time, allowing for all possible extremities. Here, α denotes the promised fraction of violations. We use an indicator variable describing the hit sequence that identifies the days of VaR violation in the following T trading days. The hit sequence is Bernoulli distributed with probability α of taking the value 1. We implement three VaR tests: unconditional, independence, and conditional coverage. For ES backtests, we use two tests without distributional assumption: the unconditional-normal test and the unconditional-t test.

Unconditional test
The unconditional coverage test makes no assumption about today's violation status; it provides statistical evidence as to whether the observed proportion of violations (PV) of a VaR model is significantly different from its promised fraction α. The evidence is provided through a test statistic that asymptotically follows a χ² distribution with one degree of freedom:

$$LR_{uc} = -2\ln\left[\frac{(1-\alpha)^{T_0}\,\alpha^{T_1}}{(1-T_1/T)^{T_0}\,(T_1/T)^{T_1}}\right]$$

Here, $T = T_1 + T_0$ is assumed to be sufficiently large, and $T_0$ and $T_1$ are the number of days with no violation and the number of days with a violation, respectively. We use Monte Carlo simulation to calculate the p-value. To compute the Monte Carlo p-value, we simulate 999 test values, $\{LR(i)\}_{i=1}^{999}$, each of which is based on a Bernoulli(α) sample of hit sequences having the same size as the original sample:

$$p\text{-value} = \frac{1}{1000}\left\{1 + \sum_{i=1}^{999} \mathbf{1}\left(LR(i) > LR_{uc}\right)\right\}$$

The simulated p-value is thus the number of simulated test values that exceed the observed test value, divided by roughly the number of simulations.
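The unconditional test and its simulated p-value can be sketched as follows. The argument `p` is the promised violation probability (α in this section), the hit sequence is a list of 0/1 values, and the function names and the fixed seed are assumptions made for reproducibility of the example.

```python
import math
import random


def lr_uc(hits, p):
    """Unconditional coverage likelihood ratio for a 0/1 hit sequence."""
    t = len(hits)
    t1 = sum(hits)
    t0 = t - t1

    def loglik(q):
        # Terms with a zero count are skipped so that 0 * log(0) counts as 0.
        out = 0.0
        if t0:
            out += t0 * math.log(1.0 - q)
        if t1:
            out += t1 * math.log(q)
        return out

    return -2.0 * (loglik(p) - loglik(t1 / t))


def mc_p_value(hits, p, n_sim=999, seed=0):
    """Monte Carlo p-value from Bernoulli(p) hit sequences of the same length."""
    rng = random.Random(seed)
    observed = lr_uc(hits, p)
    exceed = 0
    for _ in range(n_sim):
        sim = [1 if rng.random() < p else 0 for _ in hits]
        if lr_uc(sim, p) > observed:
            exceed += 1
    return (1 + exceed) / (n_sim + 1)


# Toy usage: 20 violations in 100 days against a promised rate of 5%.
too_many = [1] * 20 + [0] * 80
stat = lr_uc(too_many, 0.05)
p_val = mc_p_value(too_many, 0.05)
```

The simulated p-value avoids relying on the asymptotic χ² approximation, which can be poor when violations are rare and the sample is short.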

Independence test
The independence test checks whether VaR violations are truly random and not clustered over time. As assets with volatility clustering yield VaR violations that exhibit clustering, we could then predict that if there is a violation today, a violation tomorrow is more than 100α% likely. If the risk model is correctly specified, VaR adjusts to predictions of high volatility as useful information, and the violation of VaR remains unpredictable. The test statistic of the independence test (Christoffersen 2003) is a likelihood ratio built from the matrix of transition probabilities of conditional violations. Under the null hypothesis, a single probability p characterizes the matrix of transition probabilities of violation, ensuring no dependence between 0 and 1 in the hit sequence, so the likelihood under the null is similar to that of the unconditional hypothesis. $LR_{ind}$ provides the statistical significance of the likelihood of independence in the hit sequence over the likelihood of dependence.
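A minimal sketch of the independence likelihood ratio follows. It counts the four types of transition in the hit sequence and compares a first-order Markov likelihood with a constant-probability likelihood. Here the constant probability is estimated from the transitions themselves, a common implementation choice that differs slightly from using the overall violation fraction; all names are illustrative.

```python
import math


def _xlogy(count, prob):
    """count * log(prob), with the convention that a zero count contributes 0."""
    return count * math.log(prob) if count > 0 else 0.0


def lr_ind(hits):
    """Independence likelihood ratio for a 0/1 hit sequence."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(hits, hits[1:]):
        counts[(prev, cur)] += 1
    t00, t01 = counts[(0, 0)], counts[(0, 1)]
    t10, t11 = counts[(1, 0)], counts[(1, 1)]
    # Violation probabilities conditional on yesterday's state.
    pi01 = t01 / (t00 + t01) if t00 + t01 else 0.0
    pi11 = t11 / (t10 + t11) if t10 + t11 else 0.0
    # Single violation probability under independence.
    pi = (t01 + t11) / (t00 + t01 + t10 + t11)
    ll_dep = (_xlogy(t00, 1.0 - pi01) + _xlogy(t01, pi01)
              + _xlogy(t10, 1.0 - pi11) + _xlogy(t11, pi11))
    ll_ind = _xlogy(t00 + t10, 1.0 - pi) + _xlogy(t01 + t11, pi)
    return -2.0 * (ll_ind - ll_dep)


# Toy usage: a strictly alternating sequence is perfectly predictable.
stat_alternating = lr_ind([0, 1, 0, 1, 0, 1, 0, 1])
```

Clustered violations (runs of ones) and strictly alternating violations both produce large values, since in either case yesterday's state helps to predict today's.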

Conditional coverage test
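The conditional coverage test checks jointly whether violations are independent and whether their average number is consistent with the coverage level of the risk model. The conditional coverage test statistic has an expression similar to that of the independence test statistic, with \hat{p} = T_1/T of the independence statistic replaced by the coverage level α of the risk model (Christoffersen 2003; Dowd 2005):

\[ LR_{cc} = -2 \ln\left[ \frac{L(\alpha)}{L(\hat{\Pi}_1)} \right] \sim \chi^2_2 , \qquad L(\alpha) = (1-\alpha)^{T_0}\,\alpha^{T_1} . \]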
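A minimal, self-contained Python sketch of the conditional coverage statistic follows, under the same assumptions as before (0/1 hit sequence, likelihoods evaluated on observed transitions). The names `lr_cc` and `chi2_sf_2dof` are hypothetical. The p-value uses the closed-form survival function of a χ² distribution with two degrees of freedom, exp(−x/2).

```python
import math


def _xlogy(count, prob):
    # count * log(prob), with the convention 0 * log(0) = 0
    return 0.0 if count == 0 else count * math.log(prob)


def lr_cc(hits, alpha):
    """Conditional coverage likelihood ratio for a 0/1 hit sequence."""
    pairs = list(zip(hits[:-1], hits[1:]))
    t00 = pairs.count((0, 0))
    t01 = pairs.count((0, 1))
    t10 = pairs.count((1, 0))
    t11 = pairs.count((1, 1))
    n0, n1 = t00 + t01, t10 + t11
    pi01 = t01 / n0 if n0 else 0.0
    pi11 = t11 / n1 if n1 else 0.0
    # Unrestricted first-order Markov log-likelihood
    ll_markov = (_xlogy(t00, 1.0 - pi01) + _xlogy(t01, pi01)
                 + _xlogy(t10, 1.0 - pi11) + _xlogy(t11, pi11))
    # Restricted log-likelihood: independent hits with probability alpha
    ll_alpha = _xlogy(t00 + t10, 1.0 - alpha) + _xlogy(t01 + t11, alpha)
    return -2.0 * (ll_alpha - ll_markov)


def chi2_sf_2dof(x):
    """Asymptotic p-value: P(chi-square with 2 d.o.f. > x) = exp(-x / 2)."""
    return math.exp(-max(x, 0.0) / 2.0)
```

When both likelihoods are evaluated on the same transition counts, this statistic equals the sum of the unconditional and independence statistics, so a rejection can stem from either a wrong violation frequency or from clustering.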

ES backtesting
The unconditional coverage test statistic proposed by Acerbi et al. (2014) for ES backtesting is as follows:

\[ Z_{unc} = \frac{1}{T\alpha} \sum_{t=1}^{T} \frac{X_t\, I_t}{ES_{\alpha,t}} + 1 , \]

where T is the number of days in the test window, α is the promised fraction of violations, and X_t represents the profit and loss on day t, which follows a real but unknowable distribution and is forecasted by a model predictive distribution, conditional on previous information, that is used to compute ES_{α,t}. I_t is an indicator function, which is equal to 1 when the forecasted VaR is violated, that is, X_t < −VaR_{α,t}, and 0 otherwise.
We use two tests without distributional assumption: the unconditional-normal test and the unconditional-t test (see Acerbi et al. 2014; Acerbi et al. 2017). The unconditional-normal test assumes that X_t follows a standard normal distribution, whereas the unconditional-t test assumes that X_t follows a t-distribution. The unconditional test statistic is sensitive both to the severity of the VaR failures relative to the ES estimate and to the frequency of VaR failures. As a result, one or a few rare but colossal VaR failures relative to the ES may result in the rejection of a model over a particular timeframe.
However, when the ES estimate is large on a violation day, a large loss may not affect the test results as much as it would have if the ES estimate had been smaller. Similarly, a model can be rejected because of many VaR failures even if all losses on violation days are only slightly higher than the VaR, as each such failure contributes to making the test statistic negative. Thus, asymmetry in the number of VaR violations among different models and asymmetry between expected and observed severity (the severity ratio) are critical for ES backtesting.
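The sensitivity of the unconditional ES statistic to both frequency and severity can be illustrated with a short Python sketch. It assumes the commonly stated form of the Acerbi and Szekely statistic, with VaR and ES forecasts given as positive numbers and `alpha` as the promised fraction of violations. The function name `z_unconditional` and all input values are hypothetical.

```python
def z_unconditional(pnl, var, es, alpha):
    """Unconditional ES backtest statistic.

    pnl   : realized profit and loss per day (losses are negative)
    var   : forecasted VaR per day, as positive numbers
    es    : forecasted ES per day, as positive numbers
    alpha : promised fraction of violations (e.g. 0.05 for 95% coverage)
    """
    T = len(pnl)
    total = 0.0
    for x, v, e in zip(pnl, var, es):
        if x < -v:              # VaR violated on this day
            total += x / e      # loss scaled by that day's ES forecast
    return total / (T * alpha) + 1.0


# Hypothetical inputs: 100 days, constant VaR = 2.0 and ES = 2.5
T, alpha = 100, 0.05
var = [2.0] * T
es = [2.5] * T

# (a) expected frequency (5 violations) and expected severity (loss = ES)
as_expected = [-2.5] * 5 + [0.0] * 95
# (b) twice as many violations, each only slightly beyond VaR
many_small = [-2.1] * 10 + [0.0] * 90
# (c) a single, very large violation
one_colossal = [-20.0] * 1 + [0.0] * 99

z_a = z_unconditional(as_expected, var, es, alpha)
z_b = z_unconditional(many_small, var, es, alpha)
z_c = z_unconditional(one_colossal, var, es, alpha)
```

In this sketch, case (a) gives a statistic close to zero, while cases (b) and (c) both give clearly negative values, which mirrors the two rejection routes described above: many mild failures or a few severe ones.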

Backtesting results
We now examine the sensitivity of the risk measure VaR to new observations through dynamic calibration on a rolling window of four business years with a two-year look-back window, and we continue to increase the window length. To avoid the problem that EV dynamic calibration considers only extreme observations, we increase the proportion of extreme observations by adjusting the threshold and expanding the length of the look-back window. For EV, we dynamically calibrate on the most extreme 30% of observations, at coverage levels of 95% and 99%. First, we calibrate all the models on the time series of returns for 2007-2010 on December 31, 2010 and use the calibrated parameters to predict the VaR and ES for January 1, 2011. This gives us an additional new observation of returns on January 1, 2011. We remove the oldest observation to accommodate this new observation in our fixed-length look-back window and then calibrate the models in the new window to predict the VaR and ES for January 2, 2011. The process continues until the end of 2017. Thus, the dynamic calibration starts on January 1, 2011 and ends on December 31, 2017. The unconditional, independence, and conditional coverage hypotheses are tested with 95% VaRs. The backtesting checks whether unconditional and conditional distributions influence the conditional and unconditional coverage hypothesis tests. The backtesting results for long positions in all indices and the PVs are presented in Table 10.
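The rolling recalibration described above can be sketched as a simple loop. In the Python sketch below, the risk model is a placeholder: `hist_var_es` computes empirical (historical simulation) VaR and ES and merely stands in for the EV and Lévy calibrations, which are not reproduced here. The names `hist_var_es` and `rolling_backtest`, and the simulated Gaussian returns, are assumptions made for illustration only.

```python
import random


def hist_var_es(window, alpha):
    """Placeholder model: empirical VaR and ES at tail probability alpha."""
    losses = sorted((-r for r in window), reverse=True)   # largest loss first
    k = max(1, int(len(losses) * alpha))                  # tail observations
    tail = losses[:k]
    var = tail[-1]                # k-th largest loss
    es = sum(tail) / k            # average of the k largest losses
    return var, es


def rolling_backtest(returns, window_len, alpha, model=hist_var_es):
    """Recalibrate on a fixed-length window and forecast one day ahead."""
    hits, forecasts = [], []
    for t in range(window_len, len(returns)):
        # The window drops the oldest return and adds the newest one
        window = returns[t - window_len:t]
        var, es = model(window, alpha)
        forecasts.append((var, es))
        hits.append(1 if returns[t] < -var else 0)   # VaR violation on day t
    return hits, forecasts


# Hypothetical data: simulated daily % returns, not the futures data
rng = random.Random(42)
returns = [rng.gauss(0.0, 1.0) for _ in range(1500)]
hits, forecasts = rolling_backtest(returns, window_len=1000, alpha=0.05)
pv = sum(hits) / len(hits)        # observed proportion of violations
```

The resulting hit sequence is the input to the unconditional, independence, and conditional coverage tests, and the observed proportion of violations is compared with the promised fraction.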
Table 10 reveals that the EV-VaR is not distinguishable from the full density-based Lévy-VaR in terms of the observed PV performance in backtesting. The PVs corresponding to the EV model are closer to the promised fraction of violations for all indices, except for Nikkei225. The PVs corresponding to the EV model for Nikkei225 deviate more from the promised fraction of violations than those for the Lévy models. Thus, the tail-based risk measure VaR obtained under the tail-based EV model and under the full density-based Lévy models is almost the same.
For the remaining indices, the results from the hypothesis tests are mixed. As the VaR violations are clustered at 95%, the independence test fails in most cases. However, the test passes at 99% coverage. On the other hand, the conditional coverage hypothesis is supported in most cases, although both the unconditional and independence hypotheses are not supported. Where the conditional coverage hypothesis is rejected, a significant deviation of the observed PV from the promised PV for unconditional coverage may have contributed to the rejection. We also report the Chi-square and the Monte Carlo simulated p-values of the test statistics. We find that the Chi-square and Monte Carlo p-values are close to each other, implying that our tests are relevant.
The last two columns in each table report the results of the ES backtesting. We report two ES backtesting results without distributional assumption: the unconditional-normal test (unconditional-N) and the unconditional-t test (unconditional-t). The p-values of the tests, which represent the success rate when multiplied by 100, are reported in parentheses. We identify each test as a "pass" (P) or a "fail" (F) based on the p-values in the table. All tests are conducted at a 95% confidence level. None of the unconditional-N tests passed, and the unconditional-t tests passed in only a few instances. Thus, our results do not suggest any preference for the EV or Lévy models through ES backtesting.

Discussion
We have investigated four full-density Lévy models and estimated the tail-focused risk measure VaR and its coherent version ES, in addition to estimating VaR and ES for an EV model (a tail-targeting approach). The parameters calibrated for all five models under both approaches are presented in Tables 2 and 3. VaR and ES are based on high coverage levels, accounting for extreme events governing high trading losses. We analyze the performance of VaR and ES risk measures, utilizing full-density Lévy models of VG, NIG, HYP, and GH, and compare them with the VaR and ES estimates obtained with the tail density-based EV model. The results reveal that it is very difficult to ascertain any comprehensive superiority of one approach over the other. Table 11 presents the frequency distribution of significant estimates reported in Tables 5, 6, 7, 8 and 9. We have 15 estimates of VaR and 15 estimates of ES under the EV risk model and its Lévy contenders, estimated across all five indices and three coverage levels. In most cases (11 out of 15 estimates), we find that the Lévy-VaR forecasts are closer to the respective empirical estimates than their EV counterparts. Nevertheless, this observation alone does not allow us to declare the EV approach inadequate. Within the Lévy category, the NIG (4 out of 11) and the GH (6 out of 11) models provide considerably better VaR forecasts (in the sense of having the minimum absolute deviation from the empirical estimates) than the VG and the HYP Lévy models. Among the remaining Lévy models, VaR forecasts favor the HYP model, supporting the derivative pricing concept (Schoutens 2003). This implies that a fully flexible GH model forecasts the quantiles more fittingly than its restricted versions.
Regarding the restricted versions, the NIG characterization seems to have minimal effect on forecasts due to restriction. Of the 15 ES forecasts, the EV model accounts for the 8 most favorable, which is not sufficient for it to be deemed adequate. Among the 7 cases that favor Lévy-ES forecasts, we find 3 for the NIG, 2 for the GH, 1 for the VG, and 1 for the HYP model. Thus, the presumption that a tail-targeting model is more likely to provide superior, consistent forecasts for the tail-focused risk measures VaR and ES is empirically challenged. The simplicity of EV-VaR and EV-ES is attractive, but a comparison with empirical values often disputes their adequacy.
We find some randomness in classifying the superiority of one approach over another for a specific timeframe. The frequency distribution of adequacy between the simple tail-targeting EV analytic risk model and the relatively complex full density-driven Lévy risk models is presented in Table 11. While it is difficult to claim that a particular full-density Lévy model is superior irrespective of data ranges, it is impossible to claim that the tail-targeting EV model is adequate across all data ranges.
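The adequacy criterion used in this discussion (minimum absolute deviation from the empirical estimate) and the resulting frequency count can be sketched in Python as follows. All numbers below are hypothetical placeholders and are not the estimates reported in Tables 5 to 9; the function name `most_adequate` is likewise an assumption.

```python
from collections import Counter


def most_adequate(empirical, forecasts):
    """Return the model whose forecast is closest to the empirical estimate."""
    return min(forecasts, key=lambda model: abs(forecasts[model] - empirical))


# Hypothetical (empirical estimate, model forecasts) pairs, one per
# index and coverage level; these are NOT the values from the study.
cases = [
    (2.10, {"EV": 2.30, "VG": 2.25, "NIG": 2.12, "HYP": 2.20, "GH": 2.15}),
    (3.50, {"EV": 3.45, "VG": 3.90, "NIG": 3.70, "HYP": 3.80, "GH": 3.60}),
    (5.00, {"EV": 5.60, "VG": 5.40, "NIG": 5.30, "HYP": 5.20, "GH": 5.05}),
]

# Frequency distribution of the most adequate model across cases
tally = Counter(most_adequate(emp, fc) for emp, fc in cases)
```

Applied to the 15 VaR and 15 ES cases, a tally of this kind yields a frequency table of the sort summarized in Table 11.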

Table 11 Frequency distribution of tail risk estimates of analytic EV and root search-based Lévy models
The numbers are counts of the adequate estimates found in Tables 5, 6, 7, 8 and 9 corresponding to the EV and Lévy models. The total number of estimates is reported in parentheses. The approaches have no discernible preference pattern (EV vs. Lévy).

Given such similarity of forecast performances in both approaches, the choice is likely to be determined by a compromise between user-defined simplicity and user-perceived adequacy. This suggests that the performances of the risk measure VaR and its coherent version ES are different and fail to adequately identify the risk profiles of assets. If VaR and ES identified the risk profiles of assets similarly, both would be adequate for tail targeting under either the EV model or some full-density Lévy model. However, the performance of VaR and ES is mixed across the EV and Lévy models. This should not be interpreted to mean that the test statistics of model fitting performance are contradictory.
It is well known that the AD test is tail-emphasized. Therefore, a quantile mismatch outside the tail is barely detected by the AD_ev test, which is applied to the tail-targeting EV model. Based on the AD test statistics in Table 3, the EV model has some preference over the Lévy models. However, this preference rests only on the tail quantile match of the EV model and carries hardly any information on quantile matches far outside the tail. This is why the AD test value of a solely tail-based EV model might be deceptive when compared with the AD test value of an entire distribution-based Lévy model. Such deception can hardly be detected by applying a tail-emphasized GOF test. Thus, it is not surprising that the seemingly preferable EV model proves elusive and does not yield the most adequate forecasts of the risk measures VaR and ES. Our backtesting results for VaR and ES also confirm these findings. The EV-VaR and EV-ES results are not significantly different from the VaR and ES based on Lévy models. In most cases, the results are mixed.

Conclusion
We investigate and compare the simplicity and adequacy of tail-focused VaR and ES risk measures for tail-targeting EV models with the full density-focused Lévy-VaR and Lévy-ES. We use data on futures contracts of the S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 indices from January 1, 2007 to December 31, 2017, covering the 2007-2008 Global Financial Crisis that stemmed from the subprime mortgage debacle in the US. We find that returns discarded by the EV model (as they do not characterize the extreme unexpected market losses) and incorporated by the Lévy models do have some effect on the performance of the tail-focused risk measure VaR and its coherent version ES. Thus, without an immutable law justifying any preference between "tail-alone" and "full-density-based" models for tail-focused risk management, this study provides a heuristic analysis illustrating the potential effects of the observations discarded by an EV model on risk estimates when they are considered under Lévy models.
The tail-based EV models are simpler for the analytic formulation of the tail-focused risk measure VaR and the tail-aggregate risk measure ES compared with the Lévy-based measures. Moreover, the EV models are simpler to implement in risk measure calculations. However, we find that the EV models are inadequate, as the performance of EV risk estimates is not necessarily superior to that of Lévy risk estimates. On the other hand, we cannot be sure that a full density-based model based on a Lévy distribution adequately assesses risk measures. Thus, the adequacy of a simpler model with a more straightforward implementation becomes a relative consideration. Our model testing reinforces the theoretical principle that using only the extreme observations in the tails (the EV model, which discards smaller systematic returns) and using all observations, both small and extreme (the Lévy models), are different approaches with the common goal of a meaningful simplification of reality. Their relative performance for a particular time window offers no guarantee. Given such randomness of estimation performances under both approaches (for different ranges of data and coverage levels), the choice should be determined by a compromise or trade-off between simplicity and user-defined adequacy. Our study period covers the 2007-2008 Global Financial Crisis. The analysis can be extended to other financial crisis periods, e.g., the Russian financial crisis, the default crisis related to Long-Term Capital Management in 1998, and, more recently, the coronavirus-related turmoil in the financial market during 2020-2021.
Our analysis is based on only a selection of EV and Lévy models. The study can be extended using other types of models. Practitioners should not rely on one set of desired models and ignore others when implementing VaR and ES estimation. The simplicity of a model does not guarantee its adequacy. However, a "full-density" model, e.g., the Lévy-based model, may not always be the best choice when a simpler tail-based model provides more robust VaR and ES results. As many banks and financial institutions do not follow the adequacy requirements for risk measures under the Basel agreement, our findings shed empirical light on such complexity. Therefore, when the results are mixed, banks and financial institutions as well as policymakers should find a way to compromise between simplicity and adequacy.

Table 1
Summary statistics of the futures returns
We report the summary statistics of returns on futures for the world indexes S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 from January 1, 2007, to December 31, 2017. Futures contracts expire in the following trading months; rollover from one expiring contract to the next occurs at the start of each trading month.

Table 4
Anderson-Darling (Lévy) and left-truncated Anderson-Darling (EV) goodness-of-fit
In the case of the left-truncated Anderson-Darling test, 1000 resamples are used to obtain the p-values by bootstrapping. (*) denotes that models survive the test at the given significance level. The AD statistic for the EV model is obtained with AD_ev of Eq. (17). There is no clear preference between the approaches (EV vs. Lévy).

Table 7
Estimates of VaR and ES risk measures for DAX futures position: EV versus Lévy
The estimates are based on the parameter values in Tables 2 and 3 using daily % returns. Here, α is the coverage level of VaR and ES when estimated under the EV and Lévy approaches, corresponding to a holding period of 1 day. Next to each estimate, SEs are reported, and the 90% confidence intervals are immediately below (normalized by bootstrapped estimates). The most adequate ES estimate is depicted in bold, and the most adequate VaR is shown in bold and italics. Again, there is no clear pattern of preference between the approaches (EV vs. Lévy).

Table 8
Estimates of VaR and ES risk measures for Hang Seng futures position: EV versus Lévy
The estimates are based on the parameter values in Tables 2 and 3 using daily % returns. Here, α is the coverage level of VaR and ES when estimated under the EV and Lévy approaches, corresponding to a holding period of 1 day. Next to each estimate, SEs are reported, and the 90% confidence intervals are immediately below (normalized by bootstrapped estimates). The most adequate ES estimate is depicted in bold, and the most adequate VaR is shown in bold and italics. Again, there is no clear pattern of preference between the approaches (EV vs. Lévy).

Table 9
Estimates of VaR and ES risk measures for Nikkei225 futures position: EV versus Lévy
The estimates are based on the parameter values in Tables 2 and 3 using daily % returns. Here, α is the coverage level of VaR and ES when estimated under the EV and Lévy approaches, corresponding to a holding period of 1 day. Next to each estimate, SEs are reported, and the 90% confidence intervals are immediately below (normalized by bootstrapped estimates). The most adequate ES estimate is depicted in bold, and the most adequate VaR is shown in bold and italics. Again, there is no clear pattern of preference between the approaches (EV vs. Lévy).

Table 10
Backtesting results for conditional and unconditional models (95% coverage)
Backtesting results for conditional and unconditional models: S&P500, FTSE100, DAX, Hang Seng, and Nikkei225. PV stands for the proportion of VaR violations. P-values from both Chi-square and Monte Carlo (MC) simulations are reported. The last two columns show the unconditional-normal and unconditional-t test results for ES backtesting. We report whether the tests pass (P) or fail (F), with the p-values in parentheses.
