
# An evaluation of the adequacy of Lévy and extreme value tail risk estimates

*Financial Innovation*
**volume 10**, Article number: 100 (2024)

## Abstract

This study investigates the simplicity and adequacy of tail-based risk measures—value-at-risk (VaR) and expected shortfall (ES)—when applied to tail targeting of the extreme value (EV) model. We implement Lévy–VaR and ES risk measures as full density-based alternatives to the generalized Pareto VaR and the generalized Pareto ES of the tail-targeting EV model. Using data on futures contracts of S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 during the Global Financial Crisis of 2007–2008, we find that the simplicity of tail-based risk management with a tail-targeting EV model is more attractive. However, the performance of EV risk estimates is not necessarily superior to that of full density-based relatively complex Lévy risk estimates, which may not always give us more robust VaR and ES results, making the model inadequate from a practical perspective. There is randomness in the estimation performances under both approaches for different data ranges and coverage levels. Such mixed results imply that banks, financial institutions, and policymakers should find a way to compromise or trade-off between “simplicity” and user-defined “adequacy”.

It can scarcely be denied that the supreme goal of theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.

-Albert Einstein^{Footnote 1}

## Introduction

Value-at-risk (VaR) is an intuitively simple tail-based risk measure that is popular among practitioners and academics. Recent applications of VaR were highlighted by Perignon and Smith (2010a), Frésard et al. (2011), and Perignon and Smith (2010b). However, the VaR measure has some limitations. It fails to satisfy the requirement of sub-additivity, implying that it does not fulfill the requirement of coherence. Further, VaR fixes the tail events corresponding to a specific confidence level. Although it considers the likelihood of conditional tail events, it ignores the magnitude of the catastrophe after the occurrence of a tail event. In a nutshell, VaR provides a snapshot of unsystematic losses while failing to consider the actual size of unsystematic losses that exceed the cut-off points. To offset such limitations and ensure that the coherence (sub-additivity) requirements are met, the expected shortfall (ES) measure has been introduced. ES estimates the unsystematic loss by weighing all the possible losses in the tail of the distribution, thus circumventing the limitation of VaR.

Many studies (e.g., Longin 1996; McNeil and Frey 2000; Jondeau and Rockinger 2003; Gençay and Selçuk 2004; Tolikas and Gettinby 2009; So and Wong 2012; Cheng et al. 2015; Du and Escanciano 2017; Bayer and Dimitriadis 2022; Otto and Breitung 2022) have used methodologies ranging from simple to adequately complex and have assumed different distributional properties of the data-generating process for estimating and backtesting VaR and ES models. Some recent studies have also identified VaR forecasting breakdown due to structural change and a break in the data-generating process of returns (Chavez-Demoulin et al. 2014; Quintos et al. 2001). By jointly modeling time-varying conditional VaR and ES, Taylor (2019) produced forecasts with generalized autoregressive conditional heteroskedasticity (GARCH) (1,1) and Glosten–Jagannathan–Runkle GARCH (1,1) models using maximum likelihood based on a Student *t*-distribution, and the asymmetric Laplace likelihood was used to evaluate post-sample VaR and ES forecasts. Patton et al. (2019) also jointly modeled VaR and ES in a new dynamic framework, which is semiparametric and agnostic about the conditional distribution of returns, and confirmed via simulation that the proposed new ES-VaR models outperform forecasts based on GARCH or rolling window models. The forecasting process of VaR and ES requires sophisticated and complicated models. Lazar and Zhang (2019) examined whether the inadequacy of modeling leads to the model risk of such risk measures and found that ES estimates using GARCH models require smaller corrections for model risk than VaR estimates.

However, there is some trade-off between simplicity and adequacy when deciding on models and the underlying data-generating processes. Under the Basel framework, the Bank for International Settlements (BIS 2019) requires banks to establish “an *adequate* system for monitoring and reporting risk exposures” to assess risk profiles. With this “adequacy” in mind, Hoga and Demetrescu (2022) developed a sequential procedure that can directly and continuously determine risk assessments based on VaR and ES forecasts with controlled size based on the *t*-GARCH model. Kourouma et al. (2010) found that unconditional VaR models (the historical and extreme value theory (EVT) VaR models) underestimate the risk of loss compared with conditional models. The conditional EVT model is more accurate for predicting risk losses during the 2008 Global Financial Crisis. Despite this accuracy, banks are reluctant to use conditional EVT models as the Basel II agreement penalizes banks for using such models.

Therefore, a relevant question arises as to why banks or financial institutions are reluctant to use some models that are adequate to satisfy the BIS framework. Why is the simplicity of tail-targeting EV models, which are much easier to implement, not attractive and adequate compared with models chosen by banks and financial institutions, or do they define “adequacy” from a different perspective? For our study, the heuristic adequacy of the measure is the naive “closest to the empirical estimate.” The EV models are based on the distribution of extreme returns instead of all returns (Bali 2007). The simplicity of the tail, characterized by the EV model, leads to the analytic formulation of tail-focused risk measures of VaR and the tail-aggregate risk measure of ES.

On the other hand, the Lévy heavy-tailed generalized hyperbolic (GH) family models are based on full-density distributions and are not easy to implement. We consider the Lévy family models and purely extreme tail-based extreme value (EV) models from an adequacy perspective to quantify investors’ risks. It is the adequacy perspective that should guide us in the choice of models in risk measures but not the simplicity. However, if simplicity is advocated, then how should the adequacy of models be determined? Thus, whether the performance of EV risk estimates is superior to that of Lévy risk estimates is an empirical question.

We conduct a heuristic study to determine the adequacy of VaR and ES estimates of investment risks in leading indices during a period when markets were falling and recovering from the global financial or subprime crisis (2007–2008). The standard VaR measures provide inaccurate estimates of losses during highly volatile periods as they require an explicit functional form (normal or lognormal) on the distribution. EV models have seen many applications in modeling extremities of weather, reserves, and financial extremities (Pidgeon 2012; Monier and Gao 2015; Cheng et al. 2014). However, unlike the spectral risk measure, VaR and ES are purely tail-based risk measures. As EV is also purely a tail-based method, it is tempting to presume that VaR and ES estimated with the EV method might be a good alternative.

Lévy-based heavy-tailed models belong to another class of models that has also been investigated in modeling extreme fluctuations. Lévy models use full data to estimate the parameters. In contrast, an EV model uses only the partial data remaining in the extreme tail of a distribution concerning a certain cut-off point.

EV models use only a few extremely large returns on the tail in calibration when one believes that the extreme tail data follow a generalized EV distribution. This presumption makes us not worry about the true distribution of returns smaller than the threshold that are not in the tail. This idea may be sufficient to get reasonable numbers for the risk measures of VaR and ES defined on the tail. Such an approach applies fewer data points than models that calibrate historical data. It is accepted that the relevance of systematic fluctuations (small return values) not on the tail can be ignored when modeling. Nevertheless, we do not possess an axiomatic justification to assume this will necessarily be true.

This prompts us to investigate the risk of investment in world markets falling and recovering during the Global Financial Crisis of 2007–2008. We adopt a range of models with moderate time-varying volatility and incorporate a stochastic diffusive perturbation of the markets with time as our benchmark. We consider models of the GH family that include stochastic volatility through stochastic time changing without an explicit dynamic for volatility. We assess the comparative performance of tail modeling using both Lévy models (which use both systematic and unexpected returns) and EV models (which use only unexpected returns), followed by a comparison of the performances of the respective tail-based risk measures VaR and ES under both approaches. Thus, we follow a procedure of fixing the tail as applied in standard EV calibration, which uses only unexpected returns. We then obtain Lévy tails with calibrations that use both systematic and unexpected returns.

The mathematically elegant Lévy approach has a significant limitation: except for a few trivial cases, there is no analytic formula for the risk measure VaR, let alone for ES. Therefore, the VaR calculation is relatively difficult to implement. Complexity and the considerable time required for implementation have deterred practitioners from using Lévy models to forecast VaR and ES, as observed in a VaR backtesting study. To the best of our knowledge, this is the first study to compare the performance of tail-based VaR and ES estimated for the EV and Lévy models by contrasting the adequacy of tail-based risk measures while considering the simplicity of estimation with the tail-targeting model.

The superior method is unclear at the outset. The presumably advantageous use of the returns in the EV model might not be advantageous in practice when the evaluator is obliged to consider small return values that influence the fits, which must determine the shape of the tail. Moreover, there are concerns about whether applying extreme return observations on the tail can be sufficient to model extreme fluctuations, even when market movements are not extreme (such as the decade following the 2007–2008 Global Financial Crisis). In this study, we seek to shed empirical light by estimating the tail-based VaR and ES following both approaches in the existing theoretical framework using data from major indices. The sample period is when markets suffered and recovered from the Global Financial Crisis.

The contribution of this study is that we assess the relative importance of the adequacy and simplicity of EV and Lévy models in estimating VaR and ES. We try to answer the following questions: “Is the simplicity of EV models adequate to guarantee that they would perform robustly in describing extremely unexpected return phenomena?” “Should the adequacy of the Lévy models be more important in the backdrop of the simplicity of EV models?” We find that the simplicity of the tail-characterized EV model leads to the analytic formulation of the tail-focused risk measures of VaR and the tail-aggregate risk measure of ES. However, the performance of EV risk estimates is not necessarily superior to that of Lévy risk estimates. On the other hand, VaR estimates based on Lévy models are more stable than those based on the EV model. However, it is not possible to establish an indisputable rule for a particular Lévy model. The performance of models with only a few unexpected extreme returns (EV model) and modeling with both numerous smaller expected and few extreme unexpected returns (Lévy models) are mixed.

Our model testing with a heuristic adequacy measure reinforces a basic theoretical point: modeling only the extreme observations (the EV model, which discards smaller systematic returns) and modeling both smaller and extreme observations (the Lévy models) are different approaches with the common goal of a meaningful simplification of reality. Their relative performance in a particular time window offers no guarantee for other windows. There is some randomness in the estimation performances under both approaches. The explanatory power of the approaches is hardly distinguishable across our data.

In this study, we determine whether it matters little for VaR and ES forecasts if a model is solely tail-targeting or not, at least for the most crucial periods, comprising both sharp market downturns and smooth recoveries of the 2007–08 Global Financial Crisis. Here, the choice of model can be based on a compromise or trade-off between simplicity and user-defined adequacy. As many banks and financial institutions do not follow adequacy requirements in risk measures as stipulated in the Basel agreement, our findings shed empirical light on such complexity. Our results imply that when the results are mixed, banks and financial institutions should find a way to compromise between simplicity and adequacy.

The remaining parts of the paper are structured as follows. Sections “Characterization in Lévy framework” and “Initial data analysis” discuss the characterization of a Lévy framework and initial data analysis. Sections “Estimation of risk measures” and “GOF: EV versus Lévy” discuss the estimation of VaR and ES under the Lévy approach and its contender, the EV approach, and the goodness of fits (GOFs) under the contending approaches. Section “Comparison of risk measures” compares the risk measures. Section “VaR and ES backtesting” compares the forecasts of VaR and ES for the approaches. Section “Discussion” provides a discussion, and we conclude in Sect. “Conclusion”.

## Characterization in Lévy framework

Lévy models have recently been applied in modeling extreme behavior analysis (German 2002; Fajardo 2015; Fajardo and Mordecki 2006, 2014; Kim et al. 2008; Fuse and Meucci 2008; Wong and Guan 2011, De Oliver et al. 2018; Farkas and Mathys 2022). The characteristic function of a stochastically continuous process that starts at zero and possesses stationary independent increments has the following general form (see Sato (1999) and Schouten (2003)):

for \(s \in \Re ,t \ge 0\) and constants \(a \in \Re ,b \in \Re^{+}\), where *ν* is the so-called Lévy measure defined on \(\Re \backslash \left\{ 0 \right\}\) that satisfies square integrability of small (< 1) jumps:

Equation (1) is the so-called Lévy–Khinchine representation of a Lévy process,^{Footnote 2} which is closely aligned with the concept of infinitely divisible distribution:

Thus, the inverse Fourier transform can be applied to obtain the numerical transition density from the characteristic function (1) with the Lévy measure *ν* of a particular Lévy process, which always exists. The numerical transition densities can then be used to estimate the risk measure VaR under different model assumptions. However, in this study, our interest is mainly in the primitive members of Lévy processes belonging to the GH class, which have been widely used in financial modeling (Barndorff-Nielsen 1977, 1978, 1995; Eberlein and Prause 1998; Prause 1999; Eberlein and Keller 1995; Bingham and Kiesel 2001; Eberlein and Hammerstein 2002) due to the availability of closed-form densities.
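The inversion step described above can be illustrated with a minimal sketch. It assumes the inversion formula \(f(x) = \frac{1}{\pi}\int_{0}^{\infty} \mathrm{Re}\left[e^{-isx}\varphi(s)\right]ds\) for a real-valued random variable with integrable characteristic function \(\varphi\), and it uses a plain trapezoidal rule on a truncated grid. The function names, the truncation point `s_max`, and the use of a normal characteristic function as a check are illustrative assumptions, not the implementation used in this study.

```python
import cmath
import math


def density_from_cf(cf, x, s_max=40.0, n_steps=4000):
    """Approximate f(x) = (1/pi) * int_0^inf Re(exp(-i*s*x) * cf(s)) ds
    with the trapezoidal rule on the truncated range [0, s_max]."""
    h = s_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        s = k * h
        value = (cmath.exp(-1j * s * x) * cf(s)).real
        weight = 0.5 if k in (0, n_steps) else 1.0  # trapezoid end points
        total += weight * value
    return total * h / math.pi


def normal_cf(s, mu=0.0, sigma=1.0):
    """Characteristic function of N(mu, sigma^2), used only as a check."""
    return cmath.exp(1j * mu * s - 0.5 * (sigma * s) ** 2)


if __name__ == "__main__":
    approx = density_from_cf(normal_cf, 0.5)
    exact = math.exp(-0.125) / math.sqrt(2.0 * math.pi)
    print(approx, exact)
```

Any other characteristic function with the same one-argument signature could be passed in place of `normal_cf`; slowly decaying characteristic functions would need a larger `s_max` and a finer grid.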

We focus on the GH subclass^{Footnote 3} of Lévy processes (variance gamma (VG), normal-inverse Gaussian (NIG), hyperbolic distribution (HYP), and GH) and estimate the risk measures VaR and ES to investigate the relative adequacy of a purely tail-based simple analytic EV risk model compared with full density-based Lévy risk models. Among others, these measures have been investigated by Cotter and Dowd (2006) in the context of futures contracts and by Sorwar and Dowd (2010) in the context of options contracts.

Let \(X_{1} = \log \left( S_{t + 1} / S_{t} \right)\) for nonnegative integer *t*, where \(X_{1}\) is characterized by Eq. (1) (the Lévy–Khinchine formula). For the models we consider, the equivalent processes are described more conveniently by their densities (see Schouten 2003):

where *K*_{I} is the modified Bessel function of the third kind with index *I*; *θ* is the skewness parameter; and *v* is the percentage excess kurtosis in the distribution for the VG model.
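To make the roles of the VG parameters concrete, the following sketch simulates VG increments through the well-known representation of the VG process as a Brownian motion with drift evaluated at a gamma time. It assumes unit time steps, a gamma clock with mean 1 and variance *v*, and illustrative parameter values; the names `simulate_vg` and `sample_moments` are hypothetical helpers, not part of the estimation procedure of this study.

```python
import math
import random


def simulate_vg(n, sigma, theta, nu, seed=0):
    """Draw n VG increments X = theta*G + sigma*sqrt(G)*Z, where
    G ~ Gamma(shape=1/nu, scale=nu) and Z ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        g = rng.gammavariate(1.0 / nu, nu)  # random "business time", mean 1
        z = rng.gauss(0.0, 1.0)
        draws.append(theta * g + sigma * math.sqrt(g) * z)
    return draws


def sample_moments(xs):
    """Return sample mean, variance, skewness, and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    xs = simulate_vg(20000, sigma=0.2, theta=-0.1, nu=0.5, seed=1)
    print(sample_moments(xs))
```

With a negative *θ* the simulated sample is left-skewed, and a larger *v* fattens the tails, which is the sense in which *θ* and *v* control skewness and excess kurtosis.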

Because these densities are available in closed form, the standard errors (SEs) of the parameters can be obtained readily from Fisher’s information matrix.

In our context, the competing idea to the Lévy approach assumes that only the extreme returns characterize the performance of the risk measures VaR and ES. As in the studies by Dowd (2005), Cotter and Dowd (2006), and Mozumder et al. (2017), perhaps the most elegant tool in such a context is the peaks-over-threshold method. It rests on the fact that, as the threshold *u* becomes larger, the distribution of exceedances converges to a generalized Pareto (GP) distribution with the following two-parameter characterization:

where

*ξ* and *β* > 0 are shape and scale parameters, respectively, contingent on each choice of threshold *u*.
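For reference, the textbook two-parameter form of the GP distribution function for an exceedance \(x = X - u \ge 0\) is sketched below. The symbol \(G_{\xi ,\beta }\) is introduced here for illustration, and this standard form is assumed to match the characterization referred to above:

\[
G_{\xi ,\beta }(x) =
\begin{cases}
1 - \left( 1 + \dfrac{\xi x}{\beta } \right)^{-1/\xi }, & \xi \ne 0, \\[2ex]
1 - \exp \left( -\dfrac{x}{\beta } \right), & \xi = 0,
\end{cases}
\]

with \(x \ge 0\) when \(\xi \ge 0\) and \(0 \le x \le -\beta /\xi\) when \(\xi < 0\). A positive \(\xi\) corresponds to a heavy, power-law tail, while \(\xi = 0\) gives an exponential tail.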

## Initial data analysis

We employ futures contract return data; our empirical analysis is based on the returns of the S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 indices. We choose futures contracts because there is a lack of studies on EV and Lévy models that employ futures contract data. The data cover futures contracts from January 1, 2007 to December 31, 2017 that expire in the following trading months. The rollover from one expiring contract to the next occurs at the start of each trading month. Datastream pads the dataset by setting the end-of-day price on a bank holiday equal to the previous trading day’s end-of-day price. This technique is accepted by practitioners and ensures that we have the same number of daily returns (2,762) for all indices. Our sample period comprises the period of the 2008 Global Financial Crisis and beyond. This helps us to check the robustness of the adequacy versus simplicity of the competing approaches (and methods) in terms of the tail risk measures VaR and ES.

Table 1 presents the summary statistics of the returns of all index futures. For our analysis, we identify the cut-off point in each extreme tail according to the EV theory discussed in a recent study on the VaR backtest (Mozumder et al. 2017). We note that the extents of extremity in return series corresponding to various indices are not similar.

Table 2 presents a good fit to the data for both long and short positions obtained with the GP distribution (GPD); the tail indices are positive except for the Nikkei 225; and the estimated scale parameters fluctuate around 1. Table 2 also provides the assumed thresholds *u*, the number of exceedances (*N*_{u}) contingent on the choice of thresholds *u,* and the observed exceedance probabilities (Prob) contingent on the choice of thresholds *u*. Table 2 also presents the asymmetry of long and short positions of tail definition (*u*) choices in an EV model. The same cut-off point results in a different number of left-alone observations on the tails for long and short positions. Parameter estimates are expected to differ with respect to different positions under EV models. In the case of full-density-based Lévy models, it results in only a sign alteration of the skewness characterizing parameter^{Footnote 4} corresponding to long and short positions.^{Footnote 5}

To visualize the differences in the models’ fit in the tails, for all indices under consideration, we separately present the GP EV tail alongside the tails for each of our considered Lévy models. Our strategy is to obtain the EV quantiles above thresholds and the corresponding quantiles from the Lévy models. Thus, instead of fixing tail mass, we set the thresholds. Figure 1 depicts the QQ-plot of EV with each of the Lévy models for all indices. There is clear evidence of deviation between EV and Lévy quantiles at the extreme, although the EV models reveal smaller deviations in most cases. Such differences may be attributed to how different Lévy models feed information from observations outside the tails in fitting the tails.

Table 3 presents the maximum likelihood estimate of the parameters for all five indices and all four Lévy models. The threshold value selection is an important factor as it has the strongest effect on the results. While larger thresholds produce few EVs and lead to large variances, smaller thresholds generate a sample that approximates the models poorly. We select the smallest threshold value among those that produce EVs, following the limit exceedance model. We use the mean residual plot to determine thresholds *u* and the probabilities of exceedances. Both the EV and the Lévy models provide a similar data description. However, as expected of long and short positions, the tail-based EV parameters are significantly different—a sharp contrast to Lévy parameters, which use the entire data set. As Lévy models on the complete data of short and long positions flip the densities along the y-axis, long and short positions alter the sign of the parameter that characterizes the skewness in the model. Thus, it is sufficient to report the estimates corresponding to long positions alone for which the risk measures VaR and ES under tail targeting and full density-based models will be investigated.
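The mean residual (mean excess) plot mentioned above can be computed as in the following sketch. It assumes losses are expressed as positive numbers and evaluates the sample mean excess \(e(u)\), the average of \(x - u\) over observations \(x > u\), on a user-supplied grid of candidate thresholds. The helper name `mean_excess` and the toy data are illustrative assumptions.

```python
def mean_excess(losses, thresholds):
    """Return (u, number of exceedances, mean excess over u) for each
    candidate threshold u that has at least one exceedance."""
    result = []
    for u in thresholds:
        excesses = [x - u for x in losses if x > u]
        if excesses:
            result.append((u, len(excesses), sum(excesses) / len(excesses)))
    return result


if __name__ == "__main__":
    toy_losses = [0.2, 0.5, 1.1, 1.4, 2.0, 2.7, 3.9, 5.5]
    for u, n_u, e_u in mean_excess(toy_losses, [0.0, 1.0, 2.0, 3.0]):
        print(u, n_u, round(e_u, 4))
```

For a GP tail with \(\xi < 1\), the mean excess is linear in the threshold, so a threshold is commonly chosen where the plot of \(e(u)\) against *u* becomes roughly linear.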

## Estimation of risk measures

Except for a few specific cases, VaR is obtained by solving the following quantile-integral equation:

where α is the coverage level. The VaR for different Lévy models can be obtained by solving Eq. (10) with corresponding Lévy density.
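A minimal sketch of solving such a quantile equation numerically is given below. It assumes that losses are positive numbers, that the coverage level exceeds 0.5, and that the density is integrated with a trapezoidal rule while the quantile is located by bisection. A Laplace density is used purely as a stand-in with a known quantile; it is not one of the Lévy densities considered in this study, and all function names and grid settings are illustrative.

```python
import math


def laplace_pdf(x, scale=1.0):
    """Stand-in heavy-tailed density with a closed-form quantile."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def tail_mass(pdf, q, upper=60.0, n_steps=20000):
    """Trapezoidal approximation of the integral of pdf over [q, upper]."""
    h = (upper - q) / n_steps
    total = 0.5 * (pdf(q) + pdf(upper))
    for k in range(1, n_steps):
        total += pdf(q + k * h)
    return total * h


def var_from_density(pdf, alpha, lo=0.0, hi=50.0, tol=1e-8):
    """Find q with P(loss > q) = 1 - alpha by bisection on [lo, hi].
    Assumes alpha > 0.5 so that the quantile is positive."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_mass(pdf, mid) > target:
            lo = mid  # too much mass beyond mid: the quantile lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(var_from_density(laplace_pdf, 0.99), -math.log(0.02))
```

Replacing `laplace_pdf` with a fitted VG, NIG, HYP, or GH density would follow the same pattern, at the cost of evaluating a Bessel function inside the integrand.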

The major problem with VaR is that it indicates the magnitude of loss to a certain level but ignores the magnitude of losses that exceed the pre-fixed confidence level. Thus, VaR identifies the tail to a given level but has no answer regarding how concerning that tail is with respect to the pre-fixed confidence level. In addition to identifying the tail to a given pre-fixed level, ES as a measure provides the average of losses belonging to the identified tail contingent on the pre-fixed level.

As in Eq. (9), the high αth quantile (i.e., VaR at a very high confidence level α) is given as follows:

and the ES of the same confidence level of α is as follows:

where *β* and *ξ* are scale and shape parameters contingent on threshold *u*, respectively.
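The analytic tractability of the EV risk measures can be illustrated with the standard peaks-over-threshold estimators, which are assumed here to correspond to the VaR and ES expressions referred to above: \(VaR_{\alpha } = u + \frac{\beta }{\xi }\left[ \left( \frac{n}{N_{u}}(1 - \alpha ) \right)^{-\xi } - 1 \right]\) and \(ES_{\alpha } = \frac{VaR_{\alpha }}{1 - \xi } + \frac{\beta - \xi u}{1 - \xi }\), where *n* is the total number of observations and *N*_{u} the number of exceedances. The sketch requires \(\xi \ne 0\) and \(\xi < 1\); the function names and the parameter values in the usage example are illustrative, not estimates from this study.

```python
def gpd_var(alpha, u, beta, xi, n, n_u):
    """POT quantile estimator at coverage level alpha (requires xi != 0)."""
    return u + (beta / xi) * (((n / n_u) * (1.0 - alpha)) ** (-xi) - 1.0)


def gpd_es(alpha, u, beta, xi, n, n_u):
    """POT expected shortfall at coverage level alpha (requires xi < 1)."""
    var = gpd_var(alpha, u, beta, xi, n, n_u)
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)


if __name__ == "__main__":
    # Illustrative inputs: 1,000 observations, 100 of them above u = 2.
    print(gpd_var(0.99, u=2.0, beta=1.0, xi=0.2, n=1000, n_u=100))
    print(gpd_es(0.99, u=2.0, beta=1.0, xi=0.2, n=1000, n_u=100))
```

Both measures are obtained by direct evaluation once \(\xi\) and \(\beta\) are estimated, which is the simplicity that the tail-targeting approach offers.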

In Eq. (11), *n* is the total number of observations, and *N*_{u} is the number of observations that exceed threshold *u*. The ES follows from the fundamental equation:

For a VG model, ES is then obtained with VG density:

The approach is similar to obtaining the ES for other Lévy models but incorporates different densities in Eq. (14). We apply the parametric bootstrap to get the SEs and confidence intervals (CIs) of risk measures, following Cotter and Dowd (2006). However, as Lévy models have no closed-form expressions for risk measures, it is a computational challenge that we overcome using a machine with a powerful configuration.
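The parametric bootstrap can be sketched as follows for the EV VaR. To keep the example short and dependency-free, the GP parameters are refitted by the method of moments rather than by maximum likelihood, which is a simplification and not the calibration used in this study; the moment estimator also presumes \(\xi < 0.5\). The number of replications, the percentile rule for the 90% CI, and all function names are illustrative assumptions.

```python
import math
import random


def fit_gpd_moments(excesses):
    """Method-of-moments GP fit (a simplification; valid for xi < 0.5)."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((x - mean) ** 2 for x in excesses) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)  # (xi, beta)


def gpd_var(alpha, u, beta, xi, n, n_u):
    return u + (beta / xi) * (((n / n_u) * (1.0 - alpha)) ** (-xi) - 1.0)


def simulate_gpd(n_u, xi, beta, rng):
    """Inverse-transform sampling of GP exceedances."""
    return [(beta / xi) * ((1.0 - rng.random()) ** (-xi) - 1.0)
            for _ in range(n_u)]


def bootstrap_var(excesses, alpha, u, n, n_boot=500, seed=0):
    """Return the point estimate, bootstrap SE, and 90% percentile CI."""
    rng = random.Random(seed)
    n_u = len(excesses)
    xi_hat, beta_hat = fit_gpd_moments(excesses)
    point = gpd_var(alpha, u, beta_hat, xi_hat, n, n_u)
    reps = []
    for _ in range(n_boot):
        sample = simulate_gpd(n_u, xi_hat, beta_hat, rng)  # draw from fitted model
        xi_b, beta_b = fit_gpd_moments(sample)              # refit on the draw
        reps.append(gpd_var(alpha, u, beta_b, xi_b, n, n_u))
    reps.sort()
    mean_b = sum(reps) / n_boot
    se = math.sqrt(sum((r - mean_b) ** 2 for r in reps) / (n_boot - 1))
    return point, se, (reps[int(0.05 * n_boot)], reps[int(0.95 * n_boot) - 1])


if __name__ == "__main__":
    data = simulate_gpd(200, 0.2, 1.0, random.Random(42))
    print(bootstrap_var(data, alpha=0.99, u=2.0, n=2000))
```

For the Lévy models the same loop applies, but each replication requires a numerical quantile search instead of a closed-form expression, which is where the computational burden arises.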

## GOF: EV versus Lévy

Among various GOF tests, one that is particularly suitable for tail-based risk management studies^{Footnote 6} is the Anderson–Darling (AD) test. It applies a weighting rule, introduced by Anderson and Darling (1952, 1954), to the Kolmogorov–Smirnov test so that observations in the tail are emphasized. Anna et al. (2005) provided a formula for an AD test statistic when the distribution of the complete sample is unknown and observations are only available at the extreme tail, referred to as the left-truncated data adapted AD test. This adaptation fits the test of the EV model. For the AD test version applied with complete distributions such as our Lévy models having closed-form densities, the *p*-values are analytically available. However, for the AD test adapted to left-truncated data, *p*-values need to be calculated through bootstrapping or Monte Carlo simulation. In this study, we use 1,000 resamples and calculate the *p*-value for the EV model using the bootstrap. We use VaR for our Lévy models and VaR for the EV model as critical values for the tests (the left-truncated version of the test of VaR for the EV model remains consistent as it is computed from a left-truncated density):

where *u* is the truncation level; *x*_{j} is the *j*th observed value of the order statistic \(X_{1} \le X_{2} \le \ldots \le X_{n}\); and *n* is the total number of observations in the tail.
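As a point of comparison, the complete-sample AD statistic can be computed as in the sketch below. It implements the standard computing formula \(A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[ \ln F(x_{i}) + \ln \left( 1 - F(x_{n+1-i}) \right) \right]\) for a fully specified distribution function *F*. It is not the left-truncated adaptation described above, and the function name and the toy check are illustrative assumptions.

```python
import math


def anderson_darling(sample, cdf):
    """Complete-sample Anderson-Darling statistic A^2 for a fully specified
    cdf, which must return values strictly between 0 and 1 on the sample."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        f_low = cdf(x)            # F(x_i)
        f_high = cdf(xs[n - i])   # F(x_{n+1-i}) in 1-based notation
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


if __name__ == "__main__":
    grid = [(i - 0.5) / 50 for i in range(1, 51)]     # evenly spread on (0, 1)
    print(anderson_darling(grid, lambda x: x))        # well-specified model
    print(anderson_darling(grid, lambda x: x * x))    # misspecified model
```

The logarithmic terms give large weight to observations whose fitted probabilities are close to 0 or 1, which is why the statistic is sensitive to tail misfit.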

Table 4 presents the statistics for applying both the GOF tests (AD and its left-truncated version AD_{ev}). By the very nature of the tests, AD and AD_{ev} are differently informative about the tail fits. Based on Table 4, EV and full density Lévy models perform statistically almost similarly on the tail. However, AD and AD_{ev} values also reflect observations outside the tails, which influence the test results even when the weights attached to such observations are much smaller than those attached to the tail.

## Comparison of risk measures

This section analyzes the Lévy and EV estimates of VaR and ES. The estimates of VaR are based on GPD and the four Lévy models. We report the parameters of each model calibrated under both approaches in Tables 5, 6, 7, 8 and 9. As VaR and ES are based on higher coverage levels, they account for trading losses at a very high level due to extreme (unexpected) events. Surprisingly, VaR estimates across the empirical values are approximately of the same order of magnitude for all indices, but that is not true for ES estimates. The ES estimates depend on the entire tail shape of the model rather than only on a specific quantile of the tail. However, between the approaches, it is difficult to claim with certainty that, based on the estimates of VaR and ES, any particular approach is better than another. The EV model occasionally provides VaR and/or ES estimates that deviate less from their empirical counterparts. However, Lévy–VaR and/or Lévy–ES estimates have less deviation from their empirical counterparts on different occasions.

Looking into the precision of VaR estimations may help us ascertain some preference between the approaches. Overall, the SEs of the Lévy model VaR estimates are much lower than those of the EV model VaR estimates. The rise in coverage levels reinforces this observation. The coefficient of variation (estimated risk measure value divided by corresponding SE) helps us to double-check this observation. Thus, the VaR estimates based on the Lévy models are more stable than those based on the EV model. However, this may be partly due to the small number of observations used under the EV model.

Tables 5, 6, 7, 8 and 9 report a 90% CI for VaR estimates obtained with bootstrapped estimates. We find that at low coverage levels, the estimated CIs for both EV and Lévy models are symmetric. However, at higher coverage levels, the CIs are asymmetric, with the upper bound moving further away from the mean of the bootstrapped estimates. In contrast to the GP, it is difficult to establish clear-cut results for the four Lévy models through a comparison of the SEs of VaR estimates. Unlike GP, often at ultra-high coverage (0.999), the CIs exhibit ultra-spread, which indicates unstable forecasts at ultra-high coverage, presumably because the estimations are based on a few extreme observations above the threshold^{Footnote 7} alone.

Overall, the ES CIs are narrower than those of VaR, indicating that ES measures are more precisely estimated than VaRs. VaR and ES bootstrapped statistics (SE and CI) are informative regarding some differences and similarities between Lévy and EV approaches. VaR and ES bootstrapped statistics are narrower for Lévy models than for EV models, which indicates that the estimation with a Lévy approach is more stable than that with an EV approach. Moreover, the estimation performance deteriorates under both approaches with increased coverage. For NIG and GH models, estimation instability seems to propagate much faster, especially at higher coverage levels.

## VaR and ES backtesting

We conduct dynamic calibration on a rolling window for backtesting. As daily VaR(α) is estimated on daily returns, the loss from holding an asset for one day should exceed the VaR only 100α% of the time, allowing for all possible extremities. We use an indicator variable describing the hit sequence, which identifies the days of VaR violation in the following *T* trading days. The hit sequence is Bernoulli distributed with probability α of taking the value 1. We implement three VaR tests: unconditional, independence, and conditional coverage. For ES backtests, we use two tests without distributional assumption: the unconditional-normal and unconditional-*t* tests.

### VaR backtesting

#### Unconditional test

The unconditional hypothesis of backtesting does not hold any assumption regarding today’s violation status when it provides statistical evidence as to whether the observed proportion of violation (PV) of a VaR model tomorrow is significantly different from its promised fraction α. However, the evidence is provided through an asymptotical test statistic following χ^{2} with one degree of freedom:

Here, *T* = *T*_{1} + *T*_{0} is assumed to be sufficiently large, and *T*_{0} and *T*_{1} are the number of days with no violation and days with a violation, respectively. We use Monte Carlo simulation to calculate the *p*-value. To compute the Monte Carlo *p*-value, we simulate 999 test values, \({\left\{LR\left(i\right)\right\}}_{i=1}^{999}\), each of which is based on a Bernoulli (α) sample of hit sequences having the same size as the original sample:

The simulated *p*-value is, roughly, the proportion of simulated test values that are larger than the observed test value, relative to the number of simulations.
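The unconditional test and its simulated *p*-value can be sketched as follows. The statistic is written in the standard likelihood-ratio form \(LR_{uc} = -2\ln \left[ \frac{(1 - \alpha )^{T_{0}}\alpha ^{T_{1}}}{(1 - T_{1}/T)^{T_{0}}(T_{1}/T)^{T_{1}}} \right]\), with α denoting the promised violation probability. The finite-sample rule `(1 + exceedances) / (n_sim + 1)` for the *p*-value and all function names are assumptions made for illustration.

```python
import math
import random


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def lr_uc(hits, p):
    """Unconditional coverage likelihood-ratio statistic for a 0/1 hit sequence."""
    t = len(hits)
    t1 = sum(hits)
    t0 = t - t1
    pi_hat = t1 / t
    log_l_null = _xlogy(t0, 1.0 - p) + _xlogy(t1, p)
    log_l_alt = _xlogy(t0, 1.0 - pi_hat) + _xlogy(t1, pi_hat)
    return -2.0 * (log_l_null - log_l_alt)


def mc_p_value(hits, p, n_sim=999, seed=0):
    """Monte Carlo p-value from n_sim Bernoulli(p) hit sequences of equal length."""
    rng = random.Random(seed)
    observed = lr_uc(hits, p)
    exceed = 0
    for _ in range(n_sim):
        sim = [1 if rng.random() < p else 0 for _ in range(len(hits))]
        if lr_uc(sim, p) > observed:
            exceed += 1
    return (1 + exceed) / (n_sim + 1)


if __name__ == "__main__":
    too_many = [1] * 20 + [0] * 80   # 20% violations against a promised 5%
    print(lr_uc(too_many, 0.05), mc_p_value(too_many, 0.05))
```

The simulated *p*-value avoids relying on the asymptotic χ^{2} approximation when the number of violations is small.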

#### Independence test

The independence test checks whether VaR violations are truly random and not clustered over time. When assets exhibit volatility clustering, VaR violations tend to cluster as well: if there is a violation today, the probability of a violation tomorrow is higher than 100α%. A correctly specified risk model adjusts VaR to predictions of high volatility as useful information, so that the violation of VaR remains unpredictable. The test statistic of the independence test is given as follows (Christoffersen 2003):

where the matrix of transitional probabilities of conditional violations is given as follows:

Thus, we can write

where *p* characterizes the matrix of transitional probabilities of violation, ensuring no dependence between 0 and 1 in the hit sequence:

*L*(*p*) is similar to the unconditional hypothesis. *LR*_{ind} provides the statistical significance of the likelihood of independence in the hit sequence over the likelihood of dependence.
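The independence test can be sketched from the transition counts of the hit sequence. The code below assumes the standard first-order Markov formulation: it counts the transitions \(T_{ij}\) from state *i* to state *j*, estimates the transition probabilities \(\pi_{01}\) and \(\pi_{11}\), and compares the resulting likelihood with the likelihood under a single violation probability. The names `lr_ind` and `_xlogy` are hypothetical helpers, and the restricted likelihood is evaluated on the same transition pairs, which is one of several conventions.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def lr_ind(hits):
    """Likelihood-ratio statistic for first-order independence of a hit sequence."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, curr in zip(hits[:-1], hits[1:]):
        counts[(prev, curr)] += 1
    t00, t01 = counts[(0, 0)], counts[(0, 1)]
    t10, t11 = counts[(1, 0)], counts[(1, 1)]
    pi01 = t01 / (t00 + t01) if (t00 + t01) else 0.0
    pi11 = t11 / (t10 + t11) if (t10 + t11) else 0.0
    pi = (t01 + t11) / (t00 + t01 + t10 + t11)
    # Unrestricted likelihood: violation probability depends on yesterday's state.
    log_l_dep = (_xlogy(t00, 1.0 - pi01) + _xlogy(t01, pi01)
                 + _xlogy(t10, 1.0 - pi11) + _xlogy(t11, pi11))
    # Restricted likelihood: one violation probability regardless of yesterday.
    log_l_indep = _xlogy(t00 + t10, 1.0 - pi) + _xlogy(t01 + t11, pi)
    return -2.0 * (log_l_indep - log_l_dep)


if __name__ == "__main__":
    clustered = [0] * 80 + [1] * 20   # all violations in one block
    print(lr_ind(clustered))
```

A value above 3.84, the 5% critical value of the χ^{2} distribution with one degree of freedom, indicates clustering of violations.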

#### Conditional coverage test

The conditional coverage test jointly checks whether the average number of violations matches the coverage level of a risk model and whether the violations are independent over time. The conditional coverage test statistic has a similar expression as the independence test statistic with \(p=\frac{{T}_{1}}{T}\) of the independence statistic replaced by the coverage level α of the risk model (Christoffersen 2003; Dowd 2005):
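In the standard formulation, which is assumed here to be the one intended, the statistic can be written as follows, where \(\hat{\Pi }_{1}\) is a symbol introduced for illustration to denote the estimated matrix of transition probabilities:

\[
LR_{cc} = -2\ln \left[ \frac{L(\alpha )}{L(\hat{\Pi }_{1})} \right] = LR_{uc} + LR_{ind}.
\]

The decomposition into the unconditional and independence statistics holds when the first observation of the hit sequence is treated consistently in both likelihoods, and \(LR_{cc}\) is asymptotically χ^{2} distributed with two degrees of freedom.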

### ES backtesting

The unconditional coverage test statistic proposed by Acerbi et al. (2014) for ES backtesting is as follows:

where \({X}_{t}\) represents the profit and loss, which follows a real but unknowable distribution and is forecasted by a model predictive distribution conditional on the previous information used to compute ES, and \({I}_{t}\) is an indicator function, which is equal to 1 when the forecasted VaR is violated, that is, \({X}_{t}<-{VaR}_{\alpha ,t}\), and 0 otherwise.

We use two versions of this test that take only the VaR and ES forecasts as inputs and do not require the model's full predictive distribution: the unconditional-normal test and the unconditional-*t* test (see Acerbi and Szekely 2014, 2017). The unconditional-normal test evaluates significance by assuming that *X*_{t} follows a standard normal distribution, whereas the unconditional-*t* test assumes that *X*_{t} follows a *t*-distribution. The unconditional test statistic is sensitive to both the severity of the VaR failures relative to the ES estimate and the frequency of VaR failures. As a result, one or more rare but colossal VaR failures relative to the ES may result in the rejection of a model over a particular timeframe.

However, when the ES estimate is large on a violation day, a large loss may not affect the test results as much as it would have if the ES estimate were smaller. Similarly, a model can be rejected due to many VaR failures even if all VaR violations are just slightly higher than the VaR, as each such failure contributes to making the test statistic negative. Thus, asymmetry in the number of VaR violations among different models and asymmetry between expected and observed severity (severity ratio) are critical for ES backtesting.

### Backtesting results

We now examine the sensitivity of the risk measure VaR to new observations for dynamic calibration on a rolling window of four business years with a two-year look-back window and continue to increase the window length. To avoid the problem with EV dynamic calibration that considers only extreme observations, we increase the proportion of extreme observations by adjusting the threshold and expanding the length of the look-back window. For the EV model, we dynamically calibrate on the most extreme 30% of observations and consider coverage levels of 95% and 99%.

First, we calibrate all the models on the time series of returns for 2007–2010 on December 31, 2010 and use the calibrated parameters to predict the VaR and ES for January 1, 2011. This gives us an additional new observation of returns on January 1, 2011. We remove the oldest observation to accommodate this new observation in our fixed-length look-back window and then calibrate the models in a new window to predict the VaR and ES on January 2, 2011. The process continues until the end of 2017. Thus, the dynamic calibration starts on January 1, 2011 and ends on December 31, 2017. The unconditional, independence, and conditional coverage hypotheses are tested with 95% VaRs. The backtesting checks whether unconditional and conditional distributions influence conditional and unconditional coverage hypotheses tests. The backtesting results for long positions in all indices and the PVs are presented in Table 10.^{Footnote 8}
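The rolling procedure described above can be sketched as follows. To keep the example self-contained, the EV and Lévy calibrations are replaced by a plain historical-simulation estimate; this stand-in, the function names, and the convention of reporting VaR and ES as positive losses are assumptions made for illustration only.

```python
import math
from collections import deque


def empirical_var_es(window_returns, tail_prob):
    """Stand-in risk model: historical-simulation VaR and ES (as losses)."""
    losses = sorted((-r for r in window_returns), reverse=True)  # worst first
    k = max(1, math.floor(tail_prob * len(losses)))
    tail = losses[:k]
    return tail[-1], sum(tail) / k  # VaR = k-th worst loss, ES = tail mean


def rolling_backtest(returns, window, tail_prob, risk_model=empirical_var_es):
    """Re-estimate the model each day on a fixed-length look-back window."""
    look_back = deque(returns[:window], maxlen=window)
    forecasts, hits = [], []
    for r in returns[window:]:
        var, es = risk_model(list(look_back), tail_prob)  # forecast for today
        forecasts.append((var, es))
        hits.append(1 if r < -var else 0)  # 1 marks a VaR violation
        look_back.append(r)  # deque drops the oldest observation automatically
    return forecasts, hits
```

The resulting hit sequence is the input to the unconditional, independence, and conditional coverage tests, and the (VaR, ES) forecasts feed the ES backtest. A different `risk_model` callable with the same signature could be passed in to mimic a parametric calibration.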

Table 10 reveals that the EV–VaR is not distinguishable from the full density-based Lévy–VaR given the observed PV performance in backtesting. The PVs corresponding to the EV model are closer to the promised fraction of violations for all indices, except for Nikkei225. The PVs corresponding to the EV model for Nikkei225 deviate more from the promised fraction of violations than those for the Lévy models. Thus, the VaR estimates obtained from the tail-based EV model and the full density-based Lévy models are nearly identical.

For the remaining indices, the results from the hypotheses testing are mixed. As the VaR violations are clustered at 95%, the independence test fails in most cases. However, the test passes at the 99% coverage level. On the other hand, the conditional coverage hypothesis is supported in most cases, although both unconditional and independence hypotheses are not supported. A significant deviation of the observed PV from the promised PV for unconditional coverage may have contributed to the rejection of the conditional coverage hypothesis. We also report the Chi-square and the Monte Carlo simulated *p*-values, which test the effectiveness of the test statistics. We find that the Chi-square and simulated *p*-values are close to each other, implying that our tests are relevant.

The last two columns in each table report the results of the ES backtesting. We report two ES backtesting results without distributional assumption—the unconditional-normal test (unconditional-*N*) and the unconditional-*t* test (unconditional-*t*). The *p*-values of the tests, which represent the success rate when multiplied by 100, are reported in parentheses. We identify each test as a “pass” (P) or a “fail” (F) based on the *p*-values in the table. All tests are conducted at a 95% confidence level. None of the unconditional-*N* tests passed. However, the unconditional-*t* tests passed in only a few instances. Thus, our results do not suggest any preference for the EV or Lévy models through ES backtesting.

## Discussion

We have investigated four full-density Lévy models and estimated the tail-focused risk measure VaR and its coherent version ES, in addition to estimating VaR and ES for an EV model (a tail-targeting approach). The parameters calibrated for all five models under both approaches are presented in Tables 2 and 3. VaR and ES are based on high coverage levels, accounting for extreme events governing high trading losses. We analyze the performance of VaR and ES risk measures, utilizing full-density Lévy models of VG, NIG, HYP, and GH, and compare them with the VaR and ES estimates obtained with the tail density-based EV model. The results reveal that it is very difficult to ascertain any comprehensive superiority of one approach over the other.

Table 11 presents the frequency distribution of significant estimates reported in Tables 5, 6, 7, 8 and 9. We have 15 estimates of VaR and 15 estimates of ES risk under the EV risk model and its Lévy contenders, estimated across all five indices and three coverage levels. In most cases (11 out of 15 estimates), we find that the Lévy–VaR forecasts are closer to the respective empirical estimates than their EV counterparts. Nevertheless, this observation alone does not allow us to declare the EV approach inadequate. Within the Lévy category, the NIG (4 out of 11) and the GH (6 out of 11) models provide considerably better VaR forecasts (in the sense of having the minimum absolute deviation from the empirical estimates) than the VG and the HYP Lévy models. Among the remaining Lévy models, VaR forecasts favor the HYP model, supporting the derivative pricing concept (Schoutens 2003). This implies that a fully flexible GH model forecasts the quantiles more befittingly than its restricted versions.

Regarding the restricted versions, the NIG characterization seems to suffer the least from its restriction in terms of forecasts. Of the 15 ES forecasts, the EV model accounts for 8 of the most favorable ones, which is not sufficient for it to be deemed adequate. Among the 7 remaining cases in which a Lévy-ES forecast is the most favorable, we find 3 for the NIG, 2 for the GH, 1 for the VG, and 1 for the HYP model. Thus, the presumed myth that a tail-targeting model is more likely to provide superior, consistent forecasts for tail-focused risk measures VaR and ES is empirically confronted. The simplicity of EV–VaR and EV-ES is attractive, but a comparison with empirical values often disputes their adequacy.

We find some randomness in classifying the superiority of an approach over another given a specific timeframe. The frequency distribution of adequacy between the simple tail targeting EV analytic risk model and the relatively complex full density driven Lévy risk models is presented in Table 11. While it is difficult to claim that a particular full-density Lévy model is superior irrespective of data ranges, it is impossible to claim that the tail-targeting EV model has any sense of adequacy irrespective of all data ranges.

Given such similarity of forecast performances in both approaches, the choice is likely to be determined by a compromise between user-defined simplicity and user-perceived adequacy. This suggests that the performances of the risk measure VaR and its coherent version ES are different and fail to adequately identify the risk profiles of assets. When the VaR and ES models identify the risk profiles of assets similarly, both VaR and ES would be adequate for tail targeting with either the EV model or some full-density Lévy model. However, the performances of VaR and ES are mixed across EV and Lévy models. This should not be interpreted as implying that the test statistics of the model fitting performances are contradictory.

It is well known that the AD test is tail-emphasized. Therefore, quantile mismatch outside the tail is barely detected by the *AD*_{ev} test, which is applied to the tail-targeting EV model. Based on the AD test statistics in Table 3, the EV model has some preference over the Lévy models. However, this is only based on the tail quantile match of EV. This means that it hardly bears information on quantile matches far outside the tail. This is why an AD test value of a solely tail-based EV model might turn deceptive when compared with the AD test value of an entire distribution-based Lévy model. This deception can hardly be adequately detected by applying the GOF test emphasized on the tail. Thus, it is not surprising that the seemingly preferable EV model turns elusive and does not yield the most adequate forecasts of risk measures, i.e., VaR and ES. Our backtesting results about VaR and ES also confirm such findings. The EV–VaR or EV-ES results are not significantly different from VaR and ES based on Lévy models. In most cases, the results are mixed.

## Conclusion

We investigate and compare the simplicity and adequacy of tail-focused VaR and ES risk measures for tail-targeting EV models with the full density-focused Lévy–VaR and Lévy-ES using data on futures contracts of S&P500, FTSE100, DAX, Hang Seng, and Nikkei 225 indices from January 1, 2007 to December 31, 2017, covering the 2007–2008 Global Financial Crisis that followed the subprime mortgage debacle in the US. We find that returns discarded by the EV model (as they do not characterize the extreme unexpected market losses) and incorporated by the Lévy models do have some effect on the performance of the tail-focused risk measure VaR and its coherent version ES. Thus, without an immutable law justifying any preference between “tail-alone” and “full-density-based” models for tail-focused risk management, this study provides a heuristic analysis illustrating the potential effects of the observations discarded by an EV model on risk estimates when considered under Lévy models.

The tail-based EV models are simpler for the analytic formulation of the tail-focused risk measure of VaR and the tail-aggregate risk measure of ES compared with the Lévy-based measures. Moreover, the EV models are simpler to implement in risk measure calculations. However, we find that the EV models are inadequate as the performance of EV risk estimates is not necessarily superior to that of Lévy risk estimates. On the other hand, we cannot be sure that a full density-based model based on a Lévy distribution adequately assesses risk measures. Thus, the adequacy of a simpler model with a more straightforward implementation becomes a relative consideration. Our model testing reinforces the theoretical fundamental that the extreme observations in the tails of the EV model (discarding smaller systematic returns) and all observations, including smaller and extreme ones in Lévy models, are different approaches with the common goal of meaningful simplification of reality. Their relative performance for a particular time window may fail to offer any guarantee. Given such randomness of estimation performances under both approaches (for different ranges of data and coverage levels), the choice should be determined by a compromise or trade-off between simplicity and user-defined adequacy. Our study period covers the 2007–2008 Global Financial Crisis. The analysis can be extended to other financial crisis periods, e.g., the Russian financial crisis, the default crisis related to the Long-Term Capital Management of 1998,^{Footnote 9} and, more recently, the coronavirus-related turmoil in the financial market during 2020–2021.

Our analysis is based on only a selection of EV and Lévy models. The study can be extended using other types of models. Practitioners should not rely on one set of desired models and ignore others when implementing VaR and ES estimation. The simplicity of a model does not guarantee its adequacy. However, the adequacy of a model based on “full-density,” e.g., the Lévy-based model, may not always be the best when a simpler tail-based model will provide more robust VaR and ES results. As many banks and financial institutions do not follow adequacy requirements in risk measures following the Basel agreement, our findings shed empirical light on such complexity. Therefore, when the results are mixed, banks and financial institutions as well as policymakers should find a way to compromise between simplicity and adequacy.

## Availability of data and materials

The data will be available from the authors upon request.

## Notes

The quote is from Albert Einstein’s Herbert Spencer Lecture delivered at Oxford (June 10, 1933) (On the Method of Theoretical Physics, published in Philosophy of Science, Vol. 1, No. 2 (April 1934), pp. 163–169, at p. 165).

VG, NIG, and HYP are versions of a GH model with some or other parameters restricted. Thus, to observe the effect of full flexibility and selected restrictions for this family of processes, in addition to the restricted models, we include the GH model.

Every member of the GH family has sign-of-skewness explicitly characterized by the sign of a single parameter of the model. That is why we consider GH family in our context.

The difference in tail masses for GH members, with respect to long and short positions under the same cut-off point, is simply due to the sign change of the skewness parameter.

Other popular GOF tests such as a Chi-square test are not comparable for Lévy models on complete data and left truncated version for EV model with incomplete data.

We also conducted VaR and ES estimation for sample periods with or without the 2008 Global Financial Crisis period. The results do not change significantly. For brevity, we do not report the results here.

The unconditional, independence, and conditional coverage hypotheses are also tested with 99% VaRs. The results are similar to the 95% VaR.

## References

Acerbi C, Szekely B (2014) Backtesting expected shortfall. MSCI Inc., New York

Acerbi C, Szekely B (2017) General properties of backtestable statistics. SSRN Electron J. https://doi.org/10.2139/ssrn.2905109

Anderson TW, Darling DA (1952) Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann Math Stat 23(2):193–212

Anderson TW, Darling DA (1954) A test of goodness of fit. J Amer Stat Assoc 49(268):765–769

Chernobai A, Rachev S, Fabozzi F (2005) Composite goodness-of-fit tests for left-truncated loss samples. Working Paper, Department of Statistics and Applied Probability, University of California, Santa Barbara

Bali TG (2007) A generalized extreme value approach to financial risk measurement. J Money Credit Bank 39(7):1613–1649

Bank for International Settlements (2019) Basel Framework (Basel), Accessed September 2021, http://www.bis.org/basel_framework/index.htm?export=pdf

Barndorff-Nielsen OE (1995) Normal Inverse Gaussian distributions and the modeling of stock returns. Research report no 300, Department of Theoretical Statistics, Aarhus University 401–419

Barndorff-Nielsen O (1977) Exponentially decreasing distributions for the logarithm of particle size. Proc R Soc Lond A 353:401–419

Barndorff-Nielsen O (1978) Hyperbolic distributions and distributions on hyperbolae. Scand J Stat 5:151–157

Bayer S, Dimitriadis T (2022) Regression-based expected shortfall backtesting. J Financ Econom 20(3):437–471

Bertoin J (1996) Lévy processes. Cambridge University Press, Cambridge

Bingham NH, Kiesel R (2001) Modeling asset returns with hyperbolic distributions. Return distributions in finance 1–20. Butterworth-Heinemann

Chavez-Demoulin V, Embrechts P, Sardy S (2014) Extreme-quantile tracking for financial time series. J Econom 181(1):44–52

Cheng L, AghaKouchak A, Gilleland E, Katz RW (2014) Non-stationary extreme value analysis in a changing climate. Clim Change 127:353–369

Cheng J, Hong Y, Tao J (2015) How do risk attitudes of clearing firms matter for managing default exposure in futures markets? Eur J Financ 22(10):909–940

Christoffersen P (2003) Elements of financial risk management. Academic Press, Cambridge

Cotter J, Dowd K (2006) Extreme spectral risk measures: an application to futures clearinghouse margin requirements. J Bank Finance 30:3469–3485

De Olivera F, Fajardo J, Mordecki E (2018) Skewed Lévy models and implied volatility skew. Int J Theor Appl Finance 21(02):1850003

Dowd K (2005) Measuring market risk. John Wiley & Sons Ltd, Hoboken

Du Z, Escanciano JC (2017) Backtesting expected shortfall: accounting for tail risk. Manage Sci 63(4):940–958

Eberlein E, Hammerstein EA (2002) The generalized hyperbolic and inverse Gaussian distributions: limiting cases and approximation of processes. FDM preprint 80, University of Freiburg.

Eberlein E, Prause K (1998) The generalized hyperbolic model: financial derivatives and risk measures. FDM preprint 56, University of Freiburg

Eberlein E, Keller U (1995) Hyperbolic distributions in finance. Bernoulli 1(3):281–299

Fajardo J (2015) Barrier style contracts under Lévy processes: an alternative approach. J Bank Finance 53:179–187

Fajardo J, Mordecki E (2006) Symmetry and duality in Lévy markets. Quant Finance 6(3):219–227

Fajardo J, Mordecki E (2014) Skewness premium with Lévy processes. Quant Finance 14(9):1619–1626

Farkas W, Mathys L (2022) Geometric step options and Lévy models: duality, PIDEs, and semi-analytical pricing. Front Math Finance 1(1):1–51

Frésard L, Perignon C, Wilhelmsson A (2011) The pernicious effects of contaminated data in risk management. J Bank Finance 35(10):2569–2583

Fusai G, Meucci A (2008) Pricing discretely monitored Asian options under Lévy processes. J Bank Finance 32(10):2076–2088

Gençay R, Selçuk F (2004) Extreme value theory and value-at-risk: relative performance in emerging markets. Int J Forecast 20(2):287–303

Geman H (2002) Pure jump Lévy processes for asset price modeling. J Bank Finance 26(7):1297–1316

Hoga Y, Demetrescu M (2022) Monitoring value-at-risk and expected shortfall forecasts. Manag Sci. https://doi.org/10.1287/mnsc.2022.4460

Jondeau E, Rockinger M (2003) The tail behavior of stock returns: emerging versus mature markets. J Empir Financ 10:559–581

Kabir MH, Hassan MK (2005) The near-collapse of LTCM, US financial stock returns, and the fed. J Bank Finance 29(2):441–460

Kabir MH, Hassan MK (2009) Russian financial crisis, US financial stock returns and the IMF. Appl Financ Econ 19(5):409–426

Kim YS, Rachev ST, Bianchi ML, Fabozzi FJ (2008) Financial market models with Lévy processes and time-varying volatility. J Bank Finance 32(7):1363–1378

Kourouma L, Dupre D, Sanfilippo G, Taramasco O (2010) Extreme value at risk and expected shortfall during the financial crisis. Available at SSRN 1744091

Kyprianou A (2006) Introductory lectures on fluctuations of Lévy processes with applications. Springer, Berlin

Lazar E, Zhang N (2019) Model risk of expected shortfall. J Bank Finance 105:74–93

Longin FM (1996) The asymptotic distribution of extreme stock market returns. J Bus 69(3):383–408

McNeil AJ, Frey R (2000) Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. J Empir Financ 7:271–300

Monier E, Gao X (2015) Climate change impacts on extreme events in the United States: an uncertainty analysis. Clim Change 131(1):67–81

Mozumder S, Dempsey M, Kabir MH (2017) Backtesting extreme value and Lévy value-at-risk models: evidence from international futures markets. J Risk Finance 18(1):88–118

Otto S, Breitung J (2022) Backward CUSUM for testing and monitoring structural change with an application to COVID-19 pandemic data. Econom Theory. https://doi.org/10.1017/S0266466622000159

Patton AJ, Ziegel JF, Chen R (2019) Dynamic semiparametric models for expected shortfall (and value-at-risk). J Econom 211(2):388–413

Perignon C, Smith DR (2010a) The level and quality of value-at-risk disclosure by commercial banks. J Bank Finance 34(2):362–377

Perignon C, Smith DR (2010b) Diversification and value-at-risk. J Bank Finance 34(1):55–66

Pidgeon N (2012) Climate change risk perception and communication: addressing a critical moment? Risk Anal 32:951–956

Prause K (1999) The Generalized Hyperbolic Model: Estimation, financial derivatives and risk measures. Ph.D. Thesis, University of Freiburg

Quintos C, Fan Z, Phillips PCB (2001) Structural change tests in tail behaviour and the Asian crisis. Rev Econ Studies 68(3):633–663

Sato K (1999) Lévy processes and infinitely divisible distributions. Cambridge University Press, Cambridge

Schoutens W (2003) Lévy processes in finance: pricing financial derivatives. John Wiley & Sons Ltd, Hoboken

So MKP, Wong CM (2012) Estimation of multiple period expected shortfall and median shortfall for risk management. Quant Finance 12(5):739–754

Sorwar G, Dowd K (2010) Estimating financial risk measures for options. J Bank Finance 34(8):1982–1992

Taylor JW (2019) Forecasting value at risk and expected shortfall using a semiparametric approach based on the asymmetric Laplace distribution. J Bus Econ Stat 37(1):121–133

Tolikas K, Gettinby GD (2009) Modelling the distribution of the extreme share returns in Singapore. J Empir Financ 16(2):254–263

Wong HY, Guan P (2011) An FFT-network for Lévy option pricing. J Bank Finance 35(4):988–999

## Acknowledgements

None

## Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

## Author information

### Contributions

All authors contributed to the study conception and design. All authors read and approved the final manuscript. SM was involved in concept development. MKH developed the idea further and was involved in editing. SM wrote the first draft. MKH and HK further revised the draft.

## Ethics declarations

### Competing interests

The authors have no relevant financial or non-financial interests to disclose.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Mozumder, S., Hassan, M.K. & Kabir, M.H. An evaluation of the adequacy of Lévy and extreme value tail risk estimates.
*Financ Innov* **10**, 100 (2024). https://doi.org/10.1186/s40854-024-00614-6
