A note on calculating expected shortfall for discrete time stochastic volatility models

In this paper we consider the problem of estimating expected shortfall (ES) for discrete time stochastic volatility (SV) models. Specifically, we develop Monte Carlo methods to evaluate ES for a variety of commonly used SV models. This includes both models where the innovations are independent of the volatility and models where there is dependence. This dependence aims to capture the well-known leverage effect. The performance of our Monte Carlo methods is analyzed through simulations and empirical analyses of four major US indices.


Introduction
Empirical studies consistently show that financial returns do not have a constant volatility and instead exhibit volatility clustering. This clustering is often modeled using GARCH or one of its variants. Perhaps the most prominent alternative to GARCH is the class of discrete time stochastic volatility (SV) models. These models are very flexible and capture additional stylized features of financial returns including skewness, excess kurtosis, and leverage effects (Cont and Tankov 2004). SV models are often credited to Taylor (1986) although they have a long prehistory. See Shephard (2005) or Taylor (1994) for a thorough review.
While a great deal of research effort has focused on option pricing for SV models, much less attention has been paid to the problem of calculating risk. However, as we will see, this problem is not trivial even in the case when all of the parameters are explicitly known. In this paper, we introduce a Monte Carlo method for calculating expected shortfall (ES) for several important classes of SV models. ES is one of the best known and most commonly used measures of financial risk. It is, arguably, second in popularity only to Value-at-Risk (VaR). However, unlike VaR, ES is a coherent risk measure (Artzner et al. 1999) and it has been chosen to replace VaR as the measure determining a bank's capital requirements in the Basel III regulatory framework (Basel Committee on Banking Supervision 2013). For more information on VaR, ES, and related risk measures, see e.g. McNeil et al. (2015) and the references therein. A well-known survey on the estimation of ES is given in Nadarajah et al. (2014). Among a long list of methodologies, that paper discusses the estimation of ES under a GARCH model. However, we have not seen a discussion of ES estimation under an SV model in the literature.
The difficulty in calculating ES for SV models lies in the fact that one needs to work with the product of two random variables and, even in the case where both terms in the product have simple distributions, the distribution of the product may be quite complicated. This is in contrast with GARCH models, where the problem of evaluating ES, essentially, reduces to that of calculating ES for the distribution of the innovations.
The rest of this paper is organized as follows. In Sect. 2 we formally define the SV model and give a simple Monte Carlo method for evaluating ES in this case. In Sect. 3 we give a more sophisticated Monte Carlo method in the commonly used case where the innovations for the returns and for the volatility are independent. In Sect. 4 we give a similar method for an important case with dependence, which aims to model leverage. In Sect. 5 we illustrate our methodology on four major US indices. Some conclusions are given in Sect. 6.

Stochastic volatility
Discrete time stochastic volatility models commonly assume that the financial (log) return at time t is given by

r_t = e^{h_t/2} ε_t, (1)

where h_t is the log variance, which is modeled as

h_t = μ + φ(h_{t−1} − μ) + σ η_t. (2)

Here σ > 0, |φ| < 1, and μ ∈ R are parameters, and {ε_t} and {η_t} are sequences of independent and identically distributed (iid) random variables representing the innovations for r_t and h_t, respectively. We do not, in general, assume that, for a given t, ε_t and η_t are independent of each other. Note that the log variance is modeled by an AR(1) process. The assumption that |φ| < 1 ensures that this process is weakly stationary, see Ruppert and Matteson (2015). Under general conditions on the distributions of the innovations, this model can be seen as a discretization of a continuous time SV model where the log variance is modeled by a process of Ornstein-Uhlenbeck type, see Taylor (1994) or Barndorff-Nielsen and Shephard (2001). We are interested in evaluating the ES for this model.
We begin by establishing some notation. Let F_{t−1} denote the information set available at time t − 1. For simplicity, we sometimes write P_{t−1} to denote the conditional probability P_{t−1}(·) = P(·|F_{t−1}) and E_{t−1} to denote the conditional expectation E_{t−1}(·) = E(·|F_{t−1}). For τ ∈ (0, 1), the τth VaR at time t, denoted by VaR_τ(t), is the smallest number for which P_{t−1}{r_t < −VaR_τ(t)} ≤ τ. Note that −VaR_τ(t) is the τth conditional (given F_{t−1}) quantile of r_t. For this reason, we sometimes write Q_τ(r_t|F_{t−1}) for −VaR_τ(t). The τth ES at time t, denoted by ES_τ(t), is defined by

ES_τ(t) = (1/τ) ∫_0^τ VaR_u(t) du

when the integral exists, and is undefined otherwise. The parameter τ is typically chosen to be a small number such as 0.01, 0.025, or 0.05. Throughout, we assume

1. that the conditional distribution of r_t given F_{t−1} is continuous, and
2. that it satisfies

E_{t−1}|r_t| < ∞. (3)

The second assumption ensures that ES_τ(t) is well defined, while the first allows us to use the more explicit formula

ES_τ(t) = −(1/τ) E_{t−1}[r_t 1{r_t ≤ −VaR_τ(t)}].

Here and throughout, we write 1{·} to denote the indicator function.
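As a quick numerical check, the explicit indicator-function formula can be applied to a simulated sample. The following Python sketch is our own illustration (all function names are ours, and the iid standard normal "returns" are purely illustrative): it estimates −VaR_τ(t) by an empirical quantile and ES_τ(t) by the indicator formula.

```python
import numpy as np

def empirical_var_es(returns, tau):
    """Estimate (VaR_tau, ES_tau) from a sample of returns.

    -VaR_tau is approximated by the empirical tau-quantile, and ES_tau by
    the indicator formula ES_tau = -(1/tau) * E[r * 1{r <= -VaR_tau}].
    """
    r = np.asarray(returns, dtype=float)
    neg_var = np.quantile(r, tau)              # this is -VaR_tau
    es = -(r * (r <= neg_var)).mean() / tau    # indicator-function formula
    return -neg_var, es

# Illustration: for iid N(0,1) returns and tau = 0.05, theory gives
# VaR close to 1.645 and ES close to 2.063.
rng = np.random.default_rng(0)
var_hat, es_hat = empirical_var_es(rng.standard_normal(10**6), 0.05)
```

Note that ES is always at least as large as VaR, since it averages the losses beyond the VaR level.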
Using the fact that the innovations are independent over time, together with basic properties of quantiles and expectations, we can write

ES_τ(t) = −e^{(μ + φ(h_{t−1} − μ))/2} M(τ, σ), (4)

where

M(τ, σ) = (1/τ) E[e^{σY/2} Z 1{e^{σY/2} Z ≤ a}], (5)

a = Q_τ(e^{σY/2} Z) is the τth (unconditional) quantile of the random variable e^{σY/2} Z, and the joint distribution of (Y, Z) is the same as the joint distribution of (η_t, ε_t). The difficulty in evaluating M is that we must work with the distribution of X = e^{σY/2} Z, which can be complicated even when the distributions of Y and Z are fairly simple. Little is known about the distribution of X even in the case where Y and Z are both standard normal random variables, see Yang (2008) and the references therein. For this reason, we develop Monte Carlo methods to approximate M(τ, σ).
We begin by approximating a = Q_τ(e^{σY/2} Z). Toward this end, fix some large integer N_1 and simulate an iid sequence of bivariate random variables {(Y_i, Z_i)}_{i=1}^{N_1} from the joint distribution of (η_t, ε_t). Next, for i = 1, 2, ..., N_1, set X_i = e^{σY_i/2} Z_i. Now sort these from smallest to largest to get X_(1) ≤ X_(2) ≤ ··· ≤ X_(N_1). Finally, approximate a = Q_τ(e^{σY/2} Z) by

â = X_(⌊N_1 τ⌋), (6)

where ⌊·⌋ is the floor function. One can also use a smooth approximation based on kernel estimators, see e.g. Sheather and Marron (1990). However, we did not find much of an improvement when using these. Next, fix another large integer N_2, simulate a new iid sequence {(Y_i, Z_i)}_{i=1}^{N_2} from the joint distribution of (η_t, ε_t), set X_i = e^{σY_i/2} Z_i, and approximate M(τ, σ) by

M_1(τ, σ) = (1/(τ N_2)) Σ_{i=1}^{N_2} X_i 1{X_i ≤ â}. (7)

We note that, in principle, one can use the same sample to evaluate â and M(τ, σ), although for smaller sample sizes this may introduce bias. Either way, the difficulty with this approach is that approximately (1 − τ)100% of the simulated values will not satisfy the condition in the indicator function in (7) and will thus be thrown out. As such, very few values will actually be used in the sum. For this reason, we may need N_2 to be an extremely large number to get a reasonable approximation. One could try to implement an importance sampling or related modification, but the fact that we are working with the product of two random variables makes it difficult to use such an approach. Instead, we use the specific structure of this problem to implement an approach that works better in several important situations.
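To make the two steps concrete, here is a minimal Python sketch of the quantile approximation (6) and the plain Monte Carlo estimate (7) for the illustrative case where Y and Z are independent standard normals; the function names are our own, and the sampler for (Y, Z) would be swapped out under other distributional assumptions.

```python
import numpy as np

def a_hat(tau, sigma, n1, rng):
    """Step (6): approximate a = Q_tau(e^{sigma*Y/2} * Z) by the
    order statistic X_(floor(n1*tau)) of a simulated sample."""
    y = rng.standard_normal(n1)             # Y ~ f_eta (illustrative choice)
    z = rng.standard_normal(n1)             # Z ~ f_eps, independent of Y here
    x = np.sort(np.exp(sigma * y / 2.0) * z)
    return x[int(np.floor(n1 * tau)) - 1]   # -1: order statistics are 1-indexed

def m1_hat(tau, sigma, a, n2, rng):
    """Step (7): plain Monte Carlo estimate of M(tau, sigma); only the
    roughly tau*n2 draws with X_i <= a contribute to the sum."""
    y = rng.standard_normal(n2)
    z = rng.standard_normal(n2)
    x = np.exp(sigma * y / 2.0) * z
    return np.sum(x[x <= a]) / (tau * n2)

rng = np.random.default_rng(1)
a = a_hat(0.05, 0.2, 10**6, rng)
m = m1_hat(0.05, 0.2, a, 10**6, rng)   # M is negative; ES = -e^{g/2} M > 0
```

The wasted draws are visible here: for τ = 0.05, only about 5% of the N_2 simulated values enter the sum.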

Independent case
It is commonly assumed that the sequences {ε_t} and {η_t} are mutually independent. For simplicity, and to ensure that the distribution of the returns is continuous, we assume that the distributions of ε_t and η_t are both continuous, having probability density functions (pdfs) f_ε and f_η, respectively. In order to guarantee that (3) holds, we must assume that

E[e^{σ η_1/2}] < ∞ and E|ε_1| < ∞. (8)

By a conditioning argument, we have

M(τ, σ) = (1/τ) E[e^{σY/2} H(a e^{−σY/2})],

where

H(b) = E[Z 1{Z ≤ b}] = ∫_{−∞}^b z f_ε(z) dz.

This can be used to develop a Monte Carlo method for approximating M(τ, σ) as follows. Fix some large integer N_1 and simulate two mutually independent sequences of iid random variables from f_η and f_ε. Use these to approximate a = Q_τ(e^{σY/2} Z) by â as in (6). Now choose another large integer N_2 and simulate Y_1, ..., Y_{N_2} iid from f_η. We can then approximate M(τ, σ) by

M_2(τ, σ) = (1/(τ N_2)) Σ_{i=1}^{N_2} e^{σY_i/2} H(â e^{−σY_i/2}).

We now give explicit formulas for H in several important situations. Throughout we assume that a ≤ 0, which holds for all reasonable choices of τ. Perhaps the most common assumptions are that f_ε is the pdf of a standard normal distribution or of a t-distribution. In the standard normal case we have

H(b) = −ϕ(b),

where ϕ is the pdf of the standard normal distribution, and in the case of a t-distribution with ν > 1 degrees of freedom we have

H(b) = −((ν + b²)/(ν − 1)) f_ν(b),

where f_ν is the pdf of the t-distribution. In the above, we need ν > 1, as otherwise (8) will not hold. In practice, it is often assumed that the distributions of returns are skewed. To capture this, skewed modifications of normal and t-distributions are often used. While there are a number of ways to introduce such modifications, we follow the approach of Fernandez and Steel (1998). In general, this approach can be described as follows. If f_1 is the pdf of a distribution that is unimodal and symmetric around zero, then, for γ > 0,

f_γ(x) = (2/(γ + γ^{−1})) [f_1(x/γ) 1{x ≥ 0} + f_1(γx) 1{x < 0}]

is a skewed modification of f_1. The parameter γ determines the skew of the distribution. When γ = 1 the distribution is symmetric, when γ < 1 it has a negative skewness, and when γ > 1 it has a positive skewness. Using a change of variables, it is straightforward to show that, if H_1 corresponds to f_1, then, for b ≤ 0,

H_γ(b) = (2/(γ(γ² + 1))) H_1(γb)

corresponds to f_γ.
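The conditioning estimator M_2, together with the closed-form expressions for H, can be sketched in Python as follows. This is our own illustrative code; the standard normal choice of f_η is an assumption for the example, and other choices of H can be passed in.

```python
import numpy as np
from math import gamma

SQRT_2PI = np.sqrt(2.0 * np.pi)

def H_normal(b):
    """H(b) = integral_{-inf}^b z*phi(z) dz = -phi(b) for standard normal Z."""
    return -np.exp(-0.5 * np.asarray(b, dtype=float) ** 2) / SQRT_2PI

def H_t(b, nu):
    """H(b) = -((nu + b^2)/(nu - 1)) * f_nu(b) for a t-distribution, nu > 1."""
    b = np.asarray(b, dtype=float)
    c = gamma((nu + 1) / 2.0) / (gamma(nu / 2.0) * np.sqrt(nu * np.pi))
    f_nu = c * (1.0 + b**2 / nu) ** (-(nu + 1) / 2.0)
    return -(nu + b**2) / (nu - 1.0) * f_nu

def m2_hat(tau, sigma, a, n2, rng, H=H_normal):
    """Estimate M(tau, sigma) = (1/tau) E[e^{sigma*Y/2} H(a e^{-sigma*Y/2})].
    Every simulated Y_i contributes, which is what reduces the variance."""
    y = rng.standard_normal(n2)   # Y ~ f_eta, taken N(0,1) for illustration
    w = np.exp(sigma * y / 2.0)
    return np.mean(w * H(a / w)) / tau
```

In contrast to M_1, no draws are discarded here: the inner expectation over Z has been integrated out exactly through H.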
We can easily apply this to get explicit formulas for H in the cases of skewed modifications of normal and t-distributions. We now present a small simulation study to compare the performance of M_1(τ, σ) and M_2(τ, σ). For these simulations we assume that η_t has a standard normal distribution, while for ε_t we consider two distributions: standard normal and Student's t. The values of the parameters, σ and (in the case of the t-distribution) ν, were calibrated to the daily returns from January 2014 to December 2019 of the S&P 500 Index. This was done using the stochvol package for the statistical software R, see Kastner (2016). We also performed similar simulations where the parameters were calibrated to the daily returns over the same period of the Russell 2000 Index, the Dow Jones Industrial Average, and the NASDAQ Composite Index. However, the results were similar and are not presented in the interest of space. Since our goal is to compare the two methods for evaluating M(τ, σ), we do not want issues with calculating a to interfere with the comparison. For this reason we choose the large value N_1 = 3 × 10^7 and use the same value of â for all simulations with the same distribution. For N_2 we consider a range of values from 100 to 5000 in increments of 100. For each value of N_2, we estimate M_1(τ, σ) and M_2(τ, σ) 1000 times and report the standard deviations and the means over these trials. For τ = 0.01, the results are presented in Fig. 1. From the plots, we can see that the second method has a significantly smaller variance and that its mean approaches the true value much more quickly. We also repeated the procedure for τ = 0.025 and 0.05, but the results were similar and are thus omitted. We note that we cannot allow η_t to have a t-distribution, as this would violate assumption (8).

Model with leverage
In the literature on financial returns, the leverage effect is the empirically observed phenomenon that volatility tends to be negatively correlated with returns, see e.g. Cont and Tankov (2004). In the important case where η_t and ε_t are jointly Gaussian, leverage is often modeled by assuming that the random vector (η_t, ε_t) follows a bivariate normal distribution N(0, Σ), where the covariance matrix is

Σ = ( 1  ρ
      ρ  1 ) (11)

for some ρ ∈ (−1, 1), see Omori et al. (2007). When ρ < 0, the volatility is negatively correlated with the return, which captures the leverage effect. From properties of multivariate normal distributions, it follows that, in this case,

(η_t, ε_t) =d (W_1, ρ W_1 + √(1 − ρ²) W_2),

where W_1, W_2 are iid N(0, 1) random variables and =d denotes equality in distribution. There does not seem to be a standard way to model leverage in the non-Gaussian case. However, one approach is suggested, in a continuous time setting, by Eq. (8) in Barndorff-Nielsen and Shephard (2001). The idea is to add a constant times the innovation of the log volatility to the model for r_t. A variant of this idea, which is consistent with how the Gaussian case is treated, is to consider the model

r_t = e^{h_t/2} (√(1 − ρ²) δ_t + ρ η_t),

where h_t is as in (2) and {δ_t} and {η_t} are mutually independent sequences of iid random variables. The new parameter ρ ∈ (−1, 1) determines the dependence between the return and the volatility. When ρ = 0, the model reduces to the independent case. Note that this is equivalent to taking ε_t = √(1 − ρ²) δ_t + ρ η_t in (1). For simplicity we assume that the distribution of δ_t has pdf f_δ.
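To see the construction in action, the following short Python sketch (our own illustrative code) builds (η_t, ε_t) from an iid pair (W_1, W_2) of standard normals and can be used to verify that the resulting correlation is ρ.

```python
import numpy as np

def leverage_innovations(rho, n, rng):
    """Construct (eta, eps) as (W1, rho*W1 + sqrt(1-rho^2)*W2) with W1, W2
    iid N(0,1), giving a bivariate normal pair with unit variances and
    correlation rho."""
    w1 = rng.standard_normal(n)                     # eta: log-variance innovation
    w2 = rng.standard_normal(n)                     # delta: independent of eta
    eps = rho * w1 + np.sqrt(1.0 - rho**2) * w2     # return innovation
    return w1, eps

rng = np.random.default_rng(2)
eta, eps = leverage_innovations(-0.7, 10**6, rng)   # rho < 0: leverage effect
```

The value ρ = −0.7 here is an arbitrary illustrative choice, not a calibrated estimate.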
We now give an approach for evaluating M(τ, σ) in this case. We can write Z =d ρ W_1 + √(1 − ρ²) W_2 and Y =d W_1, where W_1, W_2 are independent random variables with W_1 ∼ f_η and W_2 ∼ f_δ. It follows that

M(τ, σ) = (1/τ) E[e^{σW_1/2} (ρ W_1 F_δ(b(W_1)) + √(1 − ρ²) G_δ(b(W_1)))],

where

b(w) = (a e^{−σw/2} − ρ w) / √(1 − ρ²)

and

G_δ(b) = ∫_{−∞}^b x f_δ(x) dx.

Here F_δ is the cumulative distribution function (cdf) of the distribution of δ_t. In the case where the distribution of δ_t is standard normal and b ≤ 0, G_δ(b) = −ϕ(b), where ϕ(x) = e^{−x²/2}/√(2π) is the pdf of the standard normal distribution. The above suggests the following Monte Carlo method. First, we approximate a by â as in (6). Next, choose a large integer N_2 and simulate W_1, ..., W_{N_2} iid from f_η. We can then approximate M(τ, σ) by

M_2(τ, σ) = (1/(τ N_2)) Σ_{i=1}^{N_2} e^{σW_i/2} (ρ W_i F_δ(b̂(W_i)) + √(1 − ρ²) G_δ(b̂(W_i))),

where b̂(w) = (â e^{−σw/2} − ρ w)/√(1 − ρ²). We again perform a small simulation study to compare the performance of M_1(τ, σ) and M_2(τ, σ). For these simulations we assume that δ_t and η_t are independent standard normal random variables, or equivalently that (η_t, ε_t) ∼ N(0, Σ), where Σ is given by (11). The values of σ and ρ are calibrated to the daily returns from January 2014 to December 2019 of the S&P 500 Index. This was again done using the stochvol package for R. For the simulations we again took N_1 = 3 × 10^7 and N_2 ranging from 100 to 5000 in increments of 100, and repeated each simulation over 1000 iterations. The results for τ = 0.01 are given in Fig. 2. We again see that the second method has a smaller variance and that its mean approaches the true value more quickly. However, the differences are not as pronounced in this case. As before, we also considered the cases where the parameters were calibrated to the other three indices and where τ = 0.025 and 0.05. Those results were similar and are not presented here in the interest of space.
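A Python sketch of the leverage-case estimator, under the standard normal assumptions used in the simulation study, is given below. This is our own illustrative code: it conditions on W_1, integrates over W_2 = δ in closed form using G_δ(b) = −ϕ(b), and evaluates F_δ via the error function.

```python
import numpy as np
from math import erf

SQRT_2PI = np.sqrt(2.0 * np.pi)
_erf = np.vectorize(erf)

def Phi(x):
    """Standard normal cdf, playing the role of F_delta."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / np.sqrt(2.0)))

def m2_lev_hat(tau, sigma, rho, a, n2, rng):
    """Leverage-case estimate of M(tau, sigma): condition on W1 and
    integrate over W2 = delta in closed form (delta assumed N(0,1), so
    G_delta(b) = -phi(b))."""
    w1 = rng.standard_normal(n2)                 # W1 ~ f_eta, N(0,1) here
    s = np.sqrt(1.0 - rho**2)
    b = (a * np.exp(-sigma * w1 / 2.0) - rho * w1) / s
    g_delta = -np.exp(-0.5 * b**2) / SQRT_2PI    # G_delta(b) = -phi(b)
    terms = np.exp(sigma * w1 / 2.0) * (rho * w1 * Phi(b) + s * g_delta)
    return np.mean(terms) / tau
```

When ρ = 0 this reduces to the estimator for the independent case, since the F_δ term vanishes and b(w) collapses to a e^{−σw/2}.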

Data analysis
In this section we perform a data analysis, where we estimate ES using SV models for four major US indices: the S&P 500 Index, the Russell 2000 Index, the Dow Jones Industrial Average, and the NASDAQ Composite Index. In all cases we use daily returns, with the last 500 observations reserved for one-step ahead ES forecasts and the earlier observations used as historical data. The SV models that we consider are:

1. (SV-N) ε_t and η_t are iid N(0, 1);
2. (SV-t2) ε_t and η_t are independent with ε_t ∼ t_2 and η_t ∼ N(0, 1);
3. (SV-tcal) ε_t and η_t are independent with ε_t ∼ t_ν, where ν is calibrated to the data, and η_t ∼ N(0, 1);
4. (SV-lev) (η_t, ε_t) ∼ N(0, Σ), where Σ is given by (11).
Models 1-3 assume that ǫ t and η t are independent, while Model 4 takes leverage into account.
The data analysis is performed as follows. We fix an index, an SV model, and a value of τ. For each of the last 500 observations, we use the stochvol package to calibrate the parameters of the SV model based on all of the observations before this one. The package gives multiple estimates for each parameter, and we take the mean of these as our estimate. We then use these parameter values to estimate ES_τ using (4), where we evaluate M using M_2 for the appropriate SV model. In all cases we take N_1 = N_2 = 5000. Note that the parameter values are recalibrated for each observation. In the interest of space we only report the results for τ = 0.01, although we also repeated the procedure for τ = 0.025 and 0.05. Figures 3, 4, 5 and 6 present the results for the four SV models, respectively. In each plot, we give the time series of the 500 data points, with the estimated −ES_τ overlaid. These values of −ES_τ are generally below the data, suggesting that they do a good job of capturing risk. In Fig. 4, however, the values of −ES_τ lie far below the values of the time series, suggesting that the tails of this model are too heavy and that we should use a larger value for the degrees of freedom. This is done in Fig. 5, where the degrees of freedom are calibrated to the data. These calibrated values all lie in the range from 20 to 40 degrees of freedom.
As benchmarks, we also estimate ES using three standard methods, which we refer to below as (1) Hist, (2) GARCH, and (3) DFGARCH; for details see Nadarajah et al. (2014) or Christou and Grabchak (2021). We note that (2) is a special case of the QGARCH(1,1) method, where we take h = 1, µ = 0, and estimate the standard deviation using a GARCH(1,1) model, and that (3) is the filtered historical method based on a GARCH(1,1) filter. We compare the performance of the proposed SV methods with these benchmark methods using backtesting.
There are many backtests for ES available in the literature, see Lazar and Zhang (2019) or Deng and Qiu (2021) for an overview. Many popular approaches, such as those in Du and Escanciano (2017), make parametric or semiparametric assumptions. Since we are comparing models that make a variety of different assumptions, we choose two backtests that make no such assumptions. Before giving these, we define some notation. Fix a model and assume that for each time period t = 1, 2, ..., T we use this model to estimate VaR_τ(t) by V̂aR_τ(t) and ES_τ(t) by ÊS_τ(t). Our first backtest is from Acerbi and Székely (2014) and is based on the statistic

Z = 1 + (1/(Tτ)) Σ_{t=1}^T r_t 1{r_t < −V̂aR_τ(t)} / ÊS_τ(t).

If we evaluate Z for several models, then the one where Z has the smallest absolute value is considered to be the best. An issue with this method is that it is sensitive to the estimate of VaR. Since different methods estimate VaR differently, this can make the results not fully comparable. The second backtest does not require estimating VaR. It is from Embrechts et al. (2005) and is based on a statistic, which we denote by V. As in the previous case, the method where V has the smallest absolute value is considered to be the best. The results of these two backtests are given in Tables 1 and 2. We begin by considering the results of the first backtest. The best method is clearly DFGARCH, which performs significantly better than the other methods. At first glance SV-t2 appears to be one of the better models. However, this is likely an artifact of the way the statistic Z works. It is unbounded in the negative direction, but bounded by 1 in the positive direction, which implies that it penalizes underestimation of ES more than it penalizes overestimation. We have already seen that SV-t2 drastically overestimates ES, so we cannot take its performance on this backtest as evidence that it works well. Aside from this, the best performing SV model is clearly SV-lev. It beats GARCH in all cases except at the 5% level for the Russell 2000 and NASDAQ indices, where it performs slightly worse.
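The first backtest statistic is straightforward to compute from a series of realized returns and the corresponding one-step ahead VaR and ES forecasts. The following Python sketch is our own code, taking Z to be the Acerbi and Székely (2014) statistic Z = 1 + (1/(Tτ)) Σ_t r_t 1{r_t < −V̂aR_τ(t)} / ÊS_τ(t).

```python
import numpy as np

def z_statistic(returns, var_hat, es_hat, tau):
    """Acerbi-Szekely-style ES backtest statistic.

    Z is close to 0 for a well-calibrated model, at most 1 (attained when
    there are no VaR violations), and very negative when ES is
    underestimated, which is the asymmetry discussed in the text."""
    r = np.asarray(returns, dtype=float)
    v = np.asarray(var_hat, dtype=float)
    e = np.asarray(es_hat, dtype=float)
    viol = r < -v                          # VaR violations
    return 1.0 + np.sum(r[viol] / e[viol]) / (len(r) * tau)
```

For example, doubling a correct ES forecast moves Z from near 0 to near 0.5, while halving it moves Z to near −1, illustrating the asymmetric penalty.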
Further, it is generally comparable to, although slightly worse than, Hist. We now turn to the second backtest. This method does not seem to have an asymmetry in penalizing underestimation and overestimation of ES . We can now clearly see that SV-t 2 is the worst method. DFGARCH is again the best method. Here it is clear that SV-lev is second best, while Hist is a close third.

Conclusions
In this paper we considered the problem of estimating ES for SV models. To the best of our knowledge, this is the first paper to deal with this topic. We introduced two Monte Carlo methods, which are easy to implement in many common situations and can be used both in the case where the volatility is independent of the innovation and in the case where there is dependence. This dependence aims to capture the leverage effect. Our simulations suggest that the second method has a lower variance and converges faster. As such, it is the method that we suggest using. The other method is primarily introduced because it is more straightforward and can thus serve as a benchmark. We evaluated several variants of our method on real-world data and compared the results with three benchmark methods. We saw, in particular, that the SV model with leverage performed very well in backtests, although it was not the best. We note that we only considered a few simple distributional assumptions in the data analysis and that our methodology works with many other distributions. We leave the question of which distributions are best to use in conjunction with an SV model for future work. We now discuss a simple extension of our work. Thus far we have only considered SV models where the volatility follows an AR(1) process as defined in (2). However, nothing in our approach depends on this structure, and we can replace (2) by

h_t = g_{t−1} + σ η_t,

where g_{t−1} is any random variable that is measurable with respect to F_{t−1} and is independent of (η_t, ε_t). In this case we have

ES_τ(t) = −e^{g_{t−1}/2} M(τ, σ),

where M(τ, σ) is as in (5). Thus, our approach easily extends to more complicated time series structures.

We conclude by noting that in both this more general SV model and the one that we considered earlier, the main quantity that needs to be evaluated is M, which depends only on τ, σ, and the joint distribution of (η_t, ε_t). In particular, it does not depend on g_{t−1}, which enters only through a simple multiplicative term. Thus, if all parameters of the joint distribution of ε_t and η_t are known ahead of time, one can precompute the values of M(τ, σ) for the appropriate choice of τ on a grid of σ values. These can then be interpolated to obtain values very quickly. In the case where τ = 0.01 and ε_t and η_t are independent standard normal random variables, a plot with this information is given in Fig. 7. The information in this plot, along with estimates of g_{t−1}, is all that is needed to evaluate ES in this case.
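The precomputation idea can be sketched as follows. This Python code is our own illustration for the case of independent standard normal innovations: it tabulates M(0.01, σ) on a grid of σ values using the conditioning-based estimator, after which each ES forecast is just an interpolation and a multiplication.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def m2_normal(tau, sigma, n1, n2, rng):
    """M(tau, sigma) for independent N(0,1) innovations: estimate the
    quantile a from a pilot sample, then use the conditioning form with
    H(b) = -phi(b)."""
    y = rng.standard_normal(n1)
    z = rng.standard_normal(n1)
    x = np.sort(np.exp(sigma * y / 2.0) * z)
    a = x[int(np.floor(n1 * tau)) - 1]
    w = np.exp(sigma * rng.standard_normal(n2) / 2.0)
    return np.mean(w * (-np.exp(-0.5 * (a / w) ** 2) / SQRT_2PI)) / tau

# Precompute M(0.01, sigma) once on a grid of sigma values ...
rng = np.random.default_rng(3)
sigma_grid = np.linspace(0.05, 1.0, 20)
m_grid = np.array([m2_normal(0.01, s, 10**5, 10**5, rng) for s in sigma_grid])

# ... then each ES forecast is a table lookup plus a multiplication:
def es_forecast(g_prev, sigma):
    """ES_tau(t) = -e^{g_{t-1}/2} * M(tau, sigma), with M interpolated
    linearly from the precomputed grid."""
    return -np.exp(g_prev / 2.0) * np.interp(sigma, sigma_grid, m_grid)
```

The grid bounds and resolution here are arbitrary illustrative choices; in practice they would be set to cover the range of calibrated σ values.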