Stressed portfolio optimization with semiparametric method

Tail risk is a classic topic in stressed portfolio optimization to treat unprecedented risks, while the traditional mean–variance approach may fail to perform well. This study proposes an innovative semiparametric method consisting of two modeling components: the nonparametric estimation and copula method for each marginal distribution of the portfolio and their joint distribution, respectively. We then focus on the optimal weights of the stressed portfolio and its optimal scale beyond the Gaussian restriction. Empirical studies include statistical estimation for the semiparametric method, risk measure minimization for optimal weights, and value measure maximization for the optimal scale to enlarge the investment. From the outputs of short-term and long-term data analysis, optimal stressed portfolios demonstrate the advantages of model flexibility to account for tail risk over the traditional mean–variance method.


Introduction
Several historical episodes, such as the financial crisis and COVID-19, have posed new challenges for investment management in unknown and unprecedented tail risks. A large body of literature on econometric research exploits the validation of various financial models and risk measures, such as value-at-risk (VaR) and conditional value at risk (CVaR) for risk management (Jorion 2007). We extend the use of these risk measures (Artzner et al. 1999) for portfolio optimization using a novel semiparametric modeling method under stressed scenarios. The scaling effect of stressed portfolios is also a concern. Risk-sensitive value measures (Miyahara 2010) were adopted to maximize the optimal scale for a given portfolio strategy.
The proposed semiparametric modeling method is constructive and consists of two estimation procedures: the nonparametric kernel method for marginal distributions and a parametric copula method for their joint distribution. This semiparametric method builds up a more complex dependence between portfolio constituents than traditional Gaussian models that can be used to exploit tail risks.
From both experimental and theoretical perspectives, we find that the proposed optimal stressed portfolio and the semiparametric method perform better than Markowitz's mean-variance method (Markowitz 1952). From an experimental perspective, our implementation of the stressed portfolio optimization relies on a rolling window approach and checks its robustness. In addition, from a theoretical perspective, the risk-sensitive value measure (RSVM) is equipped with more properties for general heavy-tailed distributions than Markowitz's mean-variance model, thus making mean-variance a special case of the risk-sensitive value measure.
The remainder of this paper is organized as follows: the "Literature review" section provides a literature review, particularly on the nonparametric kernel method and the parametric copula method. "The semiparametric method" section generates non-Gaussian distributed portfolios using the proposed semiparametric method, which has two parts. The first part constructs the marginal distribution of each constituent asset by nonparametric estimation with cross-validation to obtain the optimal bandwidth of a kernel function, together with its perturbation analysis. The second part estimates the parameters of copula functions by full maximum likelihood estimation (MLE). The "Stressed portfolio optimization and its scaling effect" section solves the optimal weights of the portfolio using the semiparametric method by minimizing risk measures, such as VaR and CVaR. The scaling effect is then optimized by maximizing the risk-sensitive value measures. The "Empirical studies and data analysis" section presents the data set, intensive empirical results, and a comparison between the stressed portfolio and the traditional mean-variance method. We conclude the paper in the "Conclusion" section.

Literature review
There are two major directions for tail risk estimation: modeling the return distribution and capturing the volatility process. For the former direction, various techniques are employed for modeling the entire return distribution or just the tail areas, including known parametric distributions, kernel density approximation, and extreme value theory (Tsay 2010). The latter direction mostly relies on discrete-time volatility models, such as the exponentially weighted moving average (EWMA) model and the generalized autoregressive conditional heteroskedasticity (GARCH) model, to capture the volatility process. See Jondeau et al. (2007) for further details.
Traditional modeling methods in financial management often rely on the Gaussian distribution by virtue of closed-form solutions for mean-variance analysis (Fu et al. 2021), the optimal risk measure, and so on. There are other risk measures such as the Entropic Value-at-Risk (Mills et al. 2017). However, stylized facts such as heavy tails and asymmetry in empirical distributions expose extra risk when the initial Gaussian assumption is violated. In contrast, we relax the Gaussian assumption using a semiparametric method, which renders flexible distributions to describe more details and properties of unknown tail risks.
Distinct from previous studies on financial modeling, the aim of this study is to build up the joint distribution of portfolios in high dimensions without assumptions on the distribution of each underlying asset. This innovative construction of a joint distribution is based on nonparametric estimation (Robinson 1983) and the copula method (Cherubini et al. 2004, 2011). Nonparametric estimation with a kernel function is adopted to estimate the probability density function of each underlying asset, and the parametric copula method is used to describe a joint distribution between the assets of the portfolio.

Kernel function
Given observations $x_p^{(1)}, \dots, x_p^{(N)}$ of the pth asset, the kernel density estimator of its density $f_p$ is

$$\hat{f}_p(x) = \frac{1}{N}\sum_{j=1}^{N} K_h\big(x - x_p^{(j)}\big), \qquad (2.1)$$

where $K_h(t) = \frac{1}{h}K\left(\frac{t}{h}\right)$, $h > 0$. The positive number $h$ is a smoothing parameter called the bandwidth of the kernel function.
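As an illustration of the kernel estimator with $K_h(t) = K(t/h)/h$, the following minimal sketch evaluates a Gaussian-kernel density estimate at a point. The function names, the sample values, and the bandwidth are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def gaussian_kernel(t):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def kernel_density(x, sample, h):
    """Kernel estimate (1/N) * sum_j K_h(x - x_j), with K_h(t) = K(t/h) / h."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(gaussian_kernel((x - xj) / h) for xj in sample) / (len(sample) * h)

# Hypothetical daily log returns of one asset (illustrative numbers only).
returns = [-0.021, -0.004, 0.001, 0.003, 0.006, 0.012, 0.035]
density_at_zero = kernel_density(0.0, returns, h=0.01)
```

A larger `h` gives a smoother but more biased estimate, which is the trade-off the bandwidth choice has to balance.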

Joint distribution: copula method
The copula method (Nelsen 1999; Cherubini et al. 2004) provides a useful tool for describing the dependence between variables. Two families of copula functions are often considered: elliptical and Archimedean copulas. Unlike the nonparametric kernel function, the copula method is parametric and contributes to the joint distribution of the portfolio from its multiple marginal distributions (Bouyé et al. 2000; Cambanis et al. 1981; Cherubini and Luciano 2001). By Sklar's theorem, a joint distribution function $F$ with marginal distribution functions $F_1, \dots, F_m$ can be written as

$$F(x_1, \dots, x_m) = C\big(F_1(x_1), \dots, F_m(x_m)\big),$$

where $C$ is called a copula function.
The copula function $C$ is a mapping of the form $C: [0, 1]^m \to [0, 1]$. There are two major types of elliptical copula families: Gaussian and Student's t copulas. Both are associated with a class of elliptical distributions.

The multivariate dispersion copula
The m-dimensional normal or Gaussian copula is derived from the m-dimensional Gaussian distribution. A draw from the Gaussian copula is generated from a set of correlated normally distributed variates $v_1, v_2, \dots, v_m$, obtained using Cholesky's decomposition, which are then transformed to uniform variables $u_1 = \Phi(v_1), u_2 = \Phi(v_2), \dots, u_m = \Phi(v_m)$, where $\Phi$ is the cumulative standard normal distribution function; therefore, the vector $(u_1, u_2, \dots, u_m)$ is a draw from the Gaussian copula.
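The sampling recipe above (correlated normals via Cholesky's decomposition, then the transform $u_j = \Phi(v_j)$) can be sketched as follows. This is a minimal illustration using only the Python standard library; the correlation matrix `R`, the seed, and the function names are assumptions, not the authors' implementation.

```python
import math
import random
from statistics import NormalDist

def cholesky(R):
    """Lower-triangular L with L * L^T = R, for a symmetric positive definite R."""
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def sample_gaussian_copula(R, n, seed=0):
    """Return n draws (u_1, ..., u_m) from the Gaussian copula with correlation R."""
    rng = random.Random(seed)
    L = cholesky(R)
    m = len(R)
    phi = NormalDist().cdf  # cumulative standard normal distribution function
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]                        # independent normals
        v = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(m)]  # correlated normals
        draws.append([phi(vi) for vi in v])                                # uniform marginals
    return draws

# Hypothetical correlation matrix for two assets.
R = [[1.0, 0.6], [0.6, 1.0]]
u = sample_gaussian_copula(R, n=2000)
```

Applying the inverse of each estimated marginal distribution function to the columns of `u` would turn these draws into simulated asset values with the chosen dependence structure.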
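The marginal distribution of each variable is standard normal, and the joint normal distribution can be defined as

$$C(u_1, \dots, u_m; R) = \Phi_m\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_m); R\big), \qquad (2.4)$$

where $R$ is the m-dimensional covariance matrix, and $\Phi_m$ is the cumulative multivariate normal distribution function in dimension m.
For the multivariate Gaussian copula (MGC), let $R$ be a symmetric, positive definite matrix with $\mathrm{diag}(R) = (1, 1, \dots, 1)^T$, and the corresponding density function of (2.4) is

$$f(x_1, \dots, x_m) = \frac{1}{(2\pi)^{m/2}|R|^{1/2}} \exp\left(-\frac{1}{2}x^T R^{-1} x\right),$$

where $R$ is the covariance matrix of vector $X$, and $|R|$ is the determinant of $R$. Let $u_j = \Phi(x_j)$; therefore, $x_j = \Phi^{-1}(u_j)$. This copula density function can be rewritten as given below:

$$c(u_1, \dots, u_m; R) = \frac{1}{|R|^{1/2}} \exp\left(-\frac{1}{2}\varsigma^T\big(R^{-1} - I\big)\varsigma\right), \qquad \varsigma = \big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_m)\big)^T.$$

Let $\mu = (\mu_1, \mu_2, \dots, \mu_m)^T$ be a position parameter, $\sigma = (\sigma_1, \sigma_2, \dots, \sigma_m)^T$ be a dispersion parameter, and $R$ be a correlation matrix. The multivariate dispersion copula (MDC) density is as given below:

$$f(x_1, \dots, x_m; \mu, \sigma, R) = \frac{1}{|R|^{1/2}} \exp\left(-\frac{1}{2}\varsigma^T\big(R^{-1} - I\big)\varsigma\right) \prod_{j=1}^{m} f_j(x_j; \mu_j, \sigma_j),$$

where $\varsigma_j = \Phi^{-1}\big(F_j(x_j; \mu_j, \sigma_j)\big)$, and $f_j$ and $F_j$ are the density and distribution function of the jth margin.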

The multivariate student's t copula
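Similarly, the m-dimensional Student's t copula is derived from the m-dimensional Student's t distribution. Student's t copulas have heavier tails than Gaussian copulas. Let $T_m(\epsilon_1, \dots, \epsilon_m; R, \nu)$ denote the joint Student's t distribution function with correlation matrix $R$ and $\nu$ degrees of freedom, and let $t_\nu(x)$ denote the univariate Student's t distribution function. The Student's t copula is defined as

$$C(u_1, \dots, u_m; R, \nu) = T_m\big(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_m); R, \nu\big),$$

and the density function of the multivariate Student's t copula (MTC) is

$$c(u_1, \dots, u_m; R, \nu) = \frac{f_m\big(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_m); R, \nu\big)}{\prod_{j=1}^{m} f_\nu\big(t_\nu^{-1}(u_j)\big)},$$

where $t_\nu^{-1}$ is the inverse of the univariate cumulative distribution function of Student's t with $\nu$ degrees of freedom, and $f_m$ and $f_\nu$ are the joint and univariate Student's t densities. Using the standard representation, the copula density for the multivariate Student's t copula (Cherubini et al. 2004) is:

$$c(u_1, \dots, u_m; R, \nu) = |R|^{-1/2}\, \frac{\Gamma\left(\frac{\nu + m}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \left[\frac{\Gamma\left(\frac{\nu}{2}\right)}{\Gamma\left(\frac{\nu + 1}{2}\right)}\right]^{m} \frac{\left(1 + \frac{1}{\nu}\varsigma^T R^{-1}\varsigma\right)^{-\frac{\nu + m}{2}}}{\prod_{j=1}^{m}\left(1 + \frac{\varsigma_j^2}{\nu}\right)^{-\frac{\nu + 1}{2}}}, \qquad \varsigma_j = t_\nu^{-1}(u_j).$$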

The Archimedean copula
In contrast to the elliptical copula, with the Archimedean copula it is easy to deduce parameterized multivariate distributions from the same class of marginal distributions. Given a function $\varphi(x)$ as the generator of the Archimedean copula function, the formula of Archimedean copulas induces a copula by

$$C(u_1, \dots, u_m) = \varphi^{-1}\big(\varphi(u_1) + \cdots + \varphi(u_m)\big).$$

Three well-known Archimedean copulas are illustrated with their density functions in Table 1.
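To make the generator construction concrete, the sketch below builds a Clayton copula from a generator and its inverse. The generator form $\varphi(u) = (u^{-\theta} - 1)/\theta$ with $\theta > 0$ is the usual textbook parameterization and is assumed here, since the entries of Table 1 are not reproduced in this text; the function names are illustrative.

```python
def clayton_generator(u, theta):
    """Assumed Clayton generator phi(u) = (u**(-theta) - 1) / theta, theta > 0."""
    return (u ** (-theta) - 1.0) / theta

def clayton_generator_inverse(s, theta):
    """Inverse generator phi^{-1}(s) = (1 + theta * s)**(-1 / theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)

def archimedean_copula(us, theta):
    """C(u_1, ..., u_m) = phi^{-1}(phi(u_1) + ... + phi(u_m))."""
    total = sum(clayton_generator(u, theta) for u in us)
    return clayton_generator_inverse(total, theta)

# Joint probability that two uniforms both fall below 0.5, with theta = 1.
c_half = archimedean_copula([0.5, 0.5], theta=1.0)
```

Only the single parameter `theta` controls the dependence, which is the property noted in the text; swapping in another generator and its inverse yields another Archimedean family.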
Although the Archimedean copula requires only one parameter in the estimation, the partial distribution function is not easy to calculate in high dimensions for the joint density function. Thus, we choose the MGC to build up the joint distribution in "Stressed portfolio optimization and its scaling effect" section for ease of computation.

The semiparametric method
The semiparametric method combines the nonparametric kernel and the parametric copula methods to describe the marginal distribution of each underlying asset and the joint distribution of the portfolio, respectively. Details about the formulation of each nonparametric and parametric method are discussed in the previous section. We focus on the estimation procedures described below, including a bias estimation for the optimal bandwidth.

Optimal bandwidth choice
As mentioned in "Kernel function" section, the choice of bandwidth is not only pivotal as it determines the smoothness of the estimation but also plays a significant role in the weight function on a kernel. In addition, bandwidth choice is a crucial problem in kernel smoothing because no universally accepted approach exists to this problem yet.
One approach from cross-validation theory aims to minimize the mean square error (MSE) between the estimated and true densities. Thus, an appropriate $h$ should determine the degree of smoothness and its influence on the MSE between the kernel estimated density $\hat{f}_p(x)$ and its true density $f_p(x)$.
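Table 1 The Archimedean copulas (columns: Types, Copula function, Copula multivariate function)

Definition 3.1 The variance, bias, and MSE of the estimator $\hat{f}_p(x)$ are defined as

$$\mathrm{Var}\big(\hat{f}_p(x)\big) = E\Big[\big(\hat{f}_p(x) - E\hat{f}_p(x)\big)^2\Big], \qquad \mathrm{Bias}\big(\hat{f}_p(x)\big) = E\hat{f}_p(x) - f_p(x),$$

$$\mathrm{MSE}_p(x) = E\Big[\big(\hat{f}_p(x) - f_p(x)\big)^2\Big] = \mathrm{Var}\big(\hat{f}_p(x)\big) + \mathrm{Bias}\big(\hat{f}_p(x)\big)^2.$$

Let the density function $f_p$ have a bounded second derivative $f_p''$ and let the kernel $K$ be symmetric. For a sample of size $N$, a Taylor expansion leads to

$$\mathrm{Bias}\big(\hat{f}_p(x)\big) = \frac{h^2}{2}\, k_2\, f_p''(x) + o(h^2). \qquad (3.2)$$

Similarly, we could get the result

$$\mathrm{Var}\big(\hat{f}_p(x)\big) = \frac{k_1\, f_p(x)}{N h} + o\left(\frac{1}{N h}\right); \qquad (3.3)$$

see Horová et al. (2012) for details.
From (3.2) and (3.3), we derive the MSE of the kernel density estimators as

$$\mathrm{MSE}_p(x) = \frac{k_1\, f_p(x)}{N h} + \frac{h^4}{4}\, k_2^2\, f_p''(x)^2 + o\left(h^4 + \frac{1}{N h}\right).$$

The optimal bandwidth is defined from the truncated $\mathrm{MSE}_p(x)$, taking only the leading order terms, as

$$h_{opt} = \arg\min_{h > 0} \left\{\frac{k_1\, f_p(x)}{N h} + \frac{h^4}{4}\, k_2^2\, f_p''(x)^2\right\}.$$

In this approach, the optimal bandwidth can be obtained by some straightforward calculations:

$$h_{opt} = \left(\frac{k_1\, f_p(x)}{k_2^2\, f_p''(x)^2\, N}\right)^{1/5},$$

where $k_1 = \int K^2(u)\,du$ and $k_2 = \int u^2 K(u)\,du$. For the Gaussian kernel, $k_1 = \frac{1}{\sqrt{4\pi}}$ and $k_2 = 1$; therefore,

$$h_{opt} = \left(\frac{f_p(x)}{\sqrt{4\pi}\, f_p''(x)^2\, N}\right)^{1/5}.$$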
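The bandwidth calculation can be checked numerically. The sketch below assumes the standard leading-order form of the pointwise MSE, $k_1 f_p(x)/(Nh) + \tfrac{1}{4} h^4 k_2^2 f_p''(x)^2$, with the Gaussian-kernel constants $k_1 = 1/\sqrt{4\pi}$ and $k_2 = 1$ quoted in the text. The true density is taken to be standard normal purely for illustration, because in practice $f_p$ and $f_p''$ are unknown and must be estimated; all function names are hypothetical.

```python
import math

K1 = 1.0 / math.sqrt(4.0 * math.pi)  # integral of K(u)^2 for the Gaussian kernel
K2 = 1.0                             # integral of u^2 K(u) for the Gaussian kernel

def leading_mse(h, f_x, f2_x, n):
    """Assumed leading-order MSE: variance term plus squared bias term."""
    return K1 * f_x / (n * h) + 0.25 * h ** 4 * K2 ** 2 * f2_x ** 2

def optimal_bandwidth(f_x, f2_x, n):
    """Minimizer of leading_mse over h > 0 (set the derivative in h to zero)."""
    if f2_x == 0.0:
        raise ValueError("f''(x) = 0: the leading bias term vanishes at this point")
    return (K1 * f_x / (K2 ** 2 * f2_x ** 2 * n)) ** 0.2

# Illustration only: standard normal density at x = 0, where f''(0) = -f(0).
phi0 = 1.0 / math.sqrt(2.0 * math.pi)
h_opt = optimal_bandwidth(phi0, -phi0, n=1000)
```

The exponent `0.2` reflects the familiar $N^{-1/5}$ rate: multiplying the sample size by 32 halves the optimal bandwidth.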

Bias estimation for the perturbed optimal bandwidth
Here, we provide a perturbation analysis and show that the error of the Gaussian kernel function deviating from the optimal bandwidth is uniformly bounded.
Lemma 3.2 Given the Gaussian kernel function $K_{h_{opt}}(t) = \frac{1}{h_{opt}} K\left(\frac{t}{h_{opt}}\right)$ with the optimal bandwidth choice $h_{opt} > 0$, for any estimation error $\varepsilon > 0$, there exists a constant $M$, independent of $t$, such that $\left|K_{h_{opt}}(t) - K_{h_{opt}+\varepsilon}(t)\right| < M\varepsilon$ for $t \in \mathbb{R}$.
This means that the bias between the optimal kernel and its perturbed density is uniformly bounded.
Proof Use Taylor expansion and the uniformly bounded property for the normal density.
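Introducing the telescoping expression, we obtain

$$K_{h_{opt}}(t) - K_{h_{opt}+\varepsilon}(t) = \left(\frac{1}{h_{opt}} - \frac{1}{h_{opt}+\varepsilon}\right) K\left(\frac{t}{h_{opt}}\right) + \frac{1}{h_{opt}+\varepsilon}\left[K\left(\frac{t}{h_{opt}}\right) - K\left(\frac{t}{h_{opt}+\varepsilon}\right)\right].$$

The first term on the right is bounded in absolute value by $M_1\varepsilon$ regardless of the variable $t$, for some constant $M_1$ independent of $t$, because the normal density is uniformly bounded. Because the Gaussian kernel function is a normal density function, by the mean-value theorem, the second term on the right is bounded in absolute value by $M_2\varepsilon$ for some constant $M_2$ independent of $t$. Therefore, $\left|K_{h_{opt}}(t) - K_{h_{opt}+\varepsilon}(t)\right| \le (M_1 + M_2)\varepsilon$ for an arbitrary $t$ is obtained. □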
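A quick numerical check of Lemma 3.2 is sketched below. The explicit constants are an assumption made for this illustration, not values stated in the paper: bounding the first telescoping term with $\sup K = K(0)$ gives $M_1 = K(0)/h_{opt}^2$, and bounding the second with $\sup_s |s K'(s)| = 2K(\sqrt{2})$ gives $M_2 = 2K(\sqrt{2})/h_{opt}^2$. The bandwidth value `0.5` is hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def scaled_kernel(t, h):
    """K_h(t) = K(t / h) / h for the standard Gaussian kernel K."""
    return math.exp(-0.5 * (t / h) ** 2) / (h * SQRT_2PI)

def lemma_constant(h):
    """Assumed explicit constant M = M1 + M2 for bandwidth h (see lead-in)."""
    m1 = 1.0 / (SQRT_2PI * h * h)                    # sup K = K(0)
    m2 = 2.0 * math.exp(-1.0) / (SQRT_2PI * h * h)   # sup |s K'(s)| = 2 K(sqrt(2))
    return m1 + m2

def max_gap(h, eps, t_max=10.0, steps=4001):
    """Largest |K_h(t) - K_{h+eps}(t)| over an evenly spaced grid of t values."""
    grid = (-t_max + 2.0 * t_max * k / (steps - 1) for k in range(steps))
    return max(abs(scaled_kernel(t, h) - scaled_kernel(t, h + eps)) for t in grid)

h_opt = 0.5  # hypothetical optimal bandwidth
ratios = [max_gap(h_opt, eps) / eps for eps in (0.1, 0.01, 0.001)]
```

Each entry of `ratios` stays below `lemma_constant(h_opt)`, so the gap shrinks at least linearly in the perturbation, as the lemma states.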

The joint distribution of portfolio
The semiparametric estimation has nonparametric and parametric components. The kernel method provides the marginal distribution of each asset by nonparametric estimation, and the copula method, a parametric estimation, builds up the joint distribution from the marginal distributions. After combining these two components, the joint distribution of the portfolio is obtained.

Definition 3.3
The joint distribution of assets in our portfolio is given by the joint density

$$f(x_1, x_2, \dots, x_n) = c\big(F_1(x_1), F_2(x_2), \dots, F_n(x_n)\big) \prod_{i=1}^{n} f_i(x_i),$$

where $c$ is the copula density obtained using parametric methods, and $f_i(x_i)$, with distribution function $F_i$, is the marginal density obtained using nonparametric methods.
Once the joint distribution for the multivariate $(X_1, \dots, X_n)$ is estimated, its portfolio $P$ with weights $(w_1, \dots, w_n)$ is defined by

$$P(X_1, \dots, X_n, w_1, \dots, w_n) = \sum_{i=1}^{n} w_i X_i, \qquad (3.7)$$

where $w_i$ and $X_i$ are the weight and value of the ith asset, respectively, and the weights sum to one, $\sum_{i=1}^{n} w_i = 1$. Requiring a weight $w_i$ to be nonnegative means that short selling of the corresponding asset is not allowed.
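The construction in Definition 3.3 can be sketched for two assets as follows: kernel estimates supply the marginal densities and distribution functions, and a bivariate Gaussian copula density couples them. The samples, bandwidths, correlation value, and function names are assumptions for illustration; the copula density is the two-dimensional case of the MGC density.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def kernel_pdf(x, sample, h):
    """Gaussian-kernel estimate of a marginal density f_i(x)."""
    return sum(STD_NORMAL.pdf((x - xj) / h) for xj in sample) / (len(sample) * h)

def kernel_cdf(x, sample, h):
    """Matching estimate of the marginal distribution function F_i(x)."""
    return sum(STD_NORMAL.cdf((x - xj) / h) for xj in sample) / len(sample)

def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density with correlation rho, for 0 < u1, u2 < 1."""
    s1, s2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    q = (rho * rho * (s1 * s1 + s2 * s2) - 2.0 * rho * s1 * s2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / math.sqrt(1.0 - rho * rho)

def joint_density(x1, x2, sample1, sample2, h1, h2, rho):
    """f(x1, x2) = c(F_1(x1), F_2(x2)) * f_1(x1) * f_2(x2)."""
    u1 = kernel_cdf(x1, sample1, h1)
    u2 = kernel_cdf(x2, sample2, h2)
    return (gaussian_copula_density(u1, u2, rho)
            * kernel_pdf(x1, sample1, h1) * kernel_pdf(x2, sample2, h2))

def portfolio_value(weights, values):
    """P = sum_i w_i X_i, as in Eq. (3.7)."""
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical return samples for two assets.
a = [-0.02, -0.005, 0.0, 0.004, 0.01, 0.03]
b = [-0.01, -0.002, 0.001, 0.003, 0.006, 0.012]
density = joint_density(0.0, 0.001, a, b, h1=0.01, h2=0.005, rho=0.4)
```

With `rho=0.0` the copula density equals one and the joint density reduces to the product of the two kernel marginals, which is a convenient sanity check.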

Parameter estimation
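Maximum likelihood estimation (MLE) was employed to estimate model parameters. The estimation is based on the joint density function

$$f(x_1, x_2, \dots, x_n; \theta) = c\big(F_1(x_1), F_2(x_2), \dots, F_n(x_n); \theta\big) \prod_{i=1}^{n} f_i(x_i),$$

where $c(\cdot\,; \theta)$ is the density of the n-dimensional copula $C(\cdot\,; \theta)$. Given $N$ observations $\big(x_1^{(j)}, \dots, x_n^{(j)}\big)$, $j = 1, \dots, N$, the log-likelihood function is defined as follows:

$$L(\theta) = L_C(\theta) + \sum_{i=1}^{n} L_i,$$

where $L_C(\theta) = \sum_{j=1}^{N} \log c\big(F_1(x_1^{(j)}), \dots, F_n(x_n^{(j)}); \theta\big)$ is the log-likelihood contribution of the dependence structure described by the copula $C$, and the remaining terms $L_i = \sum_{j=1}^{N} \log f_i\big(x_i^{(j)}\big)$, $i = 1, 2, \dots, n$, are the log-likelihood contributions of the marginal distributions. The marginal terms contain no parameters to estimate because they are obtained by the nonparametric kernel method. Here, log denotes the natural logarithm. Thus, only parameters in $L_C$ need to be estimated. Let $\theta$ denote the parameter set of copula $C$. This can be estimated by the following full MLE:

$$\hat{\theta} = \arg\max_{\theta} L_C(\theta).$$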
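Because only the copula term $L_C$ carries parameters, the estimation step reduces to maximizing the copula log-likelihood. The sketch below does this for a bivariate Gaussian copula, where $\theta$ is a single correlation $\rho$, using a plain grid search. The grid search, the synthetic data, and the function names are assumptions for illustration; in the semiparametric method the inputs `u_pairs` would be kernel-estimated distribution function values rather than exact normal probabilities.

```python
import math
import random
from statistics import NormalDist

def copula_log_likelihood(scores, rho):
    """L_C(rho) = sum_j log c(u1_j, u2_j; rho), with scores = normal quantiles of u."""
    one_minus = 1.0 - rho * rho
    total = 0.0
    for s1, s2 in scores:
        q = (rho * rho * (s1 * s1 + s2 * s2) - 2.0 * rho * s1 * s2) / one_minus
        total += -0.5 * math.log(one_minus) - 0.5 * q
    return total

def to_scores(u_pairs):
    """Map uniform pairs to normal quantiles, clipping away from 0 and 1."""
    inv = NormalDist().inv_cdf
    def clip(u):
        return min(max(u, 1e-12), 1.0 - 1e-12)
    return [(inv(clip(u1)), inv(clip(u2))) for u1, u2 in u_pairs]

def fit_gaussian_copula(u_pairs):
    """Grid-search maximizer of L_C over rho in {-0.99, -0.98, ..., 0.99}."""
    scores = to_scores(u_pairs)
    candidates = [k / 100.0 for k in range(-99, 100)]
    return max(candidates, key=lambda rho: copula_log_likelihood(scores, rho))

# Synthetic check: simulate pairs with a known correlation, then recover it.
rng = random.Random(1)
true_rho = 0.6
phi = NormalDist().cdf
u_pairs = []
for _ in range(2000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    v2 = true_rho * z1 + math.sqrt(1.0 - true_rho ** 2) * z2
    u_pairs.append((phi(z1), phi(v2)))
rho_hat = fit_gaussian_copula(u_pairs)
```

For more than two assets the parameter is a full correlation matrix, and a numerical optimizer would replace the one-dimensional grid.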

Stressed portfolio optimization and its scaling effect
This section introduces the methodology for stressed portfolio optimization, which includes specific procedures for constructing an optimal portfolio under tail risk and its scaling effect. We extend the use of risk measures (Artzner et al. 1999) for portfolio optimization using the previously mentioned semiparametric method. The optimal scales of such stressed portfolios are studied by maximizing risk-sensitive value measures (Miyahara 2010).

Risk measure minimization for stressed portfolio
As a regulatory standard or internal control for financial institutions, risk measures provide extreme information about potential value losses. Owing to its simplicity and clarity in risk management, VaR is the most conventional measure to estimate the loss of asset value, given a certain confidence level; therefore, an adequate capital amount is gauged to prevent negative impacts.
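Definition 4.1 $\mathrm{VaR}_\alpha$ is defined as a quantile in statistics:

$$\mathrm{VaR}_\alpha(X) = \inf\{x \in \mathbb{R} : P(X \le x) \ge \alpha\}, \qquad (4.1)$$

where $\alpha$ is the confidence level, and $X$ denotes either the loss of asset value or its loss return.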
Conditional value-at-risk (CVaR), also known as expected shortfall, is a stringent risk assessment used to estimate the average losses exceeding VaR.
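Definition 4.2 $\mathrm{CVaR}_\alpha$ is defined as a conditional expectation:

$$\mathrm{CVaR}_\alpha(X) = E\big[X \mid X \ge \mathrm{VaR}_\alpha(X)\big],$$

where $\alpha$ is the confidence level, the variable $X$ represents the loss value or its return, and $\mathrm{VaR}_\alpha(X)$ is defined above.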
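Both risk measures have simple empirical counterparts when $X$ is represented by a finite sample of losses. The sketch below assumes the quantile convention $\inf\{x : P(X \le x) \ge \alpha\}$ for VaR and the tail average over losses at or above VaR for CVaR; the function names and the toy loss sample are illustrative.

```python
import math

def empirical_var(losses, alpha):
    """Smallest sample value x whose empirical distribution function is >= alpha."""
    ordered = sorted(losses)
    # The small tolerance guards against floating-point error in alpha * n.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    return ordered[k - 1]

def empirical_cvar(losses, alpha):
    """Average of the losses that are at least as large as the empirical VaR."""
    var = empirical_var(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)

# Toy loss sample 1, 2, ..., 100 (illustrative only).
losses = list(range(1, 101))
var_95 = empirical_var(losses, 0.95)    # 95
cvar_95 = empirical_cvar(losses, 0.95)  # 97.5
```

CVaR is never smaller than VaR at the same confidence level, since it averages only the losses in the tail beyond it.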
Note that the values of both $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ depend on the variable $X$. This means that they are not constant, even when the value of $\alpha$ is given. When the variable $X$ is a portfolio, such as $P$ defined in Eq. (3.7), the minimization of nonlinear risk measures such as $\mathrm{VaR}_\alpha$ and $\mathrm{CVaR}_\alpha$ over the feasible set of portfolio weights, possibly in high dimensions, must be solved numerically. Discussions on data analysis and computational schemes are presented in "Statistical estimation for semiparametric method" section.
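As a rough illustration of the numerical minimization over portfolio weights, the sketch below runs a plain random search on the long-only simplex (nonnegative weights summing to one) and keeps the weights with the smallest empirical CVaR. Random search is swapped in here only for brevity; it is not the genetic algorithm used in the empirical section, and the scenario data, seeds, and function names are hypothetical.

```python
import math
import random

def cvar(losses, alpha):
    """Average of the largest losses, from the empirical alpha-quantile upward."""
    ordered = sorted(losses)
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    tail = ordered[k - 1:]
    return sum(tail) / len(tail)

def portfolio_losses(weights, scenarios):
    """Loss = negative portfolio return in each scenario."""
    return [-sum(w * r for w, r in zip(weights, row)) for row in scenarios]

def random_simplex_point(rng, n):
    """Uniform draw from the simplex via normalized exponential variates."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def minimize_cvar(scenarios, alpha=0.95, trials=1000, seed=0):
    """Random search for long-only weights with the smallest empirical CVaR."""
    rng = random.Random(seed)
    n = len(scenarios[0])
    best_w = [1.0 / n] * n  # start from the equal-weight portfolio
    best_val = cvar(portfolio_losses(best_w, scenarios), alpha)
    for _ in range(trials):
        w = random_simplex_point(rng, n)
        val = cvar(portfolio_losses(w, scenarios), alpha)
        if val < best_val:
            best_w, best_val = w, val
    return best_w, best_val

# Hypothetical return scenarios: a low-volatility and a high-volatility asset.
data_rng = random.Random(42)
scenarios = [[data_rng.gauss(0.0005, 0.01), data_rng.gauss(0.0005, 0.05)]
             for _ in range(500)]
best_w, best_val = minimize_cvar(scenarios)
```

In this toy setting most of the weight ends up on the low-volatility asset. Replacing `cvar` by an empirical VaR gives the corresponding VaR-minimizing search, and the scenarios could equally be simulated from the estimated semiparametric joint distribution.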

Value measure maximization for the scaling effect
The evaluation of a risk-sensitive portfolio is essential for finance. This section aims to revisit the optimal scale using the risk-sensitive value measures proposed by Miyahara (2010) and discuss some computational issues given stressed portfolios.

Definition 4.3
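Let $\mathcal{X}$ be a linear space of portfolio returns; the risk-sensitive value measure on $\mathcal{X}$ is then the following functional defined on $\mathcal{X}$:

$$U^{(\alpha)}(X) = -\frac{1}{\alpha} \log E\big[e^{-\alpha X}\big], \qquad X \in \mathcal{X}, \qquad (4.2)$$

where $\alpha$ is the risk aversion parameter and $\alpha \in [0, 1]$, with the case $\alpha = 0$ understood as the limit $E[X]$.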
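For a Gaussian multivariate $X$, from its moment generating function the utility function (4.2) is explicitly obtained:

$$U^{(\alpha)}(X) = E[X] - \frac{\alpha}{2}\mathrm{Var}(X).$$

The mean-variance (MV) value measure is defined by this expression, and the optimal scale for this MV value measure,

$$\lambda_{MV} = \frac{E[X]}{\alpha\, \mathrm{Var}(X)}, \qquad (4.3)$$

is obtained from solving a quadratic optimization over the scale $\lambda$ of the portfolio, namely maximizing $U^{(\alpha)}(\lambda X) = \lambda E[X] - \frac{\alpha}{2}\lambda^2 \mathrm{Var}(X)$.
However, when the distribution of $X$ is non-Gaussian, the mean-variance model consists of the first two leading terms of the risk-sensitive value measure. This can be easily deduced by substituting the Taylor expansion into (4.2) to obtain

$$U^{(\alpha)}(X) = E[X] - \frac{\alpha}{2}\mathrm{Var}(X) + \frac{\alpha^2}{6} E\big[(X - E[X])^3\big] + O(\alpha^3).$$

If $X$ is centered at 0, i.e., $E(X) = 0$,

$$U^{(\alpha)}(X) = -\frac{\alpha}{2} E[X^2] + \frac{\alpha^2}{6} E[X^3] + O(\alpha^3).$$

As $U^{(\alpha)}(\lambda X)$ is a concave function of $\lambda$ (Miyahara 2010), the optimal scale of the portfolio can be obtained by maximizing this scaled value measure:

$$\lambda_{opt} = \arg\max_{\lambda > 0} U^{(\alpha)}(\lambda X).$$

Because our portfolio variable $X$ has a complex structure from the proposed semiparametric method, we adopt the following Monte Carlo estimator to solve the optimal scale as an approximation:

$$\lambda_{opt} \approx \arg\max_{\lambda > 0} \left\{-\frac{1}{\alpha} \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{-\alpha \lambda X^{(i)}}\right)\right\}, \qquad (4.4)$$

where $\lambda$ is the scale of the portfolio, $\alpha$ is the risk aversion, $n$ is the sample size, and the $X^{(i)}$'s are random samples from historical simulations.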
We comment on the strict concavity of the approximate estimator in (4.4). This can be derived from the concavity of the utility function defined in Eq. (4.2) by taking the random variable $X$ as discrete and uniformly distributed on the set of fixed outcomes $X^{(1)}, X^{(2)}, \dots, X^{(n)}$. Since the graph of the risk-sensitive value measure over the scale is concave, the peak of this graph is identified as the optimal scale for its associated portfolio.
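The Monte Carlo estimator of the scaled risk-sensitive value measure and the search for its peak can be sketched as follows. The estimator is taken to be $-\frac{1}{\alpha}\log\big(\frac{1}{n}\sum_i e^{-\alpha\lambda X^{(i)}}\big)$, where $\lambda$ denotes the scale; the golden-section search, the search interval, and the two-point sample are assumptions for illustration and exploit only the concavity discussed above.

```python
import math

def rsvm_estimate(scale, samples, alpha):
    """-(1/alpha) * log( (1/n) * sum_i exp(-alpha * scale * x_i) )."""
    mean_exp = sum(math.exp(-alpha * scale * x) for x in samples) / len(samples)
    return -math.log(mean_exp) / alpha

def optimal_scale(samples, alpha, lo=0.0, hi=10.0, tol=1e-8):
    """Golden-section search for the maximizer of the concave map scale -> rsvm_estimate."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if rsvm_estimate(c, samples, alpha) < rsvm_estimate(d, samples, alpha):
            a = c  # the peak lies to the right of c
        else:
            b = d  # the peak lies to the left of d
    return 0.5 * (a + b)

def mv_optimal_scale(mean, variance, alpha):
    """Closed-form mean-variance scale E[X] / (alpha * Var(X))."""
    return mean / (alpha * variance)

# Toy sample with two equally likely returns: a loss of 1 and a gain of 2.
samples = [-1.0, 2.0]
lam_star = optimal_scale(samples, alpha=1.0)
```

For this two-point sample the peak can be found by hand: minimizing $e^{\lambda} + e^{-2\lambda}$ gives $\lambda = \ln(2)/3 \approx 0.231$, which the search reproduces. For the same sample (mean 0.5, variance 2.25), the mean-variance formula gives $0.5/2.25 \approx 0.222$, a small example of the two scales differing for a non-Gaussian distribution.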

Empirical studies and data analysis
According to the framework depicted in the "Literature review" and "Stressed portfolio optimization and its scaling effect" sections, we designed the following experiments for stressed portfolio optimization using the semiparametric method. First, we build the marginal distribution for each constituent of the portfolio, given daily data from 2016 to 2020. We then describe the joint distribution of a portfolio with a Gaussian copula, which explains the dependence between these constituents. Second, we solve for the optimal weights from risk measure minimization using the genetic algorithm (GA) within MATLAB's package. Finally, the optimal scale based on the stressed VaR portfolio is solved numerically using an approximated Monte Carlo estimator. Intensive and heavy computation, which includes modeling by semiparametric estimation and portfolio optimization under tail risk, is executed on a server cluster equipped with four Intel Xeon 5220R CPUs. Each CPU runs at 2.2 GHz with 24 cores.

Statistical estimation for semiparametric method
To implement our methodology on real data, we construct a diversified portfolio with five ETFs: Vanguard S&P 500 ETF (VOO), iShares 20+ Year Treasury Bond ETF (TLT), iShares iBoxx investment grade corporate bond ETF (LQD), iShares Gold Trust ETF (IAU), and Vanguard Real Estate Index Fund ETF Shares (VNQ). Daily price data spanning from 2016 to 2020 were retrieved from the Bloomberg database. Daily returns were calculated from the difference between two consecutive log prices. Our implementation of the optimization models relies on a rolling-window approach. Specifically, at the beginning of each month, we use the return data of the previous three months to calculate the input parameters needed to determine the portfolio weights. Using these weights, we calculate portfolio returns over the next month. The following month, new portfolio weights are determined using updates of the parameter estimates.
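The return calculation and the rolling-window scheme described above can be sketched as follows. The month labels, the data layout (a dictionary mapping each month to its daily return rows), and the `fit_weights` callback are hypothetical placeholders; any weight-selection rule, such as a VaR or CVaR minimizer, could be passed in.

```python
import math

def log_returns(prices):
    """Daily returns as differences of consecutive log prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def rolling_windows(months, lookback=3):
    """Yield (estimation_months, holding_month) pairs in chronological order."""
    for k in range(lookback, len(months)):
        yield months[k - lookback:k], months[k]

def backtest(returns_by_month, fit_weights, lookback=3):
    """Fit weights on the previous `lookback` months, then apply them to the next month."""
    months = sorted(returns_by_month)  # labels such as "2016-01" sort chronologically
    results = {}
    for estimation_months, holding_month in rolling_windows(months, lookback):
        history = [row for m in estimation_months for row in returns_by_month[m]]
        weights = fit_weights(history)
        results[holding_month] = [sum(w * r for w, r in zip(weights, row))
                                  for row in returns_by_month[holding_month]]
    return results

# Toy data: five months, two trading days each, two assets (illustrative only).
toy = {f"2016-0{m}": [[0.001 * m, -0.002], [0.0, 0.003]] for m in range(1, 6)}
out = backtest(toy, fit_weights=lambda history: [0.5, 0.5])
```

The first three months serve only as estimation data, so `out` contains portfolio returns for the fourth and fifth months.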
The model parameters of the optimal bandwidth for the kernel function and the correlation matrix required in "Literature review" section for our portfolio are time-invariant within each estimation window (three months). The relevant parameters and estimation results are available upon request.

Optimal weights for risk measure: stressed portfolio optimization
Following the semiparametric model, applications for portfolio optimization under tail risk are presented. Tables 2 and 3 record the empirical results of in-sample fit for a quarterly time span (three months), which is useful for training models. Tables 4 and 5 record the empirical results of out-of-sample fit for a monthly time span, which is useful for testing models. According to Eq. (4.1), portfolio VaR is a function of the weight vector $w$ defined by

$$\mathrm{VaR}_\alpha(P(w)) = g(w),$$

where $g$ denotes the function of the weight vector $w$, and the optimal weight $w^*$ attains the minimum value of $g(w)$. Table 2 records the in-sample fit for the optimal weight vector $w^*$, the performance of each stressed portfolio, and its VaR value for five consecutive years from 2016 to 2020. These performance results, including volatility, return, Sharpe ratio, and VaR, are calculated quarterly. According to Table 2, although Markowitz's model and the semiparametric method have different objective functions for weight estimation, the two methods have comparable results for the Sharpe ratio. The in-sample results show that the semiparametric method always has a lower VaR than Markowitz's model.
Similarly, the portfolio CVaR is a function of the weight vector $w$ defined by the following equation:

$$\mathrm{CVaR}_\alpha(P(w)) = k(w),$$

where $k$ is a function of the weight vector $w$, and the optimal weight $w^*$ attains the minimum value of $k(w)$. The optimal weight, the performance of each stressed portfolio, and its CVaR value are listed in Table 3.
Tables 2 and 3 demonstrate the in-sample tests of the dataset and the performance measure of the optimal stressed portfolio on a long-term quarterly basis. According to Tables 2 and 3, although Markowitz's model and the semiparametric method have different objective functions for weight estimation, the two methods have comparable results in terms of the Sharpe ratio. The in-sample empirical results show that the semiparametric method always has lower VaR and CVaR than Markowitz's model. We conduct out-of-sample tests on a short-term monthly basis by using the same set of five ETFs (VOO-equity, TLT-government bond, LQD-corporate bond, IAU-gold, and VNQ-real estate) and compare the performance of portfolios generated from the semiparametric method and the Markowitz method from 2016 to 2020, as demonstrated in Table 4.
The results of return, volatility, Sharpe ratio, and risk measures were calculated monthly. As can be seen from Fig. 1, compared to S&P 500, our semiparametric method provides better results in terms of portfolio returns during those five years.
Note that Markowitz's mean-variance model is profit-oriented. It selects the portfolio with the highest Sharpe ratio from the efficient frontier of the five ETF assets. In contrast, the semiparametric method is risk-oriented. Its objective function aims to minimize the VaR/CVaR function. Table 5 summarizes Table 4. Compared with Markowitz's mean-variance method, our semiparametric method reduces the average volatility of the portfolio over those five years and also decreases the average return in the same period, but it increases the average Sharpe ratio of the portfolio. Our proposed method mitigates not only the overall risk but also the tail risk because it has a lower portfolio VaR in those five years.
Similarly, the coherent risk measure CVaR is used to compare the results of the semiparametric method and Markowitz's method within the same test period from 2016 to 2020. Figure 2 depicts the portfolio value of the semiparametric model with CVaR and S&P 500. As shown in Table 7, which is a summary of Table 6, our semiparametric method reduces the average volatility of the portfolio over the five years, whereas it decreases the average return in the same period. However, the semiparametric method increases the average Sharpe ratio of the portfolio. Our semiparametric method consistently offers better risk management than the Markowitz model in comprehensive risk and tail risk because our method has a lower portfolio CVaR.
In addition, we verify the robustness of the semiparametric method in several sensitivity checks. First, we extensively vary the dataset to examine whether our findings are robust with respect to the indices used to represent the asset classes. For example, we add other ETFs or use alternative indices in our portfolio. This procedure often leads to changes in sample size. However, we find that the variation in the dataset does not alter any of our conclusions. Second, we examine whether the performance of our method improves when shorter and longer time series of historical returns are used for parametrization, and we base the estimation method on a rolling-window approach with 2 months and 4 months of historical data available in estimation. We do not observe a consistent improvement in these additional tests. Third, we repeat our analysis by utilizing other performance measures. Specifically, we employ the Sortino ratio, which does not change the qualitative nature of our results.

Optimal scale for value measure: scaling effect
As mentioned above, we can obtain a stressed portfolio using the semiparametric method with the optimal weights by minimizing the VaR of the portfolio. To further understand the scaling effect of the portfolio, we compare the mean-variance model and the risk-sensitive value measure with different risk aversion, denoted by α from zero to one. We assume that there are three types of investors: risk-averter (0.5 < α < 1), risk-seeker (0 < α < 0.5), and risk-neutral (α = 0.5). We discuss the optimal scale of the portfolio during the five years with three types of investors, and the results are shown in Figs. 3, 4, 5, 6 and 7. Although the curves of the mean-variance (MV) and risk-sensitive value measure (RSVM) are both similar in shape to a downward parabola, the curve of MV has a particularly strong concavity. In theory, the MV is a special case of an RSVM. MV has a closed-form optimal portfolio scale shown in Eq. (4.3), while the optimal scale of the RSVM is computed numerically with the Monte Carlo estimator (4.4); see Tables 8 and 9.
The empirical results show a negative correlation between the degree of risk aversion and the optimal scale in the value measure. Risk-seeking investors correspond to larger scales, while risk-averters correspond to smaller scales. In addition, the mean-variance model and the risk-sensitive value measure coincide only for portfolios with a Gaussian distribution, but most portfolios are non-Gaussian in practice. If investors use a mean-variance model to determine the optimal scale, the result may not be the true optimal scale because the mean-variance model does not fit a non-Gaussian distribution. Thus, the risk-sensitive value measure is pivotal in stressed portfolio optimization.

Conclusion
We propose an innovative semiparametric method for financial modeling and discuss the applications of portfolio optimization under tail risk with the scaling effect. This semiparametric method is composed of a nonparametric method and a copula method, which estimate the marginal distributions and the dependence of assets in a portfolio, respectively. Stressed portfolios and their optimal scales are obtained by minimizing risk measures and maximizing risk-sensitive value measures, respectively. Through intensive empirical data analysis, we observe that the mean-variance type Markowitz method may lead to biased selection, compared to the semiparametric method, which improves the efficiency of risk management with less risk exposure.