Discovering optimal weights in weighted-scoring stock-picking models: a mixture design approach
Financial Innovation volume 6, Article number: 41 (2020)
Abstract
Certain studies that construct multifactor stock selection models adopt a weighted-scoring approach despite its three shortcomings. First, it cannot effectively identify the connection between the weights of stock-picking concepts and portfolio performance. Second, it cannot provide the optimal combination of weights for the stock-picking concepts. Third, it cannot meet various investor preferences. Thus, this study employs a mixture experimental design to determine the weights of stock-picking concepts, collect portfolio performance data, and construct performance prediction models based on those weights. Furthermore, these performance prediction models and optimization techniques are employed to discover the optimal combination of weights that meets investor preferences. The samples consist of stocks listed on the Taiwan stock market. The modeling and testing periods were 1997–2008 and 2009–2015, respectively. Empirical evidence showed that our methodology (1) predicts performance accurately and robustly, (2) identifies significant interactions between the weights of stock-picking concepts, and (3) determines their optimal combination. This combination of weights can form stock portfolios with the best performance that meets investor preferences. Thus, our methodology can overcome the three drawbacks of the classical weighted-scoring approach.
Highlights

Finding the connection between weights of stock-picking concepts and performances.

Discovering the optimal combination of weights of stock-picking concepts.

Meeting various investors’ preferences.
Introduction
The efficient-market hypothesis asserts that financial markets reflect all relevant information. Consequently, stocks always trade at their fair value on stock exchanges, making it impossible for investors to either purchase undervalued stocks or sell overvalued stocks. However, certain empirical studies show that stock markets do not reach semi-strong market efficiency (Hong and Stein 1999; Hong et al. 2000; Piotroski 2000; Richardson et al. 2010; Fama and French 2012; Asness et al. 2013; Yeh and Hsu 2014; Kong et al. 2019; Daniel et al. 2020; Wen et al. 2019). Banz’s (1981) size effect suggests that the return on investment (ROI) in stocks of small corporations is higher than that in stocks of large corporations. Rosenberg et al.’s (1985) value effect indicates that value stocks have higher ROI than growth stocks. Bondt and Thaler’s (1985) overreaction and reversal effects illustrate that winner stocks have a lower ROI than loser stocks do in the long term. The momentum effect observed by Jegadeesh and Titman (1993) shows that rising asset prices tend to increase further, whereas falling prices tend to keep decreasing. Thus, stocks with strong past performance tend to continue outperforming stocks with poor past performance during the next period.
Several recent studies have shown that a combination of effects could be used to construct a stock selection model with a high rate of return (RoR) (Piotroski 2000; Hart et al. 2003, 2005; Mohanram 2005; Qian et al. 2007; Roko and Gilli 2008; Yeh and Hsu 2011; Shen et al. 2015; Yu et al. 2014; Yeh et al. 2015; Rasekhschaffe and Jones 2019; Dai and Zhou 2019; Wu et al. 2020; Gu et al. 2020). Many studies also adopted a weighted-scoring approach to construct a multifactor stock selection model (Piotroski 2000; Kang and Ding 2006; Duran-Vazquez et al. 2014; Kim and Lee 2014; Tikkanen and Äijö 2018; Jeong and Kim 2019; Mehta et al. 2019). For example, Hart et al. (2003) studied the profitability of various stock selection strategies in 32 emerging markets from 1985 to 1999. Value, momentum, and earnings revision strategies were the most successful, as they generated significant excess returns when compared with size, liquidity, and mean reversion strategies. A strategy can be improved by combining various stock-picking factors. Finally, large institutional investors can implement these strategies successfully regardless of liquidity constraints and significant transaction costs.
Mohanram (2005) proposed combining traditional fundamentals, such as earnings and cash flow, with company growth indicators, such as earnings stability, growth stability, R&D intensity, capital expenditures, and advertising, to establish an index, that is, a G-score. A long/short equity strategy based on the G-score generated significant excess returns, although most of the returns were generated through shorting. The results were robust to size, analyst coverage, and liquidity issues and persisted after controlling for momentum, price-to-book value ratios (P/B ratio), and accruals. Firms with a high G-score demonstrated strong market reaction to future earnings announcements and unannounced analyst forecasts. In addition, a risk-based approach cannot explain the results because returns were positive in most years and low-risk companies earned high returns. Finally, fundamental analysis worked best when traditional and growth-based analyses were paired with stocks with high and low P/B ratios, respectively.
Noma (2010) combined traditional fundamentals, such as return on assets, operating cash flow, and operating margins, into an F-score index. Applying the F-score demonstrated that the mean return can increase by 7.8% through a hedging strategy that buys firms with a high F-score and shorts firms with a low F-score. Additionally, an investment strategy that buys firms with a high P/B ratio and F-score and shorts those with a low P/B ratio and F-score earns a 17.6% annual return. The empirical results also reveal that the F-score can predict future earnings.
However, the weighted-scoring approach either sets weights subjectively or uses a simple average, leading to three drawbacks. First, it cannot effectively identify the connection between the weights of stock-picking concepts and portfolio performance. Second, it cannot provide an optimal combination of weights for the stock-picking concepts. Third, it cannot meet various investor preferences. For example, a conservative investor may tolerate only low risks; hence, returns would not be the first priority. Therefore, such an investor’s stock selection factor weights should differ from those of an aggressive investor who considers returns the top priority.
We address these shortcomings by adopting the following methodology (Fig. 1):

(1) We design the weight combinations of stock-picking concepts with a mixture design (Myers and Montgomery 2008; Montgomery 2012). Accordingly, we generate a set of weight combinations (x) of stock-picking concepts to collect information on the performance of each combination.

(2) Based on the mixture design, we simulate each weight combination through backtesting on a historical stock market trading database to obtain its investment performance (y). The results are matched as (x, y) pairs to construct a data set.

(3) Based on the data set, we construct a performance prediction model, y = f(x), by employing a multivariable polynomial regression analysis. The prediction model can examine the relationship between the performances and the weights of stock-picking concepts and identify the interactions between concepts.

(4) Based on the prediction model, we use optimization techniques to discover the optimal combination of weights for the stock-picking concepts, that is, the combination that forms the stock portfolio whose performance best meets investor preferences.

(5) We verify the optimal combination of weights through backtesting on a historical stock market trading database to determine whether it can meet investor preferences.
Therefore, this methodology can resolve the three drawbacks of the aforementioned extant literature. The remainder of this paper is organized as follows. The “Mixture experimental design” section explains how we develop a mixture design. The “Experimental design and implementation” section describes how we generate the weight combinations of stock-picking concepts through this mixture experimental design and simulate them by way of backtesting. The “Model building and verification” section constructs and analyzes the performance prediction model through a multivariable polynomial regression analysis. The “Weight optimization and validation” section presents the determination of the optimal weight combinations through optimization and their validation through backtesting. Last, the “Conclusion” section concludes the paper.
Mixture experimental design
We systematically explore the relationship between the weight combinations of various factors and portfolio performance through a mixture design, a type of experimental design suited to cases in which the weights must sum to 1. The components of a mixture experiment are its factors, and their levels are not independent of one another. Here, each factor used in stock selection is assigned a weight, which serves as its level. We use the simplex-centroid design to produce the weight combinations (Montgomery 2012). A simplex-centroid design with q components has \(2^{q} - 1\) experimental mixes. For example, Figs. 2 and 3 show the simplex-centroid designs for mixtures with three and four components, respectively.
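As a minimal sketch of how such a design can be enumerated (the function name `simplex_centroid` is illustrative and not taken from the paper), each non-empty subset of the q components receives equal weights that sum to 1:

```python
from fractions import Fraction
from itertools import combinations


def simplex_centroid(q):
    """Return the 2**q - 1 points of a simplex-centroid design with q components."""
    points = []
    for k in range(1, q + 1):                 # k = number of nonzero components
        for subset in combinations(range(q), k):
            share = Fraction(1, k)            # equal share for each active component
            points.append(tuple(share if i in subset else Fraction(0)
                                for i in range(q)))
    return points


design3 = simplex_centroid(3)
design4 = simplex_centroid(4)
for point in design3:
    print(", ".join(str(w) for w in point))
```

For three components this lists the three apexes, the three side midpoints, and the centroid; exact fractions are used so that every row sums to exactly 1.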
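Then, we construct the model by applying a regression analysis to the experimental data obtained through backtesting. The polynomial function of the simplex-centroid design is expressed as follows (Montgomery 2012):
\[ y = \sum_{i=1}^{q} \beta_{i} x_{i} + \sum_{i<j} \beta_{ij} x_{i} x_{j} + \sum_{i<j<k} \beta_{ijk} x_{i} x_{j} x_{k} + \cdots + \beta_{12 \ldots q} x_{1} x_{2} \cdots x_{q} \quad (1) \]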
where y, x_{i}, and β represent the response variable of the mixture, the proportion of the ith component of the mixture, and the regression coefficient of the regression model, respectively.
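The effects of the higher-order terms can be ignored because they are usually small. In most real applications, only the first-, second-, and third-order terms are likely to be significant. For instance, if q = 3 and third-order terms are included, then we have the following:
\[ y = \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} + \beta_{12} x_{1} x_{2} + \beta_{13} x_{1} x_{3} + \beta_{23} x_{2} x_{3} + \beta_{123} x_{1} x_{2} x_{3} \quad (2) \]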
According to Eq. (2), if the three-component mix is at a single-component apex, (1, 0, 0), (0, 1, 0), or (0, 0, 1), then the expected response is \(\beta_{1}\), \(\beta_{2}\), or \(\beta_{3}\), respectively.
Figure 4 shows that the coefficients of the linear terms are the regression estimates at the three apexes, and that the average of the three linear coefficients is the regression estimate at the central point when the interaction terms are ignored. Hence, the slope from the central point to an apex is positive if the coefficient of that component’s linear term is larger than the average of the three linear coefficients, so the regression estimate increases as the component’s proportion grows. Conversely, the slope is negative if the coefficient is smaller than the average, and the regression estimate decreases as the component’s proportion grows.
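If the three-component mix is at a two-component mix, (1/2, 1/2, 0), (1/2, 0, 1/2), or (0, 1/2, 1/2), then according to Eq. (2), the respective expected responses are as follows:
\[ E(y) = \frac{\beta_{1} + \beta_{2}}{2} + \frac{\beta_{12}}{4} \quad (3) \]
\[ E(y) = \frac{\beta_{1} + \beta_{3}}{2} + \frac{\beta_{13}}{4} \quad (4) \]
\[ E(y) = \frac{\beta_{2} + \beta_{3}}{2} + \frac{\beta_{23}}{4} \quad (5) \]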
Thus, the coefficient of the quadratic term \(\beta_{ij}\) is four times the difference between the regression estimate at the midpoint of a side, E(y), and the average of the regression estimates at the two apexes of that side, \((\beta_{i} + \beta_{j})/2\). Therefore, the regression estimate along this side bulges upward, with the midpoint above the line joining the two apexes, if the coefficient of the quadratic term is greater than zero. It bulges downward, with the midpoint below that line, if the coefficient of the quadratic term is less than zero (Fig. 5).
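The relationship between the midpoint response and the quadratic coefficient can be checked numerically. The sketch below evaluates a three-component model with linear, two-factor, and three-factor terms; the coefficient values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations


def special_cubic(x, beta):
    """Evaluate a mixture polynomial with terms up to third order at weights x.

    beta maps index tuples to coefficients, e.g. (0,) for a linear term,
    (0, 1) for a two-factor term, and (0, 1, 2) for the three-factor term.
    Missing terms are treated as zero.
    """
    total = 0.0
    for order in (1, 2, 3):
        for idx in combinations(range(len(x)), order):
            term = beta.get(idx, 0.0)
            for i in idx:
                term *= x[i]
            total += term
    return total


# Hypothetical coefficients (not estimates from the paper).
beta = {(0,): 2.0, (1,): 4.0, (2,): 6.0, (0, 1): 8.0, (0, 1, 2): 27.0}

vertex = special_cubic((1, 0, 0), beta)        # equals beta_1
midpoint = special_cubic((0.5, 0.5, 0), beta)  # (beta_1 + beta_2)/2 + beta_12/4
recovered_b12 = 4 * (midpoint - (beta[(0,)] + beta[(1,)]) / 2)
print(vertex, midpoint, recovered_b12)
```

With these values the apex response is 2, the midpoint response is 5, and four times the gap between the midpoint (5) and the average of the two apexes (3) recovers the quadratic coefficient 8.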
Experimental design and implementation
Factor screening and stock-picking concepts
Stock selection factors can be divided into five categories as follows:
Value factors
Returns from cheap stocks are higher than those of expensive stocks. Commonly used ratios for these factors include the price-to-earnings (P/E) and P/B ratios.
Growth factors
Stocks from profitable companies have higher returns than those from unprofitable companies do. Commonly used ratios include return on equity (ROE).
Momentum factors
Stocks with high recent returns have higher returns than those with low recent returns do. Quarterly and monthly RoRs are used to measure the momentum effect.
Size factors
Stocks from small firms have higher returns than those from large firms do. The total market capitalization (or market value) is commonly used to measure company size.
Liquidity factors
Stocks with low liquidity have higher returns than those with high liquidity do. Quarterly trading volume is commonly used to measure stock liquidity.
The performance indicators of portfolio investment can be divided into three categories, namely, returns, risks, and liquidity. Appropriate stock selection factors should be chosen to build a decision-making model that optimizes and satisfies these performance indicators, as discussed below:
Returns
The P/B ratio and ROE are the most representative indicators of value and growth stocks. In addition, the last quarterly and monthly RoRs may affect the RoR because of reversal or momentum effects in stock markets. This study uses the P/B ratio, ROE, and monthly RoR as stock selection factors.
Although the forward P/E ratio may yield the highest RoR for a portfolio, it is based on analysts’ earnings forecasts. In fact, combining the P/B ratio and ROE (based on historical earnings) with appropriate weights can achieve a RoR comparable with that achieved by the forward P/E ratio. Therefore, the forward P/E ratio was not used in this study because of the lack of evident advantages. Moreover, the P/B ratio and ROE represent stock value and growth, respectively, while the P/E ratio mixes value and growth. In terms of regulating the portfolio’s various performance aspects, two independent factors are better than a single, mixed factor, which is another reason for not using the P/E ratio in this study.
Risks
A stock’s β is often persistent; that is, stocks with a large current β usually have a large future β, and vice versa. Therefore, the previous β is chosen as a stock selection factor to control the systematic risk of the selected stocks.
Liquidity
Market capitalization needs considerable time to grow or decline. Thus, stocks with a large (or small) market value usually keep a large (or small) market value in the future. Although stocks with a small market value may generate a high return, certain investors may prefer investment targets that demonstrate significant liquidity and investment availability; hence, market value should not be extremely low. Therefore, stocks were ranked by total market capitalization from large to small to ensure a high market capitalization for the selected stocks, with a larger market value receiving a higher score.
This study employed the multifactor weighted-scoring method to select stocks. For each stock selection factor, stocks were sorted in the factor’s default order; the top- and bottom-ranked stocks received 100 and 0 points, respectively, and the scores of the remaining stocks were obtained through interpolation. Each factor’s score was then weighted to obtain the total weighted score, and the stock with the highest total weighted score was expected to be the most profitable. Different weights form different stock-picking strategies with different performances. Hence, performance is a function of the weights, and the weights should be employed as the design variables of the optimization model. Each factor also needs a default sorting direction for the stocks, as described below:
Small P/B ratio concept
The smaller the P/B ratio, the higher the future returns. Therefore, stocks were sorted in ascending order according to the P/B ratio, with a smaller P/B ratio receiving a higher score.
Large ROE concept
The higher the ROE, the higher the future returns. Therefore, stocks were sorted in descending order according to ROE, with a higher ROE receiving a higher score.
Large monthly return concept
Stock market returns are usually characterized by short- and long-term reversals and medium-term momentum. Momentum effects typically occur over one or several months. Therefore, given a high recent quarterly or monthly return, future stock returns may be high or low depending on whether the reversal or the momentum effect dominates. However, most investors psychologically prefer to buy stocks with high recent returns. Therefore, stocks were sorted in descending order according to monthly returns, with a higher monthly return receiving a higher score.
Large total market capitalization concept
Total market value indicates company size. A company may be at the growing stage when it has low total market value. By contrast, a large total market capitalization implies that the company has established a leadership position in its industry. Although stocks of small firms may generate a high return, many investors may prefer investment targets that demonstrate significant liquidity and investment availability; hence, market value should not be extremely low. Therefore, stocks were sorted in descending order according to the total market capitalization to ensure that the selected stock had an appropriately high market capitalization, with a large total market capitalization receiving a corresponding high score.
Small beta concept
Beta (β) measures the fluctuation of a stock’s return relative to a benchmark (the market), that is, its systematic risk. If a stock’s β is greater than 1, its return fluctuates more than the benchmark, and vice versa. A stock’s β is often persistent; that is, stocks with a large (small) current β typically have a large (small) β in the near future. Although a large β may imply higher returns according to classic theory, many investors prefer investment targets with low systematic risk; therefore, β should not be too large. Hence, stocks were sorted in ascending order according to their β to reduce the systematic risk of the selected stocks, with a smaller β receiving a higher score.
The weighted-scoring approach is employed to construct the multifactor stock selection model in two steps:

(1) A single-factor scoring method is used to sort stocks. The top stock is assigned a score of 100, whereas the bottom stock is assigned a score of 0. The scores of the remaining stocks are obtained by interpolation.

(2) The multifactor scoring method is employed. We obtain each stock’s overall score by assigning a weight to its score on each factor. Thus, the stock with the highest overall score is the best stock, and vice versa.
For example, previous literature shows that the rate of return is high if the ROE is large and the P/B ratio is small, and vice versa. Therefore, the stock with the highest ROE or lowest P/B ratio is assigned a score of 100, whereas the stock with the lowest ROE or highest P/B ratio is assigned a score of 0. The scores of the remaining stocks are obtained by interpolation. For example, a stock is assigned an ROE score of 80 if its ROE is larger than those of 80% of all sample stocks. Similarly, it is assigned a P/B score of 40 if its P/B ratio is lower than those of 40% of all sample stocks. If the weights of the two factors are 1/2 and 1/2, then the stock’s weighted score is (1/2)*80 + (1/2)*40 = 60.
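The two scoring steps can be sketched as follows. This is an illustrative reading of the procedure, assuming that interpolation is linear in rank and that ties are broken by input order; the six-stock data set and the function names are hypothetical.

```python
def rank_scores(values, higher_is_better=True):
    """Score each value from 0 to 100 by linear interpolation on its rank.

    The best-ranked stock gets 100 and the worst gets 0. Requires at least
    two stocks. Ties are broken by input order (an assumption of this sketch).
    """
    n = len(values)
    # order[0] is the worst stock and order[-1] is the best stock
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = 100.0 * rank / (n - 1)
    return scores


def weighted_scores(factor_scores, weights):
    """Combine per-factor score lists into one weighted total per stock."""
    n = len(factor_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, factor_scores)) for i in range(n)]


# Hypothetical data for six stocks.
roe = [0.02, 0.25, 0.10, 0.18, 0.05, 0.12]
pb = [0.8, 3.0, 1.2, 1.5, 0.6, 2.0]

roe_score = rank_scores(roe, higher_is_better=True)   # large ROE concept
pb_score = rank_scores(pb, higher_is_better=False)    # small P/B concept
total = weighted_scores([roe_score, pb_score], [0.5, 0.5])
print(roe_score[3], pb_score[3], total[3])
```

The fourth stock reproduces the worked example in the text: an ROE score of 80 and a P/B score of 40 combine to a weighted score of 60 under equal weights.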
Definitions of performance indicators
Portfolio performance is evaluated through three categories: returns, risks, and liquidity. This paper adopts these three performance indicators as shown below:
Excess rate of return α
The excess rate of return is estimated by the following regression equation:
\[ R - R_{f} = \alpha + \beta (R_{m} - R_{f}) + \varepsilon \quad (6) \]
where R_{f} is the risk-free rate of return, R_{m} is the market return, R is the return of the portfolio, and ε is the error term. The portfolio earns a positive excess return if α > 0.
Systematic risk β
Systematic risk β is estimated using Eq. (6). The higher the coefficient β, the higher the systematic risk. The portfolio is more volatile than the overall market if β > 1.
Stock market value in the portfolio
The larger the total market value of corporate stocks, the larger the trading volume of the stocks. The median of total market values of the portfolio’s corporate stocks is chosen as a proxy variable in assessing the portfolio’s liquidity.
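A minimal sketch of how α and β might be estimated from monthly return series is shown below. It fits the regression of the portfolio’s excess return on the market’s excess return by ordinary least squares; the function name and the return series are hypothetical, and the portfolio series is constructed so that the true values are α = 0.005 and β = 0.8.

```python
def alpha_beta(portfolio, market, risk_free):
    """Least squares fit of (R - Rf) = alpha + beta * (Rm - Rf)."""
    y = [r - f for r, f in zip(portfolio, risk_free)]
    x = [m - f for m, f in zip(market, risk_free)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical monthly returns (not data from the paper).
risk_free = [0.001] * 6
market = [0.020, -0.010, 0.030, 0.000, -0.020, 0.015]
portfolio = [f + 0.005 + 0.8 * (m - f) for m, f in zip(market, risk_free)]

alpha, beta = alpha_beta(portfolio, market, risk_free)
print(round(alpha, 6), round(beta, 6))
```

Because β here is below 1, this hypothetical portfolio would fluctuate less than the market while still earning a positive excess return.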
Data set partition
A relationship exists between the weights of stock-picking concepts and each investment performance indicator. For instance, the qualitative relationship is that the returns of low-priced stocks with a small P/B ratio are usually higher than those of high-priced stocks with a large P/B ratio. However, the quantitative relationship can change over time and can thus be highly volatile. If the quantitative relationship between the weights of stock-picking concepts and each investment performance indicator is highly volatile, then models built on past data cannot be used to select stocks for future investment. Therefore, we separate the data into in-sample and out-of-sample sets to explore whether the quantitative relationship is stable over time.
This study covers 19 years divided into a modeling period and a testing period. The investment performances obtained through backtesting the weights of stock-picking concepts during the modeling period (1997–2008) and the testing period (2009–2015) constitute the in-sample and out-of-sample data, respectively.
Experimental design
A simplex-centroid design with q components has \(2^{q} - 1\) experimental mixes. This study has five components, that is, \(2^{5} - 1 = 31\) experimental mixes. Thus, 31 combinations of stock-picking concept weights are obtained. Each factor takes one of six weight levels, namely 1, 1/2, 1/3, 1/4, 1/5, and 0, as shown on the left-hand side of Table 1.
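The count of 31 mixes and the six weight levels can be verified with a short calculation. In this sketch, a mix that uses k of the five concepts gives each of them a weight of 1/k, so the number of such mixes is the binomial coefficient C(5, k).

```python
from fractions import Fraction
from math import comb

q = 5
# Number of mixes that use exactly k of the q concepts, each with weight 1/k.
mixes_by_size = {k: comb(q, k) for k in range(1, q + 1)}
total_mixes = sum(mixes_by_size.values())
# Distinct weight levels a concept can take across the whole design.
levels = sorted({Fraction(1, k) for k in range(1, q + 1)} | {Fraction(0)}, reverse=True)

print(mixes_by_size)
print(total_mixes)
print([str(level) for level in levels])
```

The breakdown is 5 single-concept mixes, 10 two-concept mixes, 10 three-concept mixes, 5 four-concept mixes, and 1 five-concept mix.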
Experimental implementation
The following steps are used to obtain the performance indicators for the 31 combinations of stock-picking concept weights proposed in this study.
Establishing a monthly database of corporate stocks
We collect information on the P/B ratio, ROE, monthly rate of return, market value, and 250-day β of stocks between 1997 and 2015 from the Taiwan Stock Exchange and over-the-counter (OTC) markets.
Establishing monthly investment portfolios and calculating their performances
The holding period of each investment portfolio is 1 month. At the end of each month, we establish an investment portfolio from the stocks with the top 10% of weighted scores for each of the 31 combinations of stock-picking concept weights in Table 1. We also calculate the median market value of the stocks in each portfolio and the portfolio’s rate of return over the following month.
Measuring the overall performances of each mixture design
We calculate the monthly excess rate of return α and systematic risk β using Eq. (6) with the monthly rates of return of the portfolios and the market. We also calculate the mean of the monthly median market values of the stocks in each portfolio.
The right-hand side of Table 1 presents the results of the experimental implementation.
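One month of the portfolio construction step might look like the sketch below. It assumes equal weighting of the selected stocks when computing the portfolio return, which the text does not specify; the function names and the 20-stock data set are hypothetical.

```python
from statistics import median


def build_portfolio(total_scores, top_fraction=0.10):
    """Return the indices of the top-scoring stocks (at least one stock)."""
    n_pick = max(1, int(len(total_scores) * top_fraction))
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    return ranked[:n_pick]


def portfolio_stats(picks, next_month_returns, market_values):
    """Equal-weighted next-month return (an assumption) and median market value."""
    ret = sum(next_month_returns[i] for i in picks) / len(picks)
    return ret, median(market_values[i] for i in picks)


# Hypothetical month with 20 stocks (not data from the paper).
scores = [5 * i for i in range(20)]
next_returns = [0.001 * i for i in range(20)]
market_values = [100 + 10 * i for i in range(20)]

picks = build_portfolio(scores)
ret, med_mv = portfolio_stats(picks, next_returns, market_values)
print(picks, ret, med_mv)
```

Repeating this for every month and every weight combination would yield the monthly return series used to estimate α and β and the series of median market values that is then averaged.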
Model building and verification
Constructing the performance prediction model
The dependent variables of the regression model are the three performance indicators, namely, excess rate of return α, systematic risk β, and market value of the stocks in the portfolio. Hence, there are three regression models. Furthermore, the experimental results in Table 1 show that the distribution of the median market values deviates from the normal distribution. Therefore, the natural logarithms of the market values were used as the dependent variable.
The independent variables of the regression model are the weights of the five stock-picking concepts. Regression analysis was conducted using the polynomial in Eq. (1). The effects of the higher-order terms can be ignored because they are extremely small, so this study retains only the first-, second-, and third-order terms. Given that there are five stock-picking concept weights, each regression equation has five linear terms, 10 two-factor interaction terms, and 10 three-factor interaction terms, totaling 25 regression coefficients. A stepwise regression was adopted to eliminate insignificant terms.
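The 25 candidate terms can be enumerated directly. The sketch below builds the list of terms and evaluates them for one weight combination, which gives one row of the regressor matrix before stepwise selection; the short labels `PB`, `ROE`, `R`, `beta`, and `MV` are shorthand for the five concepts, and the function names are illustrative.

```python
from itertools import combinations

concepts = ["PB", "ROE", "R", "beta", "MV"]


def model_terms(names, max_order=3):
    """List every candidate term of the mixture polynomial up to max_order."""
    return [combo for order in range(1, max_order + 1)
            for combo in combinations(names, order)]


def design_row(weights, terms):
    """Value of each term at one weight combination (weights is a dict by name)."""
    row = []
    for combo in terms:
        value = 1.0
        for name in combo:
            value *= weights[name]
        row.append(value)
    return row


terms = model_terms(concepts)
equal_mix = {name: 0.2 for name in concepts}   # the five-concept centroid
row = design_row(equal_mix, terms)
print(len(terms))
print(["*".join(t) for t in terms[:7]])
```

Note that the mixture polynomial has no separate intercept, because the weights always sum to 1.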
Table 2 summarizes the regression coefficients, their t-statistics, and their significance for the models obtained from the third-order multivariable polynomial stepwise regression analysis. We use the regression coefficients to identify the impacts of the independent variables and their relationship with the dependent variables based on the following rules (Myers and Montgomery 2008; Montgomery 2012):

1
Coefficients of the linear terms
Linear terms have positive effects if their coefficient is larger than the average coefficient of linear terms, and vice versa.

2
Coefficients of the quadratic term
The regression estimate along the side joining two independent variables bulges upward, with the midpoint above the line joining the two apexes, if the coefficient of the quadratic term is larger than zero. It bulges downward, with the midpoint below that line, if the coefficient of the quadratic term is less than zero.
Monthly excess rate of return
Table 2 shows that two stock-picking concepts, namely, small P/B ratio and large ROE, positively impact the monthly excess rate of return. In contrast, the other three stock-picking concepts, particularly the large market capitalization concept, have negative impacts. These findings indicate that low-priced stocks tend to have higher returns than high-priced stocks do. Similarly, stocks of profitable companies have higher returns than those of less profitable ones. These two long-term stock-picking concepts remain solid.
The concepts of large monthly return and small β negatively impact the monthly excess rate of return. However, distinctly positive interactions exist among the four stock-picking concepts of small P/B ratio, large ROE, large monthly return (R), and small β (beta) in the quadratic interaction terms, including P/B*ROE, P/B*R, P/B*beta, ROE*R, ROE*beta, R*beta, and beta*MV. They are all significant at the 0.001 level. These results indicate that the stock-picking concepts have synergy effects that enhance returns. The effect is particularly strong for ROE*beta, which signifies that, because of the small β, the return of stocks with both a large ROE and a small β is more stable than that of stocks with only a large ROE. This finding also implies that low volatility can help sustain profitability and thereby yield high returns.
Additionally, the monthly excess rate of return model has five significant cubic terms. The first two, PBR*ROE*R and PBR*ROE*MV, are positive and share the stock-picking concepts of small P/B ratio and large ROE. The last three, PBR*R*MV, ROE*R*MV, and R*beta*MV, are negative and share the stock-picking concept of large total market value.
Monthly systematic risk β
Two stock-picking concepts, namely, small β and large total market value, negatively impact monthly systematic risk β. The other three stock-picking concepts, namely, small P/B ratio, large ROE, and large monthly rate of return, positively impact monthly systematic risk β. This finding indicates that the systematic risk of a portfolio composed of stocks with a small past β and large total market value is relatively small. In addition, the coefficients of the three cubic terms, P/B*R*beta, ROE*R*beta, and ROE*beta*MV, are significantly less than zero, whereas none of the coefficients of the quadratic terms is significant. These three cubic terms share the stock-picking concept of small β. These results imply that small β is the most important concept for lowering a portfolio’s systematic risk and that the systematic risk can be reduced further by considering additional stock-picking concepts.
Stock market value median of the portfolio
The concept of large total market value positively impacts the median market value of the stocks in the portfolio, whereas the other four stock-picking concepts have negative impacts. Although the linear terms of these four concepts have negative impacts, several of them still interact positively with the concept of large total market value and can therefore increase the median market value of the stocks in the portfolio. The most notable cases are ROE*MV and R*MV. However, the interaction term PBR*beta negatively impacts the median market value of the stocks in the portfolio.
Out-of-sample predictive power of the regression models
The scatter diagrams in Figs. 6, 7 and 8 plot the predicted values against the actual values of the testing period (out-of-sample) data for the three regression models of the performance indicators. The data are produced from the 31 combinations of stock-picking concept weights generated by the mixture experimental design. These data help verify whether the prediction model built on the modeling period (1997–2008) data can also predict performance during the testing period (2009–2015). Out of sample, the prediction is most accurate for the mean of the median stock market value of the portfolio, followed by the monthly excess rate of return α, and least accurate for the monthly β.
A few actual values of monthly β in the testing period (out-of-sample) deviate from the values predicted by the model built on the modeling period (in-sample). To identify which data points deviate substantially, Fig. 7 labels each point with the code of its mixture design. Table 3 presents their combinations of weights and indicates that large deviations mostly arise when only one or two stock-picking concepts are employed. Three of the five single-concept mixes, that is, large monthly rate of return, small β, and large market value, show a large deviation. Two of the 10 two-concept mixes and one of the 10 three-concept mixes show large deviations. None of the mixes with four or more stock-picking concepts shows a large deviation. Thus, we may conclude that employing several stock-picking concepts can stabilize the relationship between the weights of stock-picking concepts and monthly systematic risk β.
Table 4 compares the adjusted coefficients of determination of the prediction models in the modeling period (in-sample) and the testing period (out-of-sample). The explanatory power of the prediction models is lower out of sample than in sample. Although the explanatory power for the monthly excess rate of return α is lower out of sample (76.4%) than in sample (98.1%), its coefficient of determination remains high. The explanatory power for the monthly systematic risk β is much lower out of sample (31.9%) than in sample (56.9%).
Visualization of the regression model
The mix-contour plots display each prediction model and are used to investigate the interactions among the weights of the stock-picking concepts. The mix-contour plot in Fig. 9 is an equilateral triangle whose apexes, sides, and interior are single-, two-, and three-component mixes, respectively. The midpoint of each side is a two-component mix (1/2, 1/2), whereas the centroid of the triangle is a three-component mix (1/3, 1/3, 1/3). The contour lines of the dependent variable (response) in the triangle are employed to visualize the impact of each component on the response.
Only three components can be shown in a triangular mix-contour plot; hence, we select three of the five components and set the other two components to zero to construct a mix-contour plot. Selecting three of the five components at a time generates 10 combinations, resulting in 10 mix-contour plots. Therefore, we produced 10 mix-contour plots from the five stock-picking concepts (components) adopted in this paper. Then, we investigate the mix-contour plots of each performance indicator (response) below.
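A sketch of how the evaluation points for one such plot could be generated is given below. For a chosen triple of concepts, it lays a triangular grid over the three active weights and sets the other two weights to zero; the function name and the grid resolution are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

concepts = ["small P/B", "large ROE", "large monthly return",
            "small beta", "large market value"]


def triangle_grid(active, n_steps):
    """Weight vectors on a triangular grid for three active concepts.

    The remaining concepts get zero weight, so each vector still sums to 1.
    """
    grid = []
    for a in range(n_steps + 1):
        for b in range(n_steps + 1 - a):
            c = n_steps - a - b
            shares = dict(zip(active, (a, b, c)))
            grid.append(tuple(Fraction(shares.get(name, 0), n_steps)
                              for name in concepts))
    return grid


triples = list(combinations(concepts, 3))    # one triple per mix-contour plot
grid = triangle_grid(triples[0], n_steps=4)  # points for the first plot
print(len(triples), len(grid))
```

Evaluating a fitted prediction model at each grid point would give the response surface from which the contour lines are drawn.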
Excess rate of return
According to the first mix-contour plot on the upper-left-hand side of Fig. 9, the responses at the midpoints of the sides “small P/B–large ROE,” “small P/B–large momentum,” and “large ROE–large momentum” are significantly higher than those at the two corresponding apexes. Therefore, these three two-component mixes have significantly positive interactions. Their regression coefficients are all statistically significant at the 5% level, as shown in Table 2. Thus, the first mix-contour plot in Fig. 9 confirms the results in Table 2.
The second mix-contour plot, “small P/B–large ROE–small beta,” indicates that its three two-component mixes also have significantly positive interactions. These results are consistent with those in Table 2.
The third mix-contour plot, “small P/B–large ROE–large market value,” shows that two of its sides have monotonic responses. The response increases when moving along a side from the apex of large market value toward either of the other two apexes, namely, small P/B ratio and large ROE. These findings indicate that the two-component mixes “small P/B–large market value” and “large ROE–large market value” have no interactions, which is consistent with the results in Table 2. The same exploration can be applied to the other plots.
Monthly systematic risk β
The first three charts in the upper part of Fig. 10 are the “small P/B–large ROE–large momentum,” “small P/B–large ROE–small beta,” and “small P/B–large ROE–large market value” mix-contour plots. Their two-component mixes do not interact because the responses along each side are all monotonic.
The fourth, seventh, and ninth charts are the mix-contour plots of “small P/B–large momentum–small beta,” “large ROE–large momentum–small beta,” and “large ROE–small beta–large market value,” respectively. They share one feature: the smallest response occurs at the center of the triangle. In other words, their three-component mixes have significantly negative interactions. Moreover, the regression coefficients of their cubic terms are all negative, as shown in Table 2. Thus, these mix-contour plots match the results in Table 2.
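To see why a negative cubic coefficient pulls the response down at the center of the triangle, consider a Scheffé-type special cubic mixture polynomial, which has linear, pairwise, and three-way product terms. The Python sketch below assumes that functional form and uses made-up coefficients (not the estimates in Table 2) purely to illustrate the effect.

```python
def special_cubic(x, linear, pairwise, triple):
    """Evaluate a Scheffe-type special cubic mixture polynomial at mix x."""
    y = sum(b * xi for b, xi in zip(linear, x))
    for (i, j), b in pairwise.items():
        y += b * x[i] * x[j]
    for (i, j, k), b in triple.items():
        y += b * x[i] * x[j] * x[k]
    return y

# Hypothetical coefficients for a three-component triangle (NOT from Table 2).
linear = [1.0, 0.9, 0.8]        # responses at the three apexes
pairwise = {}                   # no two-way interactions in this toy case
triple = {(0, 1, 2): -2.7}      # negative three-way (cubic) interaction

centroid = (1 / 3, 1 / 3, 1 / 3)
blend = sum(linear) / 3                                   # pure linear blending
actual = special_cubic(centroid, linear, pairwise, triple)

# The cubic term contributes -2.7 * (1/3)**3 = -0.1 at the centroid.
print(round(blend, 3), round(actual, 3))  # 0.9 0.8
```

Because the product of the three weights is largest at the centroid and zero on every side, a negative three-way coefficient creates a dip in the interior without changing the response along the edges.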
Total market capitalization
The second mix-contour plot on the upper-left-hand side of Fig. 11 illustrates that the response at the midpoint of the side “small P/B–small beta” is significantly lower than that at its two apexes. Table 2 shows that this two-component mix has a statistically significant negative regression coefficient (5% level), which is consistent with the mix-contour plot.
Weight optimization and validation
The most important benefit of a mixture experimental design is that it can provide an optimal composition of the mixture. Hence, this study can provide the optimal combination of weights of stock-picking concepts as follows.
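To find the optimal weights \(W = (w_1, \ldots, w_5)\), solve

\[ \max_{W} \; \alpha = f_{\alpha}(W) \]

subject to

\[ \beta_{L} \le f_{\beta}(W) \le \beta_{U}, \qquad \mathrm{MV}_{L} \le f_{\mathrm{MV}}(W) \le \mathrm{MV}_{U}, \qquad \sum_{i=1}^{5} w_i = 1, \qquad w_i \ge 0, \]

where α = \(f_{\alpha}(W)\) = monthly excess rate of return, β = \(f_{\beta}(W)\) = monthly systematic risk, MV = \(f_{\mathrm{MV}}(W)\) = market value of the portfolio, and the subscripts L and U denote the lower and upper bounds of a constraint.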
The above optimization model is a simple classical nonlinear programming problem, which can be solved using classical nonlinear programming algorithms. We used the generalized reduced gradient (GRG) algorithm to solve the optimization models. The details of the algorithm can be found in the literature (Nocedal and Wright 1999).
We can use the above model to determine the optimal combination of weights of stock-picking concepts. Doing so maximizes the excess rate of return while keeping the portfolio’s systematic risk and market value within their upper and lower bounds. We then apply the optimal combination of weights to score the stocks and form a portfolio from the highest-scoring decile.
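The last step, turning a weight vector into a portfolio, can be sketched as follows. This is a minimal illustration, assuming each concept is scored by a stock's rank scaled to [0, 1] and that the total score is the weighted sum of concept scores. The factor names, toy data, and weights are hypothetical and are not taken from the study.

```python
def rank_scores(values, larger_is_better=True):
    """Map raw factor values to scores in [0, 1]; the best stock gets 1."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=not larger_is_better)
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = rank / (n - 1) if n > 1 else 1.0
    return scores

def top_decile(factors, directions, weights):
    """Return indices of the top 10% of stocks by weighted total score."""
    n = len(next(iter(factors.values())))
    total = [0.0] * n
    for name, values in factors.items():
        for i, s in enumerate(rank_scores(values, directions[name])):
            total[i] += weights[name] * s
    k = max(1, n // 10)
    return sorted(range(n), key=lambda i: total[i], reverse=True)[:k]

# Toy universe of 20 stocks with two concepts (hypothetical data).
factors = {
    "pb": [0.5 + 0.1 * i for i in range(20)],   # smaller is better
    "roe": [5.0 + i for i in range(20)],        # larger is better
}
directions = {"pb": False, "roe": True}
weights = {"pb": 0.25, "roe": 0.75}             # must sum to 1

picks = top_decile(factors, directions, weights)
print(picks)  # [19, 18]
```

In this toy universe the two concepts disagree completely, so the heavier ROE weight decides the ranking; changing the weights changes which stocks enter the decile, which is exactly the lever the optimization model adjusts.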
Rate of return maximization with risk limitation
Figure 12 exhibits the weights of stock-picking concepts for maximizing the monthly excess rate of return α by limiting the monthly systematic risk β to less than 1, 0.95, 0.9,…, 0.55. Implications of the results of Fig. 12 include the following:

(1) When the limit of the systematic risk β is set at a loose level (β > 0.9), the weights of small P/B ratio, large ROE, large momentum, and small beta are 38%, 35%, 23%, and 4%, respectively.

(2) When the limit of the systematic risk β is set at a relatively loose level (0.7 < β < 0.9), the weights of small P/B ratio, large ROE, and large momentum decrease, whereas that of small beta increases.

(3) When the limit of the systematic risk β is set at a middle level (0.6 < β < 0.7), the weights of small P/B ratio and large momentum drop sharply, whereas those of large ROE and small beta increase. In addition, the stock-picking concept of large market value becomes important.

(4) When the limit of the systematic risk β is set at a strict level (β < 0.6), the weights of large ROE and small beta decrease, whereas those of small P/B ratio and large momentum become zero, and the weight of large market value becomes the largest. However, no optimal combination of weights is available if the limit of the systematic risk is set even stricter, below 0.4.
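The sweep over risk limits described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the study solves the problem with the GRG algorithm, whereas the sketch substitutes a brute-force grid search over the weight simplex, and the prediction models `f_alpha` and `f_beta` use made-up coefficients rather than the fitted values in Table 2.

```python
from itertools import combinations

# Hypothetical prediction models (NOT the fitted models from Table 2).
# Weight order: small P/B, large ROE, large momentum, small beta, large MV.
ALPHA_LINEAR = [1.0, 0.9, 0.8, 0.3, 0.1]
ALPHA_PB_ROE = 1.2                      # positive P/B x ROE interaction
BETA_LINEAR = [1.05, 0.95, 1.10, 0.60, 0.50]

def f_alpha(w):
    return sum(a * x for a, x in zip(ALPHA_LINEAR, w)) + ALPHA_PB_ROE * w[0] * w[1]

def f_beta(w):
    return sum(b * x for b, x in zip(BETA_LINEAR, w))

def simplex_grid(n, steps):
    """Yield all n-vectors with entries in multiples of 1/steps that sum to 1."""
    for bars in combinations(range(steps + n - 1), n - 1):
        prev, parts = -1, []
        for b in bars:                  # stars-and-bars decomposition
            parts.append(b - prev - 1)
            prev = b
        parts.append(steps + n - 2 - prev)
        yield tuple(p / steps for p in parts)

def best_weights(beta_limit, steps=10):
    """Maximize f_alpha subject to f_beta <= beta_limit; None if infeasible."""
    best = None
    for w in simplex_grid(5, steps):
        if f_beta(w) <= beta_limit + 1e-12:
            a = f_alpha(w)
            if best is None or a > best[0]:
                best = (a, w)
    return best

for limit in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4):
    print(limit, best_weights(limit))
```

Tightening the limit can only shrink the feasible set, so the attainable α never increases as the limit falls, and a sufficiently strict limit leaves no feasible weights at all (the function returns `None`), mirroring the behavior reported for limits below 0.4. A grid search is far less efficient than a gradient-based solver but makes the constrained search easy to inspect.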
Validation in modeling period
Figure 13 plots, as round black spots, the portfolio performances of the 31 combinations of weights of stock-picking concepts during the modeling period (1997–2008). The curve on the upper-left-hand side of Fig. 13 is the risk–return curve formed by the prediction model’s predicted values at the optimal weights obtained from the optimization model. It lies close to the upper-left-hand edge of the region occupied by the 31 portfolios of the mixture experimental design and thus forms a risk–return efficient frontier. The curve has 11 spots, which are the estimated results when the limits of the monthly systematic risk β are set at 0.9, 0.85, 0.8, …, 0.45, and 0.4. The estimated results are identical for all limits between 0.9 and 1.0, so those spots overlap. Only one of the 31 portfolio performances lies beyond the efficient frontier, which may be attributed to the prediction model’s inaccuracy.
Validation in testing period
The optimization model relies on prediction models fitted to the performances during the modeling period (1997–2008). We therefore backtest the optimal weights generated by the optimization model on the testing period (2009–2015) to verify whether they also apply to the stock market during this period. Figure 14 shows, as round black spots, the portfolio performances of the 31 combinations of weights of stock-picking concepts during the testing period. The curve on the upper-left-hand side depicts the risk–return relationship and is drawn through the actual backtesting values of the optimal weights during the testing period. The 11 spots along this curve are the backtesting results when the limits of the monthly systematic risk β are set at 0.9, 0.85, 0.8, …, 0.45, and 0.4. The risk–return curve is close to the upper-left-hand edge of the region occupied by the 31 portfolio performances and forms a risk–return efficient frontier. Only two of the 31 portfolio performances lie beyond the efficient frontier. Therefore, we may conclude that the optimal weights perform well not only in the modeling period but also in the testing period.
In sum, Figs. 13 and 14 demonstrate that our approach can create a group of portfolios close to the risk–return efficient frontier not only during the modeling period (in-sample) but also during the testing period (out-of-sample).
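One way to make "lying beyond the efficient frontier" operational is to interpolate the frontier at each portfolio's β and check whether the portfolio's return is higher. The Python sketch below assumes linear interpolation between frontier spots and ignores portfolios whose β falls outside the frontier's range; both choices, and all the numbers, are illustrative assumptions rather than the procedure or data of the study.

```python
def frontier_value(frontier, beta):
    """Linearly interpolate the frontier return at beta; None outside its range."""
    pts = sorted(frontier)
    if beta < pts[0][0] or beta > pts[-1][0]:
        return None
    for (b0, a0), (b1, a1) in zip(pts, pts[1:]):
        if b0 <= beta <= b1:
            if b1 == b0:
                return max(a0, a1)
            t = (beta - b0) / (b1 - b0)
            return a0 + t * (a1 - a0)
    return None

def beyond_frontier(frontier, portfolios):
    """Return the portfolios whose return exceeds the frontier at their beta."""
    out = []
    for beta, alpha in portfolios:
        ref = frontier_value(frontier, beta)
        if ref is not None and alpha > ref:
            out.append((beta, alpha))
    return out

# Hypothetical (beta, alpha) pairs; not values read from Figs. 13 or 14.
frontier = [(0.4, 0.2), (0.6, 0.8), (0.9, 1.2)]
portfolios = [(0.5, 0.4), (0.7, 1.0), (0.9, 1.1), (1.1, 0.9)]

print(beyond_frontier(frontier, portfolios))  # [(0.7, 1.0)]
```

A small count of such points relative to the number of design portfolios is what supports reading the curve as an efficient frontier.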
Rate of return maximization with market value limitation
Figure 15 displays the weights of stock-picking concepts for maximizing the monthly excess rate of return α by limiting the market value to greater than 1, 2, 5, 10, 20, 50, and 100 billion NT dollars. The median market value of stocks on the Taiwan Stock Exchange is approximately 3 billion NT dollars. The implications of the results in Fig. 15 include the following:

(1) When the market value requirement is set at a low level (< 2 billion NT dollars), the weights of small P/B ratio, large ROE, large momentum, and small beta are 38%, 35%, 23%, and 4%, respectively.

(2) When the market value requirement is set at a normal level (2–5 billion NT dollars), the weight of large ROE increases, that of small P/B ratio decreases, and that of large momentum remains unchanged.

(3) When the market value requirement is set at a high level (5–10 billion NT dollars), the weight of large ROE increases, that of small P/B ratio decreases, and that of large momentum becomes zero. The concepts of small beta and large market value become more important.

(4) When the market value requirement is set at an extremely high level (10–50 billion NT dollars), the weights of small P/B ratio and small beta gradually decrease to zero. Moreover, the weight of large ROE decreases slightly, and the large market value stock-picking concept becomes the most important. No weight combination is available from the optimization model when the size requirement is larger than 100 billion NT dollars. The optimal combination of weights with the greatest market value, 78 billion NT dollars, assigns 90% to large market value and 10% to large ROE.
Validation in modeling period
The 31 round black spots shown in Fig. 16 are the portfolio performances of the 31 combinations of weights of stock-picking concepts during the modeling period. The curve on the upper-right-hand side comprises the prediction model’s predicted values at the optimal weights obtained from the optimization model and depicts the relationship between market value and return. This curve is close to the upper-right-hand edge of the region occupied by the 31 portfolios of the mixture experimental design and forms a market-value-and-return efficient frontier. The six spots along the curve are the estimated results when the market value requirements are 2, 5, 10, 20, 50, and 100 billion NT dollars. The prediction model appears accurate because no spot lies beyond the efficient frontier.
Validation in testing period
We further backtest the above optimal weights on the testing period to verify whether they also apply to the stock market during that period. The round black spots in Fig. 17 depict the portfolio performances of the 31 combinations of weights of the mixture experimental design during the testing period. The curve on the upper-right-hand side depicts the relationship between market value and return and comprises the actual backtesting values. The six spots along this curve are the backtesting results when the market value requirements are set at 2, 5, 10, 20, 50, and 100 billion NT dollars. This curve is close to the upper-right-hand edge of the round black spots, which depict the market value and return performances of the portfolios of the 31 combinations of weights. The curve forms a market-value-and-return efficient frontier because no spot lies beyond it. This result signifies that the optimal weights perform well not only in the modeling period but also in the testing period.
Conclusion
Considerable literature has revealed that the more factors are included, the higher the rate of return would be. Several studies adopted a weighted-scoring approach to construct a multifactor stock selection model. However, this method sets weights subjectively or uses a simple average and thus cannot effectively identify the connection between the weights of stock-picking concepts and portfolio performances, provide optimal weights of stock-picking concepts, or meet various investor preferences.
This study addresses these drawbacks by employing mixture experimental designs to set the weights of stock-picking concepts, collect portfolio performance data, and construct performance prediction models based on those weights. Moreover, we employed these performance prediction models and optimization techniques to determine the optimal combination of weights of stock-picking concepts.
The samples consist of all stocks listed on the Taiwan Stock Exchange. Backtesting is conducted on the 19 years between 1997 and 2015. The 1997–2008 and 2009–2015 periods are employed as the modeling period (in-sample) and testing period (out-of-sample), respectively. The results provide important implications for stock investment.
First, mixture experimental designs and multivariable polynomial regression can construct performance prediction models based on the data set from the modeling period. These models are accurate not only during the modeling period but also during the testing period.
Second, the methodology can discover significant interactions between the weights of stock-picking concepts. The ROE and beta have a significantly positive impact on the portfolios’ excess rate of return and hence can effectively increase the portfolio’s return. P/B*R*beta, ROE*R*beta, and ROE*beta*MV have a significantly negative impact on the portfolios’ systematic risk β. Thus, they can effectively reduce the portfolio’s risk. Furthermore, ROE*MV and R*MV have a significantly positive impact on the portfolios’ market value. Therefore, they can effectively increase the portfolio’s liquidity.
Third, the optimization techniques can efficiently determine the optimal combination of weights of factors that can form stock portfolios with the best possible performance and can meet various investor preferences.
Thus, our methodology can resolve the three drawbacks of the classical weighted-scoring approach.
Availability of data and materials
The dataset on which the conclusions of the manuscript rely consists of secondary data and will be made available upon request.
Abbreviations
Beta: Systematic risk
MV: Market value
PBR: Price-to-book value ratio
ROE: Return on equity
R: Monthly return
References
Asness CS, Moskowitz TJ, Pedersen LH (2013) Value and momentum everywhere. J Finance 68(3):929–985
Banz RW (1981) The relationship between return and market value of common stocks. J Financ Econ 9(1):3–18
Bondt W, Thaler R (1985) Does the stock market overreact? J Finance 40(3):793–805
Dai J, Zhou J (2019) A novel quantitative stock selection model based on support vector regression. In: 2019 international conference on economic management and model engineering (ICEMME), IEEE, pp 437–445
Daniel K, Mota L, Rottke S, Santos T (2020) The cross-section of risk and returns. Rev Financ Stud 33(5):1927–1979
Duran-Vazquez R, Lorenzo-Valdes A, Castillo-Ramirez CE (2014) Effectiveness of corporate finance valuation methods: Piotroski score in an Ohlson model: the case of Mexico. J Econ Finance Admin Sci 19(37):104–107
Fama EF, French KR (2012) Size, value, and momentum in international stock returns. J Financ Econ 105(3):457–472
Gu S, Kelly B, Xiu D (2020) Empirical asset pricing via machine learning. Rev Financ Stud 33(5):2223–2273
Hart JV, Slagter E, Dijk DV (2003) Stock selection strategies in emerging markets. J Empir Finance 10(1–2):105–132
Hart JV, Zwart G, Dijk DV (2005) The success of stock selection strategies in emerging markets: Is it risk or behavioral bias? Emerg Markets Rev 6(3):238–262
Hong H, Stein JC (1999) A unified theory of underreaction, momentum trading and overreaction in asset markets. J Finance 54(6):2143–2184
Hong H, Lim T, Stein JC (2000) Bad news travels slowly: size, analyst coverage, and the profitability of momentum strategies. J Finance 55(1):265–295
Jegadeesh N, Titman S (1993) Returns to buying winners and selling losers: implications for stock market efficiency. J Finance 48(1):65–91
Jeong T, Kim K (2019) Effectiveness of F-SCORE on the loser following online portfolio strategy in the Korean value stocks portfolio. Am J Theor Appl Bus 5(1):1–13
Kang J, Ding D (2006) Value and growth investing in Asian stock markets 1991–2002. Res Finance 22:113–139
Kim S, Lee C (2014) Implementability of trading strategies based on accounting information: Piotroski (2000) revisited. Eur Acc Rev 23(4):553–558
Kong D, Lin CP, Yeh IC, Chang W (2019) Building growth and value hybrid valuation model with errors-in-variables regression. Appl Econ Lett 26(5):370–386
Mehta N, Pothula VK, Bhattacharyya R (2019) A value investment strategy that combines security selection and market timing signals. SSRN 3451859
Mohanram S (2005) Separating winners from losers among low book-to-market stocks using financial statement analysis. Rev Account Stud 10(3):133–170
Montgomery DC (2012) Design and analysis of experiments. Wiley, New York, pp 611–622
Myers RH, Montgomery DC (2008) Response surface methodology. Wiley, New York
Nocedal J, Wright SJ (1999) Numerical optimization. Springer, New York
Noma M (2010) Value investing and financial statement analysis. Hitotsubashi J Commer Manag 44(1):29–46
Piotroski JD (2000) Value investing: the use of historical financial statement information to separate winners from losers. J Account Res 38:1–41
Qian EE, Hua RH, Sorensen EH (2007) Quantitative equity portfolio management: modern techniques and applications. Chapman and Hall/CRC, Boca Raton
Rasekhschaffe KC, Jones RC (2019) Machine learning for stock selection. Financ Anal J 75(3):70–88
Richardson S, Tuna I, Wysocki P (2010) Accounting anomalies and fundamental analysis: a review of recent research advances. J Account Econ 50(2):410–454
Roko I, Gilli M (2008) Using economic and financial information for stock selection. CMS 5(4):317–335
Rosenberg B, Reid K, Lanstein R (1985) Persuasive evidence of market inefficiency. J Portf Manag 11(3):9–17
Shen KY, Tzeng GH (2015) Combined soft computing model for value stock selection based on fundamental analysis. Appl Soft Comput 37:142–155
Tikkanen J, Äijö J (2018) Does the F-score improve the performance of different value investment strategies in Europe? J Asset Manag 19(7):495–506
Wen F, Xu L, Ouyang G, Kou G (2019) Retail investor attention and stock price crash risk: Evidence from China. Int Rev Financ Anal 65:101376
Wu X, Ye Q, Hong H, Li Y (2020) Stock selection model based on machine learning with wisdom of experts and crowds. IEEE Intell Syst 35(2):54–64
Yeh IC, Hsu TK (2011) Growth value two-factor model. J Asset Manag 11(6):435–451
Yeh IC, Hsu TK (2014) Exploring the dynamic model of the returns from value stocks and growth stocks using time series mining. Expert Syst Appl 41(17):7730–7743
Yeh IC, Lien CH, Ting TM (2015) Building multifactor stock selection models using balanced split regression trees with sorting normalization and hybrid variables. Int J Foresight Innov Policy 10(1):48–74
Yu H, Chen R, Zhang G (2014) A SVM stock selection model within PCA. Procedia Comput Sci 31:406–412
Funding
We did not receive any financial assistance from any agency.
Author information
Contributions
The first author conducted the project, wrote the paper and revised it. The second author checked writing and approved the final version. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yeh, IC., Liu, YC. Discovering optimal weights in weighted-scoring stock-picking models: a mixture design approach. Financ Innov 6, 41 (2020). https://doi.org/10.1186/s40854-020-00209-x
Keywords
 Portfolio optimization
 Stock-picking
 Weighted-scoring
 Mixture experimental design
 Multivariable polynomial regression analysis