 Research
 Open access
A factor score clustering approach to analyze the biopharmaceutical sector in the Chinese market during COVID-19
Financial Innovation volume 10, Article number: 135 (2024)
Abstract
The biopharmaceutical sector has attracted considerable interest during the COVID-19 pandemic. This study investigates the biopharmaceutical sector using the Shenwan Industry Classification and provides insights into investment strategies. We combine factor and cluster analyses to reduce data dimensions and detect latent similarities in the data. Specifically, the biopharmaceutical sector is divided into six categories based on the second-level industry classification. We observe that medical devices, medical services, biological products, and chemical pharmaceuticals maintained their upward tendency, while Chinese medicine and pharmaceutical commerce declined slightly. We also develop optimal investment strategies using various metrics for different investor types.
Introduction
Since its initial announcement in early 2020, the coronavirus disease 2019 (COVID-19) outbreak has precipitated recessions across major global economies such as China, the European Union, the United Kingdom, and the United States. This period also saw some countries experience significant political turmoil, contributing to considerable instability in financial markets. Notably, the global stock market has exhibited several phenomena indicative of these disruptions. AlAli (2020) identified the negative impact of COVID-19 on abnormal market returns, highlighting marked differences in returns before and after the World Health Organization's declaration in major Asian stock markets, as determined through event study methodologies. Additionally, using a novel time series approach based on a Bayesian structure, Takyi and Bentum-Ennin (2021) estimated that the stock performance of 13 countries (Ghana, Nigeria, South Africa, Kenya, Tanzania, Tunisia, Mauritius, Morocco, Zambia, Namibia, Botswana, Côte d'Ivoire, and Uganda) was rarely positively affected by COVID-19. Specifically, since January 2020, the Chinese stock market has experienced significant downturns, with the Shanghai stock index dropping by 8.7%, reaching a low of 2646 in March.
Esparcia and López (2022) reported that COVID-19 provided the biopharmaceutical industry with unprecedented revenue sources. The sector has been extensively analyzed in the literature (Robke et al. 2020; Ayati et al. 2020; Esparcia and López 2022; Ho et al. 2022). According to the Shenwan Industry Classification, China's biopharmaceutical sector includes medical services, medical devices, biological products, pharmaceutical commerce, Chinese medicine, and chemical pharmaceuticals. This raises critical questions regarding the uniformity of growth across these categories and their investment potential. This article focuses on two main inquiries. First, how have the six categories within China's biopharmaceutical sector changed during the pandemic? We explore this using factor scores before and after the onset of the pandemic. Second, which investment strategies are optimal for various investor profiles within this sector? We apply factor and cluster analyses to guide stock selection and offer investment portfolio recommendations based on standard investment metrics.
As financial markets have rapidly evolved in recent decades, multivariate analysis has become integral to managing large, high-dimensional datasets in financial research (Wagenvoort et al. 2011; Vats and Samdani 2019; Seong and Nam 2021). This study combines factor analysis and clustering techniques for stock market analysis. As noted by Gorman and Primavera (1983), the primary purpose of factor analysis is to reduce the number of variables by grouping them into factors based on their correlations. Cluster analysis, by contrast, groups observations homogeneously based on one or more multivariate similarity criteria; the goal is to segment data into clusters that reflect similarities and differences. The strength of factor analysis is the detection of a set of common, underlying dimensions of the variables. However, factor analysis is unsuitable for investigating the latent similarities of data, which motivates the use of cluster analysis, whose strength lies in identifying the different profiles or categories of the data and respondents. Given these complementary strengths, we first use factor analysis to reduce the financial indicators to several components. Subsequently, we apply cluster analysis to homogeneously group the data according to the components.
To examine the impact of COVID-19 on the biopharmaceutical sector, this study analyzes 165 listed A-share companies in China according to the Shenwan Industry Classification. The findings highlight several key trends. First, biopharmaceutical industry stocks maintained an upward momentum during the pandemic. Specifically, medical devices, medical services, biological products, and chemical pharmaceuticals have increased, whereas Chinese medicine and pharmaceutical commerce have declined slightly. Second, based on the results of the cluster analysis, we allocate stock categories to different types of investors. Specifically, we classify investors based on their long- and short-term investments and degree of risk aversion. Finally, we translate the empirical results into economic gains, develop an optimal investment strategy based on the Sharpe ratio and mean–variance utility, and offer additional performance assessments using Treynor's ratio and Jensen's alpha. For short-term investments, the results vary because the short-term volatility of stocks is much higher, whereas for long-term investments, the results are similar.
This study contributes to the literature in three ways. First, it integrates factor and cluster analyses within the context of the Chinese financial market, a combination seldom used in financial research. We illustrate the effectiveness of these combined techniques in analyzing the Chinese biopharmaceutical sector. Second, we contribute to second-level industry classification research. Most previous related literature focuses on first-level industry classification; in this article, we also discuss second-level industries (six categories) in the biopharmaceutical sector. In the context of the pandemic, investors are confident in the biopharmaceutical industries, but not all categories have been boosted by COVID-19. Therefore, research on second-level industry classifications is useful for practical investments. Finally, we extend studies on the effects of COVID-19 on financial markets.
The remainder of this paper is organized as follows. In the next section, we review related literature and background. Section “Methodology” presents the methodologies used, including factor analysis and K-means clustering. Section “Empirical study” introduces the data and empirical research. We then provide an investment analysis in Section “Sector investment analysis” based on the categories derived in the previous section. Finally, we conclude the paper in Section “Conclusion”.
Literature review
Recent studies have advanced our understanding of stock market dynamics during the COVID-19 pandemic. Cox et al. (2020) found that the Federal Reserve played a role in stock market fluctuations in the early weeks of the pandemic. Daglis et al. (2022) presented another interesting conclusion: a decreasing impact of COVID-19 on the Italian stock market index and increasing volatility, both of which are statistically significant. Cevik et al. (2022) employed multiple analytical methods, including a panel regression with fixed effects, panel quantile regressions, a panel vector autoregression model, and country-specific regressions, to explore the relationship between investor sentiments and stock market returns and volatility across 20 countries. They observed that rising positive investor sentiment boosts stock returns, whereas negative sentiment dampens returns at the lower quantiles. Esparcia and López (2022) developed a global and dynamic ratio to summarize different investor profiles according to their attitudes toward risk and to consider the dynamic nature of the economy and financial markets. They found that Principal Component Analysis enables the first principal component to summarize the information contained in the initial performance rankings. In the context of China, Liu et al. (2020) identified a downturn in Chinese stock markets since the onset of the pandemic, suggesting that stock price movements can serve as indicators of economic performance and future economic trends (Wagner 2020).
Global market sectors exhibited varying dynamics amid the pandemic. Gupta et al. (2022) reported significant adversity inflicted by COVID-19 on India's manufacturing, agriculture, and service sectors. In contrast, He et al. (2020) observed a negative impact on China's transportation, mining, and utilities sectors, while noting resilience in manufacturing, IT, education, and healthcare. Piñeiro-Chousa et al. (2022) assessed the stock market responses of two pioneering US biopharmaceutical firms in mRNA vaccine development, highlighting the distinct volatility influences on Pfizer and Moderna's returns before and during the pandemic. Zou and Wang (2023) revealed an undervaluation of the medical sector, pinpointing China's medical sector as being significantly valuable for its stable risk premium. In the realm of pharmaceuticals, Ho et al. (2022) employed the Fama–French five-factor model to determine the influence of medical reform announcements and COVID-19 vaccine approvals on Chinese pharmaceutical and healthcare company returns. Their findings indicated negative effects on younger and smaller firms due to medical reforms, whereas vaccine approvals generally boosted stock returns, except for smaller entities. Robke et al. (2020) reviewed the impact of the pandemic on pharmaceutical innovation investment and speculated on future changes in the innovation-sourcing landscape. Ayati et al. (2020) delved into the pandemic's short- and long-term effects on the pharmaceutical sector, ranging from immediate demand shifts and regulatory changes to longer-term industry growth deceleration and supply chain shifts toward self-sufficiency. Despite the focus of existing literature on China's biopharmaceutical sector, there is a gap in second-level industry classification studies.
This study aims to address this gap by examining the transformations within the biopharmaceutical sector and its six subdivisions in China throughout the COVID-19 timeline, culminating in tailored investment recommendations based on an innovative factor-score clustering method.
Factor analysis is a statistical tool used to identify a relatively small number of factors that represent the relationships between many interrelated variables. Early studies have used this tool for financial analyses (Morelli 1999; Jones 2006; Bai and Ng 2006; Ludvigson and Ng 2007). Wagenvoort et al. (2011) used factor analysis to reveal that the rates on large and small loans with long fixation periods converged weakly. Through factor analysis, they introduced a new measure to reassess whether retail bank market integration must be present, ongoing, or complete. Few studies have used factor analysis to score stocks; hence, little investment advice is provided. We provide portfolio suggestions by ranking stocks according to their factor scores. Clustering is an unsupervised technique that is used to generate groups of similar objects. K-means clustering has been used in several financial analyses. For instance, Nanda et al. (2010) compared several clustering methods, including K-means, self-organizing maps (SOM), and fuzzy C-means, to perform stock classification in India. The results showed that K-means clustering can help to build the most compact clusters. Seong and Nam (2021) used K-means clustering and multiple kernel learning techniques to predict stock price movements. Expanding on these methodologies, our study employs K-means clustering to classify 165 Chinese listed companies into distinct categories, from which we devise corresponding investment strategies. Notably, this study merges factor and cluster analyses, which is a rare approach in financial studies, to enhance investment decision-making.
The impetus for our research emerges from the scant literature on second-level industry classifications. Few studies have employed both factor and cluster analyses in tandem. Our study unites these methodologies to reduce the number of variables and to categorize stocks based on their intrinsic traits.
Methodology
Factor analysis
The factor analysis method (Kim et al. 1978) recombines the original multidimensional variable indices to identify common factors, namely, the main factors. These factors carry the primary statistical information of the multidimensional indices and thereby achieve dimension reduction. Factor analysis can be used to analyze multiple statistical variables. After dimension reduction, the main factors retain the primary information of the original variables, making the research process simple, effective, and objective. In this study, each stock has 12 financial indicators with obvious multicollinearity (see Fig. 1). To reduce them to fewer variables, we adopt factor analysis to extract the common factors. The main steps of the factor analysis are as follows:
Main steps:

Standardization of data and applicability test
First, the indicators must be standardized. Here, X_{ij} is the ith financial indicator in year j, \(\bar{X}_{i}\) is the mean of the ith indicator, and S_{i} is the standard deviation of the ith indicator. Z_{ij} is the standardized variable, which has zero mean and unit variance and is treated as following a standard normal distribution.
$$Z_{ij}=\frac{X_{ij}-\bar{X}_{i}}{S_{i}} \quad \left(Z_{ij} \sim N\left(0,1\right)\right)$$
Factor extraction and naming
Common factors were extracted in alignment with the methodology of Hao et al. (2019) using the cumulative contribution rate. To minimize data loss from common factors and enhance factor analysis utility, factors were chosen when their cumulative variance contribution rates exceeded 80%. Following the factor extraction predicated on the eigenvalues, we computed the variance contribution rate and the corresponding cumulative rates.

Factor scores and composite scores calculation
The score for each factor was based on the factor coefficient and standardized variables. The common factors were calculated as follows:
$$F_{i} = \beta_{i1}X_{1} + \beta_{i2}X_{2} + \beta_{i3}X_{3} + \cdots + \beta_{in}X_{n}$$
where β_{ip} (p = 1, 2, …, n) is the score coefficient of factor F_{i} on variable X_{p}.
The overall score is computed by weighting the score of each main factor by its contribution rate, as follows:
$$F = \alpha_{1} F_{1} + \alpha_{2} F_{2} + \alpha_{3} F_{3} + \cdots + \alpha_{m} F_{m}$$
where F_{i} (i = 1, 2, …, m) is the score of each factor, and α_{i} (i = 1, 2, …, m) is the contribution rate of each factor. Factor analysis can efficiently deal with several intercorrelated variables and identify common factors that contain most of the information in the data.
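The three steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: it assumes principal-component extraction from the correlation matrix with no rotation, and the function name `factor_scores`, its `threshold` argument, and any input data are hypothetical.

```python
import numpy as np

def factor_scores(X, threshold=0.80):
    """Standardize X (firms x indicators), extract principal-component
    factors, and return factor scores, contribution rates, and a composite."""
    X = np.asarray(X, dtype=float)
    # Step 1: z-score each indicator (column) using its sample std.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: eigen-decompose the correlation matrix, largest eigenvalue first.
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the fewest factors whose cumulative variance share reaches threshold.
    share = eigvals / eigvals.sum()
    m = int(np.searchsorted(np.cumsum(share), threshold) + 1)
    m = min(m, len(share))
    # Step 3: score coefficients beta (indicators x factors); dividing by
    # sqrt(eigenvalue) gives every factor score unit sample variance.
    beta = eigvecs[:, :m] / np.sqrt(eigvals[:m])
    F = Z @ beta              # F_i = sum_p beta_ip * Z_p
    alpha = share[:m]         # contribution rate of each retained factor
    composite = F @ alpha     # F = sum_i alpha_i * F_i
    return F, alpha, composite
```

Sorting firms by `composite` in descending order would give a ranking of the kind used later for stock selection; a rotated solution would change the individual factor scores but not this overall workflow.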
Cluster analysis (Kmeans)
Clustering is a data-mining technique that divides a dataset into multiple categories by calculating the similarity between the data. K-means clustering (Hartigan and Wong 1979) is a technique in which data are divided into a preset number of K categories, making the data characteristics in the same category more similar. The cluster centers are iteratively updated to optimize the results. We adopted K-means clustering to classify 165 stocks into several categories, and the steps are as follows (Seong and Nam 2021).
Main steps:

Step 1: Select K = 7 initial centers for classification. (K = 7 is the value determined from the empirical results.)

Step 2: Calculate the distances between each point and the centers.

Step 3: Classify the points according to distance using the Euclidean distance metric.

Step 4: Update the centers by calculating the centroid of different categories.

Step 5: Repeat Steps 2, 3, and 4 until the data points around each center remain constant.
In addition to its fast convergence speed and excellent clustering performance, K-means clustering requires tuning only one parameter (K) and is easier to interpret than other clustering techniques. Our methodology steps are plotted in the flowchart shown in Fig. 1.
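Steps 1 to 5 correspond to Lloyd's algorithm, which can be sketched as follows. This is an illustrative implementation rather than the software used in the study; the function name `kmeans`, the random initialization, and the `seed` and `n_iter` arguments are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm following Steps 1-5; returns labels, centers, and
    the within-cluster sum of squared distances."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Step 2: Euclidean distance from every point to every center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each point to its nearest center.
        new_labels = dist.argmin(axis=1)
        # Step 5: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the centroid of its cluster
        # (an empty cluster keeps its previous center).
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    sse = float(((points - centers[labels]) ** 2).sum())
    return labels, centers, sse
```

Running the function for a range of K values and plotting the returned sum of squares against K is one common way to choose K; the result depends on the initial centers, so several seeds are usually compared.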
Empirical study
Data description
We employed factor and cluster analyses in our empirical study. We chose companies in the biopharmaceutical sector according to the latest 2021 Shenwan Industry Classification. Six categories were selected, based on the second-level industry classification of the Shenwan biopharmaceutical sector. We selected annual data from the audited financial statements of six categories of listed companies in the biopharmaceutical industry from 2018 to 2021. All data were obtained from the Wind Database. We eliminated companies with incomplete data, leaving 165 listed companies for the empirical analysis. Information on the 165 listed companies is presented in Table 1. Data preprocessing included truncation and cleansing to derive 12 evaluative indicators selected for their comprehensiveness and relevance. These indicators were chosen to maintain the objectivity, accuracy, and referential integrity of the assessments, ensuring that they encapsulated the development trajectory of the companies' stocks. Following Wang and Lee (2008), who advocated clustering based on financial ratio variability, the chosen indicators and their summary statistics are detailed in Tables 1 and 2.
Factor analysis
The data were first analyzed using SPSS software to measure the company’s financial performance at different levels. The correlation heat maps of the economic indicators for 2018 to 2021 are in Fig. 2. The tables of correlation coefficients of the financial indicators for 2018, 2019, 2020, and 2021 are reported in Annexed Tables 23, 24, 25, 26 in the Appendix.
Common factor formation
Because factor analysis requires the variables to be sufficiently correlated, we adopted Bartlett's test of sphericity and the Kaiser–Meyer–Olkin (KMO) test to verify the suitability of the factor analysis. Bartlett's test compares the correlation matrix of the data with the identity matrix. A significant result (p-value below 0.05) indicates that the data are suitable for factor analysis. The KMO test compares the simple and partial correlation coefficients. If the KMO value approaches 1, the correlation among the variables is strong, and factor analysis is suitable for the variables. The results obtained after processing using SPSS are presented in Table 3.
From the test results above, all KMO values were larger than 0.6, which means that the variables share common factors; therefore, the selected variables were suitable for factor analysis. The p-values of Bartlett's test of sphericity were all 0.00, below the threshold of 0.05, signifying robust interrelations among the financial indicators and endorsing their suitability for factor analysis.
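Both suitability checks can be computed directly from the correlation matrix. The sketch below uses the textbook definitions of the two statistics and is not the SPSS output reported in Table 3; the function names `bartlett_sphericity` and `kmo` are hypothetical.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # H0: R is the identity matrix, for which ln|R| = 0; correlated
    # variables make ln|R| negative and the statistic large.
    _, logdet = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    P = np.linalg.inv(R)
    # Partial correlations obtained from the inverse correlation matrix.
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)   # mask for off-diagonal entries
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)
```

The p-value of Bartlett's test is the upper-tail probability of a chi-square distribution with `dof` degrees of freedom, which could be obtained with, for example, `scipy.stats.chi2.sf(statistic, dof)`.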
After extracting common factors from the 12 financial indicators, the total variance explained, scree plot, communalities, and component score coefficient matrix were obtained by analyzing the relationships among the financial indicators and revealing the primary information contained in the common factors.
As the loadings of the common factors F_{1}, F_{2}, F_{3}, and F_{4} on some of the initial variables did not differ significantly, an explanatory relationship between the variables and the common factors could not be observed; therefore, rotation of the component matrix was required.
From the results in Table 4, four common factors were extracted from the 12 selected indicators. The cumulative variance contribution of F_{1}, F_{2}, F_{3}, and F_{4} was 73.172%, 74.857%, 72.463%, and 77.789% in 2018, 2019, 2020, and 2021, respectively. Analysis of the scree plot in Fig. 2 also shows that the inflection point occurred at the fourth eigenvalue; therefore, the first four factors could be retained.
Since we aimed to examine the attribution of each variable, the original component matrix was rotated for ease of naming. This rotation ensures that each variable has a large loading on one common factor and small loadings on the remaining common factors; the results are shown in Table 5.
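The text does not state which rotation was applied. Assuming an orthogonal varimax rotation, which is a common choice in SPSS workflows, the step could be sketched as below; the function name `varimax` and its arguments are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix so that
    each variable loads mainly on one factor (varimax criterion)."""
    loadings = np.asarray(loadings, dtype=float)
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        grad = loadings.T @ (
            rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        )
        # The nearest orthogonal matrix to the gradient is the next rotation.
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop when the criterion no longer improves appreciably.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes, which is what makes the factors easier to name.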
The rotated factor-loading matrix is presented in Table 5. In the first common factor, diluted earnings per share, basic earnings per share, and net assets per share have the largest loadings, indicating that these three indicators are more correlated with each other and reveal the enterprise's current profits, losses, and future earnings expectations. Common factor F_{1} is the operating income factor. In the second common factor, net cash flow per share, operating profit growth, total assets growth rate, and growth rate of shareholders' equity have large loadings, indicating that these four indicators are highly correlated and reveal the company's development capability in 2021. Common factor F_{2} is the development potential factor. In the third common factor, equity multiplier and equity ratio have high loadings, indicating that these two indicators are strongly correlated and reveal a company's assets and equity. Common factor F_{3} is the asset structure factor. Finally, the fourth common factor has substantial loadings on the current ratio, quick ratio, and proportion of current assets to total assets, indicating a robust correlation among indicators that reflect a firm's short-term liquidity; it is labeled the solvency factor.
Composite score of the company
A matrix of component score coefficients was obtained using SPSS (Table 6).
The standardized values of the initial indicators were substituted into the factor score function to calculate the factor scores of each sample, and a further comprehensive evaluation of the observed indicators was performed. A complete evaluation model was established using the variance contribution of the four common factors extracted as weights, combined with each factor score.
The composite score for each firm was calculated, and the scores were ranked in descending order for each year. Only the top 20 stocks with positive composite scores are shown in Table 7.
The literature review, including studies by He et al. (2020) and Gupta et al. (2022), indicated that industry firms broadly confronted the adverse consequences of the COVID-19 pandemic, leading to economic downturns. From Tables 7 and 8, the composite scores of the top three leading enterprises uniformly show upward trends, with the top four enterprises' composite scores being 2.04, 1.84, 1.8, and 1.44, respectively, in 2018 before the outbreak, and 2.45, 1.85, 1.73, and 1.64 in 2021 after the outbreak. The positive impact of the pandemic was partly offset by the adverse effects of the economic environment, resulting in only a slight increase in the overall scores of the leading companies. However, owing to the weaker capability of small and medium-sized enterprises to withstand economic downturns, their overall scores still show a slight decline, even though the pandemic boosted their growth to a certain extent. This result is consistent with the findings of Thukral (2021).
In 2021, the composite scores of the top seven companies rose by approximately 10% compared to 2020, reflecting sustained governmental support for biotechnological innovation and bioindustry development, coupled with increased consumer demand for medical supplies like alcohol and masks. This support has propelled leading biopharmaceutical stocks upward during the pandemic.
The classification and indicator score ranking of stocks in 2020 and 2021 have undergone a significant reshuffle compared to 2018 and 2019. Therefore, the specific development of the different types of pharmaceutical stocks must be discussed further. Table 9 presents the scores of the top 20 stocks in 2021 over the last 4 years.
The results in this table will be used to calculate the stock score in Tables 7 and 8.
The top 20 stocks by composite score in 2021 are more evenly dispersed across categories, with biopharmaceutical companies from all six categories represented. Of these, 80% of the companies in the four categories of medical devices, medical services, biological products, and chemical pharmaceuticals showed an upward trend in their composite scores after the outbreak. As the response to the pandemic necessitated medical personnel and equipment support, the medical market experienced a significant boost in capital inflows. Additionally, advancements in vaccine research and development prompted a re-evaluation of the biopharmaceutical industry's significant growth potential. Several stocks exhibited multiple upward movements. For companies in the two categories of Chinese medicine and pharmaceutical commerce, the composite score tended to decline slightly or remain stable after the outbreak. The increased popularity of disinfection products after the onset of the pandemic led to a decline in the incidence of other infectious diseases such as influenza, which in turn weighed on pharmaceutical commerce and Chinese medicine.
Cluster analysis
This study used the K-means clustering method to cluster the stocks in the sample and refine the similarities among them, analyzing the characteristics and commonalities of the 165 listed stocks. From the scree (elbow) plot in Fig. 3, it can be seen that the curve remains flat at K = 7, while it suddenly increases at K = 8; therefore, K is taken as 7, that is, the sample is divided into seven categories. We also show the clustering results when the number of clusters is six and seven in Annexed Tables 27 and 28 in the Appendix.
Based on the scores derived from the factor analysis, the stocks in each of the seven categories in 2021 are ranked in descending order of total scores. The top ten scoring stocks were screened and analyzed for their four common factor scores, as shown in Tables 10–16. Baker and Haslem (1974) divided investors into two types according to a decision-orientation criterion, which refers to investors' confidence in their decision-making abilities. This study divides investors into groups by confidence, based on this criterion.
In the first category, the stocks score exceptionally well on the operating income factor, while the other factors are at average levels. Operating earnings represent the existing earnings of a business, suggesting that this category is suitable for investors aiming for short-term profits. However, due to their moderate growth potential, long-term investments are less advisable. This category is recommended for short-term holdings, particularly for investors with a lower risk tolerance.
In the second category, the stocks are more prominent in the solvency factor score. The scores of the other three factors are relatively stable, and the stock types are concentrated in biopharmaceuticals. The outbreak of the pandemic has led to rapid and steady growth in new product R&D expenditures, and the scale of new product production and sales of large and medium-sized enterprises in China's biopharmaceutical industry is on a faster growth trend. Therefore, this category represents an ideal option for investors seeking medium- to long-term holdings, particularly for those with lower risk tolerance.
In the third category, the stocks have stable scores in operating income, solvency, and development potential, reaching a positive value of approximately 60%. However, the asset structure factor scores are all negative. This finding indicates that the pandemic did not significantly affect the development of these A-share listed companies. These stocks share weak stability in funding sources, making it challenging to achieve high short-term returns. This category is suitable for cautious investors to hold in the medium or long term.
In the fourth category, the stocks have positive scores for the operating income and development potential factors and negative scores for the asset structure and solvency factors. Operating income represents the existing earnings of a business, suggesting that this category is suitable for confident investors seeking short-term profitability. Given the strong performance of their growth potential, it is recommended that investors hold forward contracts for this category of stocks.
In the fifth category, the stocks score particularly well on the development potential factor, whereas the other factors are relatively flat. Owing to the long duration of the impact of the coronavirus pandemic on China's biopharmaceutical industry, the potential of this category is stable in the long term. This category is suitable for long-term holdings by cautious investors.
In the sixth category, the stocks do not score well on the operating income, solvency, asset structure, and development potential factors; all are relatively “medium.” The distribution of stocks in this category is relatively even, and the risk profile is medium. Although there are some variations in share prices, stocks appear to be suitable for cautious investors.
In the seventh category, the stocks stand out with positive scores on the asset structure factor, whereas the other three factors have relatively flat scores. The leading asset structure indicator describes the relative relationship between the funding sources provided by creditors and those offered by investors, reflecting the stability of the underlying financial structure of the business. Hence, stocks in this category are suitable for confident investors with a short-term horizon.
Sector investment analysis
We have offered stock-selection guidance for various investor profiles. However, translating these empirical findings into tangible economic benefits requires specific investment portfolio recommendations. Moving forward, we aim to devise optimized strategies for both long- and short-term investments, integrating the insights from Section “3.3” with return-risk analyses. In this study, we address the optimization challenge by employing Markowitz's model, using the Sharpe ratio (Sharpe 1998) as a performance metric. Markowitz's model is designed to assist rational investors in maximizing returns for a given level of risk, or minimizing risk for a specified return level.
Long- and short-term portfolios
Suppose that investment P has a set of N variable assets in the market. Let \({r}_{p}\) be the expected rate of return and \({\updelta }_{p}\) be the risk. The expected return is \({r}_{p}={\sum }_{i=1}^{N}{w}_{i}{r}_{i}\), where each weight \({w}_{i}\) takes a value between 0 and 1, and the variance is \({\updelta }_{p}^{2}\left(w\right)={\sum }_{i=1}^{N}{\sum }_{j=1}^{N}{w}_{i}{w}_{j}{\updelta }_{ij}={w}^{T}\Delta w\), where \({w}_{i}\) and \({w}_{j}\) are the weights assigned to stocks i and j, respectively; ∆ is the covariance matrix of the stocks, and δ_{ij} is the covariance between the prices of stocks i and j. The Sharpe ratio then takes its standard form:
$$SR=\frac{{r}_{p}-{r}_{f}}{{\updelta }_{p}}$$
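where r_{f} is the risk-free interest rate. We choose benchmark 1-year and 5-year deposit rates as proxies. The optimization problem can be described as follows:
$$\underset{w}{\max}\;\frac{{r}_{p}-{r}_{f}}{{\updelta }_{p}\left(w\right)}\quad \text{s.t.}\quad {\sum }_{i=1}^{N}{w}_{i}=1,\quad 0\le {w}_{i}\le 1,\quad {r}_{p}\ge \alpha$$
where α is the minimum required expected return.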
We consider a mean–variance investor and define the quadratic utility function for investing in this portfolio (Wang et al. 2016) as:
$$U=E\left({r}_{p}\right)-\frac{\gamma }{2}var\left({r}_{p}\right)$$
where U is the mean–variance utility, \(E\left({r}_{p}\right)\) is the portfolio's mean return, and \(var\left({r}_{p}\right)\) is the portfolio's variance, which is a proxy for the portfolio risk. γ is the risk aversion coefficient, and we set γ to be 3 for the short-term investment and 6 for the long-term investment.
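To make the two selection criteria concrete, the sketch below evaluates the Sharpe ratio and the mean–variance utility over a grid of weights for a two-asset portfolio. It is only an illustration: the study ran its optimization in R on more assets, the quadratic utility is assumed to take the form U = E(r_p) − (γ/2)·var(r_p), and the function names and any inputs are hypothetical.

```python
import math

def portfolio_stats(w, mean, cov):
    """Expected return sum_i w_i r_i and variance w' * cov * w."""
    n = len(w)
    ret = sum(w[i] * mean[i] for i in range(n))
    var = sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))
    return ret, var

def sharpe_ratio(ret, var, rf):
    """Excess return per unit of standard deviation."""
    return (ret - rf) / math.sqrt(var)

def mv_utility(ret, var, gamma):
    """Assumed quadratic form: U = E(r_p) - (gamma / 2) * var(r_p)."""
    return ret - 0.5 * gamma * var

def grid_search(mean, cov, rf, gamma, alpha, steps=1000):
    """Scan two-asset weights (w, 1 - w) and return, for each criterion,
    (best value, weight of asset 1) among portfolios with return >= alpha."""
    best_sr, best_u = None, None
    for s in range(steps + 1):
        w = [s / steps, 1 - s / steps]
        ret, var = portfolio_stats(w, mean, cov)
        if ret < alpha:
            continue  # minimum expected-return constraint
        sr = sharpe_ratio(ret, var, rf)
        u = mv_utility(ret, var, gamma)
        if best_sr is None or sr > best_sr[0]:
            best_sr = (sr, w[0])
        if best_u is None or u > best_u[0]:
            best_u = (u, w[0])
    return best_sr, best_u
```

The two criteria generally select different weights, because the Sharpe ratio rewards excess return per unit of risk while the utility penalizes variance directly through γ.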
For long-term investment, we selected stocks from Categories 3 and 5, which include five stocks. Due to extensive missing data, stock 832735.BJ (listed on the Beijing Stock Exchange) was excluded from the portfolio analysis. We ran the optimization on the remaining four stocks: 300573.SZ, 000538.SZ, 300760.SZ, and 603127.SH. We set r_{f} as the benchmark five-year deposit rate for 2021 (2.75%). Optimization simulations used stock prices from the start of 2021 to the end of 2021. R programming was used to perform optimization according to the above formula. Figure 4a shows the Sharpe ratio curve of this portfolio against the expected return with a minimum α. For demonstration, we display the nearest ten results around the maximum point. As shown in Table 17, the largest Sharpe ratio was 0.03452; in this portfolio, the weights of 000538.SZ and 300760.SZ were 0, and the weights of 300573.SZ and 603127.SH were 24.8% and 75.2%, respectively, with an expected return of 7.179% and a standard deviation of 1.283. Table 17 also reports the mean–variance utility values for each portfolio. An optimal value of 4.7134 was achieved when the weights of 300573.SZ and 603127.SH were 26.99% and 73.01%, respectively, with an expected return of 7.148% and a standard deviation of 1.274.
For short-term investment, we selected the best stocks from Categories 1, 4, and 7, which comprise 12 stocks in total. In this situation, we set α to be 1.75%, the one-year risk-free interest rate in 2021. Figure 4b shows the Sharpe ratio curve for the 12 stocks against the expected returns, and the optimization results for the nearest ten points are listed in Table 18. This table excludes the weights of 688399.SH, 601607.SH, 000661.SZ, 603368.SH, 600829.SH, 002524.SZ, 000028.SZ, 000411.SZ, and 002462.SZ because their weights are approximately 0.00% when a minimum expected return of 0.0175 is required. The remaining three stocks are 002821.SZ, 600713.SH, and 600129.SH. From Table 18, the largest Sharpe ratio was 0.05123; for this portfolio, the weights of 600713.SH and 601607.SH were both 0, and the weights of 002821.SZ and 600129.SH were 23.88% and 76.12%, respectively, with an expected return of 8.133% and a standard deviation of 1.051. Table 18 also reports the mean–variance utility values for each portfolio. The optimal value of 4.8425 was achieved when the weights of 002821.SZ and 600129.SH were 28.57% and 71.43%, respectively, with an expected return of 8.019% and a standard deviation of 1.029.
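The reported figures can be checked by simple arithmetic. The sketch below assumes that the expected returns and standard deviations quoted above are in the same percentage units, that the Sharpe ratio is (r_p − r_f)/δ_p, and that the utility takes the standard quadratic form U = E(r_p) − (γ/2)·var(r_p); these forms are assumptions of this sketch. Under them, the long-term Sharpe figure comes out near 3.452 (the reported 0.03452 scaled by 100), and the reported utility values 4.7134 and 4.8425 are reproduced with γ = 3 and γ = 6, respectively. Readers may wish to compare this with the γ assignment stated above.

```python
def sharpe_ratio(expected_return, risk_free, std_dev):
    # Excess return per unit of total risk.
    return (expected_return - risk_free) / std_dev

def mean_variance_utility(expected_return, std_dev, gamma):
    # Assumed quadratic form: U = E(r_p) - (gamma / 2) * var(r_p).
    return expected_return - 0.5 * gamma * std_dev ** 2

# Long-term maximum-Sharpe portfolio: ER 7.179, SD 1.283, r_f 2.75.
sr_long = sharpe_ratio(7.179, 2.75, 1.283)

# Long-term maximum-utility portfolio: ER 7.148, SD 1.274.
u_long = mean_variance_utility(7.148, 1.274, gamma=3)

# Short-term maximum-utility portfolio: ER 8.019, SD 1.029.
u_short = mean_variance_utility(8.019, 1.029, gamma=6)

print(round(sr_long, 3), round(u_long, 4), round(u_short, 4))
```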
To provide investment suggestions from another risk-return perspective, we calculated the Treynor Ratio (TR) and Jensen Alpha (JA) for the portfolios with returns above the risk-free rate; the optimal investment strategies under these two performance evaluation metrics are shown in Tables 19 and 20. The Treynor Ratio measures the risk premium earned per unit of systematic risk, whereas Jensen Alpha assesses investment performance by quantifying the deviation of a portfolio’s average return from its expected return, based on the Capital Asset Pricing Model. The formulae for these performance metrics are as follows: \(TR=\frac{{r}_{p}-{r}_{f}}{{\upbeta }_{i}}\) and \(JA={r}_{p}-\left[{r}_{f}+{\upbeta }_{i}\left({r}_{m}-{r}_{f}\right)\right]\),
where \({\upbeta }_{i}\) is the beta of the holding and \({r}_{m}\) is the average expected return of the market. From Table 19, the short-term optimal portfolios determined by the Treynor Ratio and Jensen Alpha differ considerably from those based on the Sharpe ratio and mean–variance utility in Table 18. In Table 20, the long-term optimal portfolios determined by the Treynor Ratio and Jensen Alpha are consistent with the results in Table 17. In summary, the optimal long-term portfolio does not vary significantly across metrics, whereas the optimal short-term portfolio can differ substantially depending on the performance evaluation metric used.
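As an illustration of the two metrics, the following Python sketch uses the textbook CAPM definitions, TR = (r_p − r_f)/β and JA = r_p − [r_f + β(r_m − r_f)], with β estimated as the sample covariance of portfolio and market returns divided by the sample variance of market returns. The estimator, the function names, and all numbers are assumptions made for illustration, not values from Tables 19 and 20.

```python
def beta_from_series(portfolio_returns, market_returns):
    # beta = cov(r_p, r_m) / var(r_m), using sample (n - 1) moments.
    n = len(market_returns)
    mean_p = sum(portfolio_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(portfolio_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var_m

def treynor_ratio(r_p, r_f, beta):
    # Risk premium earned per unit of systematic risk.
    return (r_p - r_f) / beta

def jensen_alpha(r_p, r_f, beta, r_m):
    # Realized return minus the CAPM-implied expected return.
    return r_p - (r_f + beta * (r_m - r_f))

# Hypothetical inputs in percent units.
beta_example = beta_from_series([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
tr_example = treynor_ratio(r_p=8.0, r_f=1.75, beta=1.25)
ja_example = jensen_alpha(r_p=8.0, r_f=1.75, beta=1.25, r_m=6.0)
print(beta_example, tr_example, ja_example)
```

Because the Treynor Ratio scales by β rather than by total risk, a portfolio ranking can change relative to the Sharpe ratio whenever the candidates differ in how much of their risk is systematic.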
Robustness check
In addition to constructing long- and short-term portfolios, we conducted a robustness check by comparing our results in Section “Long and short term portfolio” with traditional equally weighted portfolios (EWP). Specifically, for the short-term portfolio, we partitioned 2021 into four quarters and made the comparison for each quarter. Tables 21 and 22 display the robustness check results for the long- and short-term portfolios, respectively. From the tables, our results from Section “Long and short term portfolio” compare favorably with those from the EWP in terms of return and utility: although the EWP had a higher Sharpe ratio, our portfolios lead to higher portfolio returns and utility.
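A comparison of this kind can be sketched as follows. The Python snippet evaluates an equally weighted portfolio and a candidate weight vector on the same inputs, assuming the Sharpe ratio (r_p − r_f)/δ_p and the quadratic utility E(r_p) − (γ/2)·var(r_p). The two-asset returns, covariance matrix, and candidate weights are hypothetical and do not correspond to Tables 21 and 22.

```python
import math

def equal_weights(n_assets):
    # Every asset receives the same share of wealth.
    return [1.0 / n_assets] * n_assets

def evaluate(weights, exp_returns, cov, r_f, gamma):
    """Return the expected return, SD, Sharpe ratio and utility of a portfolio."""
    n = len(weights)
    ret = sum(w * r for w, r in zip(weights, exp_returns))
    var = sum(weights[i] * weights[j] * cov[i][j]
              for i in range(n) for j in range(n))
    sd = math.sqrt(var)
    return {
        "return": ret,
        "sd": sd,
        "sharpe": (ret - r_f) / sd,
        "utility": ret - 0.5 * gamma * var,
    }

# Hypothetical inputs (percent units).
exp_returns = [9.0, 6.0]
cov = [[4.0, 0.0],
       [0.0, 1.0]]
r_f, gamma = 1.75, 3.0

ewp = evaluate(equal_weights(2), exp_returns, cov, r_f, gamma)
candidate = evaluate([0.25, 0.75], exp_returns, cov, r_f, gamma)

for name, stats in (("EWP", ewp), ("candidate", candidate)):
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

For a quarterly check like the one described for the short-term portfolio, `evaluate` would simply be called once per quarter with that quarter's estimated returns and covariance matrix.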
Conclusion
This study employed factor and cluster analyses to examine 165 listed companies from 2018 to 2021. Based on the results, we assessed the overall performance of various biopharmaceutical sector stocks, estimated the impact of COVID-19 on different stock types, and offered recommendations to investors.
The main results of this study are as follows. First, biopharmaceutical industry stocks maintained upward momentum during the pandemic. Specifically, medical devices, medical services, biological products, and chemical pharmaceuticals increased, whereas Chinese medicine and pharmaceutical commerce declined. Unlike previous studies, we explored the dynamics of second-level classification stocks during the COVID-19 pandemic. Second, the clustering results, which account for both the degree of risk aversion and the holding period of stock investors, suggest the following: Category 1 stocks are suitable for short-term holdings by unconfident investors; Categories 2 and 3 are medium- to long-term holdings for cautious investors; Category 4 is ideal for confident investors investing in forward contracts; Category 5 is suitable for long-term holdings by cautious investors; Category 6 is suitable for cautious stockholders; and Category 7 is ideal for short-term holdings by confident investors. Finally, we developed an optimal investment strategy using the Sharpe ratio and mean–variance utility, and provided alternative performance metrics, including the Treynor Ratio and Jensen Alpha, for comparison. For long-term investments, we recommend that investors allocate 24.8% of their wealth to 300573.SZ and 75.2% to 603127.SH to achieve the highest Sharpe ratio, and 26.99% to 300573.SZ and 73.01% to 603127.SH for the best mean–variance utility. For short-term investments, we recommend a portfolio allocation of 23.88% in 002821.SZ and 76.12% in 600129.SH for the highest Sharpe ratio, and 28.57% in 002821.SZ and 71.43% in 600129.SH for the best mean–variance utility. These suggestions are tailored to investors with different preferences for portfolio analysis metrics.
This study has some limitations. First, it focused only on the stock market during the pandemic; other financial assets such as bonds and futures were not included. Second, the analysis was restricted to data from China, despite the global impact of COVID-19, which suggests the potential relevance of examining financial markets in other countries.
Future research could broaden the scope to include additional financial assets as the pandemic progresses in China. In addition, given the significant effects of the pandemic on countries such as the US and India, a comparative study of their biopharmaceutical sectors relative to China's may yield valuable insights. Furthermore, advances in research methodologies may allow the application of sophisticated machine learning and deep learning techniques, such as DBN (Deep Belief Network), LSTM (Long Short-Term Memory), and support vector machines, to explore nonlinear relationships in the Chinese stock market. While some studies, such as Leippold et al. (2022), have applied nonlinear machine learning methods to the Chinese stock market, few have specifically addressed nonlinear dynamics within sectors such as the biopharmaceutical industry.
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the Wind database.
Abbreviations
COVID-19: Coronavirus disease 2019
SOM: Self-organizing maps
KMO test: Kaiser–Meyer–Olkin test
R&D expenditure: Research and development expenditure
xxxxxx.SH: Stock from Shanghai Exchange
xxxxxx.SZ: Stock from Shenzhen Exchange
TR: Treynor ratio
JA: Jensen Alpha
EWP: Equally weighted portfolio
ER: Expected return
SD: Standard deviation
DBN: Deep belief network
LSTM: Long short-term memory
References
AlAli MS (2020) The effect of WHO COVID-19 announcement on Asian Stock Markets returns: an event study analysis. J Econ Business 3(3):1051–1054
Ayati N, Saiyarsarai P, Nikfar S (2020) Short and long term impacts of COVID-19 on the pharmaceutical sector. DARU J Pharm Sci 28:799–805
Baker HK, Haslem JA (1974) The impact of investor socioeconomic characteristics on risk and return preferences. J Bus Res 2(4):469–476
Bai J, Ng S (2006) Evaluating latent and observed factors in macroeconomics and finance. J Econ 131(1–2):507–537
Cevik E, Kirci Altinkeski B, Cevik EI, Dibooglu S (2022) Investor sentiments and stock markets during the COVID-19 pandemic. Financ Innov 8(1):69
Cox J, Greenwald DL, Ludvigson SC (2020) What explains the COVID-19 stock market? (No. w27784). National Bureau of Economic Research
Daglis T, Melissaropoulos IG, Konstantakis KN, Michaelides PG (2022) The impact of COVID-19 on global stock markets: early linear and nonlinear evidence for Italy. Evol Inst Econ Rev 19(1):485–495
Esparcia C, López R (2022) Outperformance of the pharmaceutical sector during the COVID-19 pandemic: global time-varying screening rule development. Inf Sci 609:1181–1203
Gorman BS, Primavera LH (1983) The complementary use of cluster and factor analysis methods. J Exp Educ 51(4):165–168
Gupta V, Santosh KC, Arora R, Ciano T, Kalid KS, Mohan S (2022) Socioeconomic impact due to COVID-19: an empirical assessment. Inf Process Manag 59(2):102810
Hao Y, Liu H, Chen H, Sha Y, Ji H, Fan J (2019) What affect consumers’ willingness to pay for green packaging? Evidence from China. Resour Conserv Recycl 141:21–29
Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
He P, Sun Y, Zhang Y, Li T (2020) COVID–19’s impact on stock prices across different sectors—an event study based on the Chinese stock market. Emerg Mark Financ Trade 56(10):2198–2212
Ho KC, Chen C, Yang D, Gao Y (2022) Medical reform, Covid-19 vaccine and stock returns: the case of Chinese listed pharmaceutical and healthcare companies. Appl Econ Lett 31:832–839
Jones CS (2006) A nonlinear factor analysis of S&P 500 index option returns. J Financ 61(5):2325–2363
Kim JO, Mueller CW (1978) Introduction to factor analysis: What it is and how to do it, vol 13. Sage, England
Leippold M, Wang Q, Zhou W (2022) Machine learning in the Chinese stock market. J Financ Econ 145(2):64–82
Liu H, Wang Y, He D, Wang C (2020) Short term response of Chinese stock markets to the outbreak of COVID-19. Appl Econ 52(53):5859–5872
Ludvigson SC, Ng S (2007) The empirical risk–return relation: a factor analysis approach. J Financ Econ 83(1):171–222
Morelli D (1999) Tests of structural change using factor analysis in equity returns. Appl Econ Lett 6(4):203–207
Nanda SR, Mahanty B, Tiwari MK (2010) Clustering Indian stock market data for portfolio management. Expert Syst Appl 37(12):8793–8798
Piñeiro-Chousa J, López-Cabarcos MÁ, Quiñoá-Piñeiro L, Pérez-Pico AM (2022) US biopharmaceutical companies’ stock market reaction to the COVID-19 pandemic. Understanding the concept of the ‘paradoxical spiral’ from a sustainability perspective. Technol Forecast Soc Chang 175:121365
Robke L, Pont LB, Bongard J, Wurzer S, Smietana K, Moss R (2020) Impact of COVID-19 on pharmaceutical external innovation sourcing. Nat Rev Drug Discov 19(12):829–830
Seong N, Nam K (2021) Predicting stock movements based on financial news with segmentation. Expert Syst Appl 164:113988
Sharpe WF (1998) The Sharpe ratio. Streetwise Best J Portfolio Manag 3:169–185
Takyi PO, Bentum-Ennin I (2021) The impact of COVID-19 on stock market performance in Africa: a Bayesian structural time series approach. J Econ Bus 115:105968
Thukral E (2021) COVID-19: small and medium enterprises challenges and responses with creativity, innovation, and entrepreneurship. Strateg Chang 30(2):153–158
Vats P, Samdani K (2019) Study on machine learning techniques in financial markets. In: 2019 IEEE international conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–5
Wagenvoort RJ, Ebner A, Borys MM (2011) A factor analysis approach to measuring European loan and bond market integration. J Bank Financ 35(4):1011–1025
Wagner AF (2020) What the stock market tells us about the post-COVID-19 world. Nat Hum Behav 4(5):440–440
Wang Y, Ma F, Wei Y, Wu C (2016) Forecasting realized volatility in a changing world: a dynamic model averaging approach. J Bank Financ 64:136–149
Wang YJ, Lee HS (2008) A clustering method to identify representative financial ratios. Inf Sci 178(4):1087–1097
Zou Z, Wang X (2023) Research on the investment value of China’s medical sector in the context of COVID-19. Econ Res-Ekonomska Istraživanja 36(1):614–633
Funding
XJTLU Postgraduate Research Scholarship PGRS2012016.
Author information
Contributions
Study conception and design: Conghua Wen; data collection: Yifang Tang, Feifan Zhao; analysis and interpretation of results: Jiahui Xi, Conghua Wen; draft manuscript preparation: Jiahui Xi, Yifan Tang, Conghua Wen. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Competing interests
No competing interests exist for the publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Correlation matrix of financial indicators
In this section, we show the correlation coefficient matrices of the financial indicators for 2018, 2019, 2020, and 2021.
Appendix B: Determination of the number of clusters
When performing cluster analysis, the number of clusters must first be determined. The inflection point is often used as the basis for selecting the number of clusters: as the number of clusters K increases, the sample is divided more finely and the degree of aggregation within each cluster gradually increases.
With K less than 4, the classification is relatively coarse and of limited reference value. With K = 6, 141 of the 165 stocks belong to the same category and four subgroups contain only a small number of stocks, which also limits the reference value of the classification. Therefore, K = 7 is chosen.
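The kind of check described above, scanning candidate values of K and inspecting both the within-cluster dispersion and the cluster sizes, can be sketched in Python as follows. This is a plain Lloyd-style k-means written for illustration, not the Hartigan–Wong algorithm cited in the references, and the two-dimensional "factor scores" are synthetic values generated for this sketch rather than the study's data.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd-style k-means; returns cluster sizes and within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    sizes = [labels.count(c) for c in range(k)]
    return sizes, wss

# Synthetic two-dimensional scores: one large group and two small ones.
rng = random.Random(42)
scores = []
for cx, cy, n in [(0.0, 0.0, 40), (5.0, 5.0, 15), (-5.0, 4.0, 5)]:
    scores += [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

results = {k: kmeans(scores, k) for k in range(1, 8)}
for k, (sizes, wss) in results.items():
    share = max(sizes) / len(scores)  # share of points in the largest cluster
    print(k, sorted(sizes, reverse=True), round(wss, 2), round(share, 2))
```

Reading the printed table for a bend in the within-cluster sum of squares corresponds to the inflection-point criterion, while the largest-cluster share flags partitions in which most observations fall into a single category.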
The tables below show the clustering data when the number of clusters is 6 and 7.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Xi, J., Wen, C., Tang, Y. et al. A factor score clustering approach to analyze the biopharmaceutical sector in the Chinese market during COVID-19. Financ Innov 10, 135 (2024). https://doi.org/10.1186/s40854-024-00654-y
DOI: https://doi.org/10.1186/s40854-024-00654-y