A factor score clustering approach to analyze the biopharmaceutical sector in the Chinese market during COVID-19

The biopharmaceutical sector has attracted considerable interest during the COVID-19 pandemic. This study investigates the biopharmaceutical sector using the Shenwan Industry Classification and provides insights into investment strategies. We combine factor and cluster analyses to reduce data dimensions and detect latent similarities among stocks. Specifically, the biopharmaceutical sector is divided into six categories based on the second-level industry classification. We observe that medical devices, medical services, biological products, and chemical pharmaceuticals maintained their upward trend, while Chinese medicine and pharmaceutical commerce declined slightly. We also develop optimal investment strategies using various metrics for different investor types.


Introduction
Since its initial announcement in early 2020, the coronavirus disease 2019 (COVID-19) outbreak has precipitated recessions across major global economies such as China, the European Union, the United Kingdom, and the United States. During this period, some countries also experienced significant political turmoil, contributing to considerable instability in financial markets. Notably, global stock markets exhibited several phenomena indicative of these disruptions. Using event study methodology, AlAli (2020) identified a negative impact of COVID-19 on abnormal returns in major Asian stock markets, with marked differences in returns before and after the World Health Organization's declaration. Additionally, using a novel time series approach based on a Bayesian structure, Takyi and Bentum-Ennin (2021) estimated that the stock performance of 13 African countries (Ghana, Nigeria, South Africa, Kenya, Tanzania, Tunisia, Mauritius, Morocco, Zambia, Namibia, Botswana, Côte d'Ivoire, and Uganda) was rarely positively affected by COVID-19. In China, the stock market experienced significant downturns from January 2020, with the Shanghai stock index dropping by 8.7% and reaching a low of 2646 in March. By contrast, Esparcia and López (2022) reported that COVID-19 provided the biopharmaceutical industry with unprecedented revenue sources. The sector has been extensively analyzed in the literature (Robke et al. 2020; Ayati et al. 2020; Esparcia and López 2022; Ho et al. 2022). According to the Shenwan Industry Classification, China's biopharmaceutical sector includes medical services, medical devices, biological products, pharmaceutical commerce, Chinese medicine, and chemical pharmaceuticals. This raises critical questions about whether growth has been uniform across these categories and what their investment potential is. This article focuses on two main questions. First, how have the six categories within China's biopharmaceutical sector changed during the pandemic? We explore this using factor scores before and after the onset of the pandemic. Second, which investment strategies are optimal for different investor profiles within this sector? We apply factor and cluster analyses to guide stock selection and offer portfolio recommendations based on standard investment metrics.
As financial markets have rapidly evolved in recent decades, multivariate analysis has become integral to managing large, high-dimensional datasets in financial research (Wagenvoort et al. 2011; Vats and Samdani 2019; Seong and Nam 2021). This study combines factor analysis and clustering techniques for stock market analysis. As noted by Gorman and Primavera (1983), the primary purpose of factor analysis is to reduce the number of variables by grouping them into factors based on their correlations. Cluster analysis, by contrast, groups objects into homogeneous sets based on one or more multivariate similarity criteria, segmenting the data into clusters that reflect their similarities and differences. The strength of factor analysis lies in detecting a set of common underlying dimensions of the variables. However, factor analysis is unsuitable for investigating latent similarities among observations, which motivates us to also use cluster analysis, whose strength lies in identifying distinct profiles or categories in the data. Given these complementary characteristics, we first use factor analysis to reduce the financial indicators to a few components and then apply cluster analysis to group the data homogeneously according to these components.
To examine the impact of COVID-19 on the biopharmaceutical sector, this study analyzes 165 listed A-share companies in China classified according to the Shenwan Industry Classification. Our findings highlight several key trends. First, biopharmaceutical industry stocks maintained upward momentum during the pandemic. Specifically, medical devices, medical services, biological products, and chemical pharmaceuticals rose, whereas Chinese medicine and pharmaceutical commerce declined slightly. Second, based on the results of the cluster analysis, we allocate stock categories to different types of investors, classifying investors by investment horizon (long- or short-term) and degree of risk aversion. Finally, we translate the empirical results into economic gains by developing optimal investment strategies based on the Sharpe ratio and mean-variance utility, and we offer additional performance assessments using Treynor's ratio and Jensen's alpha. For short-term investments, the results differ across metrics because short-term stock volatility is much higher, whereas for long-term investments, the results are similar.
This study contributes to the literature in three ways. First, it integrates factor and cluster analyses in the context of the Chinese financial market, a combination seldom used in financial research, and illustrates the effectiveness of these combined techniques in analyzing the Chinese biopharmaceutical sector. Second, it contributes to research on second-level industry classification. Most previous related literature focuses on first-level industry classification; in this article, we also discuss the six second-level industries in the biopharmaceutical sector. During the pandemic, investors have been confident in the biopharmaceutical industry, but not all categories have been boosted by COVID-19; research on second-level industry classification is therefore useful for practical investment. Finally, we extend studies on the effects of COVID-19 on financial markets.
The remainder of this paper is organized as follows.In the next section, we review related literature and background.Section "Methodology" presents the methodologies used, including factor analysis and K-means clustering.Section "Empirical study" introduces the data and empirical research.We then provide an investment analysis in Section "Sector investment analysis" based on the categories derived in the previous section.Finally, we conclude the paper in Section "Conclusion".

Literature review
Recent studies have advanced our understanding of stock market dynamics during the COVID-19 pandemic. Cox et al. (2020) found that the Federal Reserve played a role in stock market fluctuations in the early weeks of the pandemic. Daglis et al. (2022) reached another interesting conclusion: a decreasing impact of COVID-19 on the Italian stock market index alongside increasing volatility, both statistically significant. Cevik et al. (2022) employed multiple analytical methods, including panel regressions with fixed effects, panel quantile regressions, a panel vector autoregression model, and country-specific regressions, to explore the relationship between investor sentiment and stock market returns and volatility across 20 countries. They observed that rising positive investor sentiment boosts stock returns, whereas negative sentiment dampens returns at the lower quantiles. Esparcia and López (2022) developed a global and dynamic ratio that summarizes different investor profiles according to their attitudes toward risk and accounts for the dynamic nature of the economy and financial markets. They found that the first principal component from Principal Component Analysis summarizes the information contained in the initial performance rankings. In the context of China, Liu et al. (2020) identified a downturn in Chinese stock markets since the onset of the pandemic; stock price movements can serve as indicators of economic performance and future economic trends (Wagner 2020).
Global market sectors exhibited varying dynamics amid the pandemic. Gupta et al. (2022) reported significant adversity inflicted by COVID-19 on India's manufacturing, agriculture, and service sectors. He et al. (2020) observed a negative impact on China's transportation, mining, and utilities sectors, while noting resilience in manufacturing, IT, education, and healthcare. Piñeiro-Chousa et al. (2022) assessed the stock market responses of two pioneering US biopharmaceutical firms in mRNA vaccine development, highlighting distinct volatility influences on Pfizer's and Moderna's returns before and during the pandemic. Zou and Wang (2023) revealed an undervaluation of the medical sector, identifying China's medical sector as particularly valuable for its stable risk premium. In pharmaceuticals, Ho et al. (2022) employed the Fama-French five-factor model to determine the influence of medical reform announcements and COVID-19 vaccine approvals on the returns of Chinese pharmaceutical and healthcare companies. Their findings indicated negative effects of medical reforms on younger and smaller firms, whereas vaccine approvals generally boosted stock returns, except for smaller firms. Robke et al. (2020) reviewed the impact of the pandemic on pharmaceutical innovation investment and speculated on future changes in the innovation-sourcing landscape. Ayati et al. (2020) examined the pandemic's short- and long-term effects on the pharmaceutical sector, ranging from immediate demand shifts and regulatory changes to a longer-term deceleration of industry growth and supply chain shifts toward self-sufficiency. Although existing literature has examined China's biopharmaceutical sector, studies at the level of second-level industry classification remain scarce. This study aims to address this gap by examining the transformations within the biopharmaceutical sector and its six subdivisions in China throughout the COVID-19 period, culminating in tailored investment recommendations based on an innovative factor-score clustering method.
Factor analysis is a statistical tool used to identify a relatively small number of factors that represent the relationships among many interrelated variables. Earlier studies have used this tool for financial analyses (Morelli 1999; Jones 2006; Bai and Ng 2006; Ludvigson and Ng 2007). Wagenvoort et al. (2011) used factor analysis to reveal that the rates on large and small loans with long fixation periods converged weakly. Through factor analysis, they introduced a new measure to reassess whether retail bank market integration is present, ongoing, or complete. Few studies have used factor analysis to score stocks; hence, little investment advice has been derived from it. We provide portfolio suggestions by ranking stocks according to their factor scores. Clustering is an unsupervised technique used to generate groups of similar objects, and K-means clustering has been used in several financial analyses. For instance, Nanda et al. (2010) compared several clustering methods, including K-means, self-organizing maps (SOM), and Fuzzy C-means, to classify stocks in India. Their results showed that K-means clustering builds the most compact clusters. Seong and Nam (2021) used K-means clustering and multiple kernel learning techniques to predict stock price movements. Expanding on these methodologies, our study employs K-means clustering to classify 165 Chinese listed companies into distinct categories, from which we devise corresponding investment strategies. Notably, this study merges factor and cluster analyses, a rare approach in financial studies, to enhance investment decision-making.
The impetus for our research emerges from the scant literature on second-level industry classifications. Few studies have employed factor and cluster analyses in tandem. Our study unites these methodologies to reduce the number of variables and to categorize the data based on their intrinsic traits.

Factor analysis
The factor analysis method (Kim et al. 1978) recombines the original multidimensional variable indices to identify common factors, namely, the main factors, which capture the primary statistical information of the multidimensional indices and thereby achieve dimension reduction. Factor analysis can be applied to many statistical variables at once. After dimension reduction, the main factors retain the primary information of the original variables, making the research process simple, effective, and objective. In this study, each stock has 12 financial indicators with obvious multicollinearity (see Fig. 1). To reduce them to fewer variables, we adopted factor analysis to extract the common factors. The main steps of the factor analysis are as follows.
Main Steps:
• Standardization of data and applicability test
First, the indicators are standardized:

$$Z_{ij} = \frac{X_{ij} - \bar{X}_i}{S_i},$$

where $X_{ij}$ is the ith financial indicator in year j, $\bar{X}_i$ is the mean of the ith indicator, and $S_i$ is the standard deviation of the ith indicator. $Z_{ij}$ is the standardized variable, which has zero mean and unit variance.
• Factor extraction and naming
Common factors were extracted following the methodology of Hao et al. (2019), using the cumulative contribution rate. To minimize information loss and enhance the usefulness of the factor analysis, factors were retained until their cumulative variance contribution rate exceeded 80%. After extracting factors based on the eigenvalues, we computed the variance contribution rate of each factor and the corresponding cumulative rates.

• Factor scores and composite scores calculation
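The score of each factor is based on the factor score coefficients and the standardized variables. The common factors are calculated as

$$F_i = \beta_{i1} Z_1 + \beta_{i2} Z_2 + \cdots + \beta_{ip} Z_p, \quad i = 1, 2, \ldots, m,$$

where $Z_j$ is the standardized value of indicator $X_j$ and $\beta_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, p$) is the score coefficient of factor $F_i$ on variable $X_j$.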
The overall score is computed by weighting the score of each main factor by its contribution rate:

$$F = \sum_{i=1}^{m} \alpha_i F_i = \alpha_1 F_1 + \alpha_2 F_2 + \cdots + \alpha_m F_m,$$

where $F_i$ ($i = 1, 2, \ldots, m$) is the score of each factor and $\alpha_i$ ($i = 1, 2, \ldots, m$) is the contribution rate of each factor. Factor analysis can efficiently handle many intercorrelated variables and identify common factors that contain most of the information in the data.
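The standardization, extraction, and scoring steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the SPSS procedure the authors ran: rows of `X` are assumed to be firms and columns the 12 indicators; factors are assumed to be principal components of the correlation matrix; scores are computed from the unrotated components (a rotated solution would give different scores); and α_i is assumed to be the variance contribution rate λ_i / p. All function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Z_ij = (X_ij - mean_i) / S_i for each indicator (column), using the sample SD."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def extract_factors(Z, threshold=0.80):
    """Principal-component extraction on the correlation matrix of Z (an assumption).

    Keeps the smallest number m of factors whose cumulative variance
    contribution rate reaches `threshold`.
    """
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # reorder: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()            # variance contribution rate of each factor
    cumulative = np.cumsum(contribution)
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    return eigvals[:m], eigvecs[:, :m], contribution, cumulative

def factor_scores(Z, eigvals, eigvecs):
    """F = Z B with score coefficients B = V * Lambda^(-1/2) (unrotated components)."""
    B = eigvecs / np.sqrt(eigvals)                    # column i holds the beta coefficients of F_i
    return Z @ B

def composite_score(F, alpha):
    """Composite F = sum_i alpha_i * F_i, weighting each factor score by its contribution rate."""
    return F @ alpha
```

Ranking firms by composite score, as in the top-20 tables, then amounts to `np.argsort(-scores)`. Because eigenvector signs are arbitrary, each factor should in practice be oriented so that its dominant loadings are positive before scores are compared across years.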

Cluster analysis (K-means)
Clustering is a data-mining technique that divides a dataset into multiple categories by calculating the similarity between data points. K-means clustering (Hartigan and Wong 1979) divides the data into a preset number K of categories so that observations within the same category are as similar as possible; the cluster centers are iteratively updated to optimize the result. We adopted K-means clustering to classify the 165 stocks into several categories, following the steps below (Seong and Nam 2021).
Main steps:
• Step 1: Select K = 7 initial centers (K = 7 is determined from the empirical results).
• Step 2: Calculate the Euclidean distance between each point and each center.
• Step 3: Assign each point to its nearest center.
• Step 4: Update each center to the centroid of the points assigned to it.
• Step 5: Repeat Steps 2, 3, and 4 until the cluster assignments no longer change.
In addition to its fast convergence and good clustering performance, K-means clustering requires tuning only one parameter (K) and is more interpretable than many other clustering techniques. Our methodology steps are summarized in the flowchart in Fig. 1.
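Steps 1 to 5 can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: rows of `X` are assumed to be stocks described by their factor scores, and Step 1 uses k-means++ seeding, which is an assumption because the paper does not say how the initial centers were chosen. The function name `kmeans` is hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=300, seed=0):
    """Lloyd's K-means on the rows of X (e.g. stocks x factor scores)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1 (assumed k-means++ seeding): first center uniformly at random,
    # each further center drawn with probability proportional to squared distance
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=2), axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)

    labels = None
    for _ in range(max_iter):
        # Steps 2-3: Euclidean distance to every center, assign each point to the nearest
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Step 5: stop once no point changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the centroid of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers
```

For the setting described here, `kmeans(F, 7)` would be applied to a 165 x 4 matrix of factor scores; because the result depends on the initial centers, running several seeds and keeping the lowest within-cluster sum of squares is a common safeguard.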

Data description
We employed factor and cluster analyses in our empirical study. We chose companies in the biopharmaceutical sector according to the latest (2021) Shenwan Industry Classification, selecting six categories based on the second-level industry classification of the Shenwan biopharmaceutical sector. We collected annual data from the audited financial statements of listed companies in these six categories from 2018 to 2021. All data were obtained from the Wind Database. After eliminating companies with incomplete data, 165 listed companies remained for the empirical analysis; information on them is presented in Table 1. Data preprocessing included truncation and cleansing, from which we derived 12 evaluative indicators selected for their comprehensiveness and relevance. These indicators were chosen to keep the assessments objective, accurate, and comparable and to capture the development trajectory of the companies' stocks. Following Wang and Lee (2008), who advocated clustering based on the variability of financial ratios, we selected financial-ratio indicators; these indicators and their summary statistics are detailed in Tables 1 and 2.

Table 1 Information of companies and financial indicators
This table reports information on the 165 listed companies, including their average registered capital and average number of employees. It also describes the twelve financial indicators and reports their summary statistics.
Indicator    Description

X1
The ratio of net cash flow from operating activities minus preferred stock dividends to the number of common shares outstanding

X2
The value of the company's net assets represented by each share of stock, a criterion for judging the company's quality

X3
The ratio of the current year's increase in operating profit to last year's total operating profit, reflecting the increase or decrease in an enterprise's operating profit

X4
The ratio of total assets to total shareholders' equity, reflecting an enterprise's financial leverage

X5
The working capital ratio, reflecting the enterprise's ability to convert assets into cash

X6
A measure of an enterprise's current assets that can be immediately used to repay current liabilities

X7
The ratio of total liabilities to total owners' equity, reflecting the extent to which a business relies on borrowing to operate

X8
The growth rate of total assets reflecting the liquidity of assets

X9
The ratio of the growth of the total assets of an enterprise in the current year to the total assets at the beginning of the year reflects the overall situation of the enterprise's assets in the current period

X10
The growth rate of equity capital

X11
The ratio of after-tax profit to total equity, reflecting the company's profitability

X12
Earnings per share recalculated from basic earnings per share by assuming that potential ordinary shares are converted into ordinary shares, which increases the total number of common shares

Factor analysis
The data were first analyzed using SPSS software to measure the companies' financial performance at different levels. Figure 2 shows the correlation heat maps of the financial indicators for 2018 to 2021, and the correlation coefficient tables for 2018, 2019, 2020, and 2021 are reported in Annexed Tables 23, 24, 25, and 26 in the Appendix.

Common factor formation
Because factor analysis requires the variables to be sufficiently correlated, we adopted Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) test to verify the suitability of the factor analysis. Bartlett's test compares the correlation matrix of the data with the identity matrix; a significant result (p-value below 0.05) indicates that the variables are not uncorrelated and hence that the data are suitable for factor analysis. The KMO test compares the simple and partial correlation coefficients; the closer the KMO value is to 1, the stronger the common correlation among the variables and the more suitable they are for factor analysis. The results obtained using SPSS are presented in Table 3. In each year, there is a high positive correlation between X4 (equity multiplier) and X7 (equity ratio) and between X5 (current ratio) and X6 (quick ratio), and there is an obvious negative correlation between X7 (equity ratio) and X5 (current ratio) and between X7 (equity ratio) and X6 (quick ratio). All KMO values were larger than 0.6, indicating substantial common variance among the variables; therefore, the selected variables were suitable for factor analysis. The p-values of Bartlett's test were all 0.00, below the threshold of 0.05, signifying strong interrelations among the financial indicators and confirming their suitability for factor analysis.
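Both applicability checks can be computed directly from the correlation matrix. The sketch below uses the standard textbook formulas: Bartlett's statistic $-(n - 1 - (2p + 5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom, and the overall KMO measure, the ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations. It is an illustration, not the SPSS routine the authors ran, and the function names are hypothetical. The p-value is the upper tail of the χ² distribution, for example `scipy.stats.chi2.sf(stat, df)`.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's chi-square statistic and degrees of freedom for rows = observations."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)           # ln|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)          # partial correlations a_ij
    off = ~np.eye(R.shape[0], dtype=bool)      # mask selecting i != j
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)
```

Strongly intercorrelated indicators drive |R| toward 0, so the Bartlett statistic grows, while partial correlations shrink relative to simple ones, pushing KMO toward 1.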
After extracting common factors from the 12 financial indicators, we obtained the total variance explained, the scree plot, the communalities, and the component score coefficient matrix, which reveal the relationships among the financial indicators and the primary information contained in the common factors.
Because the loadings of the common factors $F_1$, $F_2$, $F_3$, and $F_4$ on some of the initial variables did not differ markedly, the relationship between the variables and the common factors could not be clearly interpreted; therefore, the component matrix had to be rotated.
As Table 4 shows, four common factors were extracted from the 12 selected indicators. The cumulative variance contributions of $F_1$, $F_2$, $F_3$, and $F_4$ were 73.172%, 74.857%, 72.463%, and 77.789% for 2018, 2019, 2020, and 2021, respectively. The scree plot in Fig. 2 also shows that the elbow occurs at the fourth eigenvalue; therefore, the first four factors were retained. Because we aimed to examine the attribution of each variable, the original component matrix was rotated to ease naming. The rotation gives each variable a large loading on one common factor and small loadings on the remaining factors; the results are shown in Table 5.
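The paper does not name its rotation method; varimax is a common orthogonal choice, so the sketch below assumes it. It implements the usual SVD-based varimax iteration without Kaiser row normalization, so its output may differ slightly from statistics packages that normalize rows first. The function name `varimax` is hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x m loading matrix (no Kaiser normalization).

    Returns the rotated loadings and the m x m orthogonal rotation matrix.
    """
    p, m = loadings.shape
    rotation = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        # gradient of the varimax criterion; its SVD yields the next orthogonal rotation
        target = L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        # stop when the criterion no longer improves by a relative amount > tol
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation
```

Because the rotation is orthogonal, each indicator's communality (row sum of squared loadings) and the total variance explained are unchanged; only the distribution of loadings across factors changes, which is what makes the factors easier to name.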
Table 5 presents the rotated factor-loading matrix. For the first common factor, diluted earnings per share, basic earnings per share, and net assets per share have large loadings, indicating that these three indicators are strongly correlated with one another and reflect the enterprise's current profits and losses and its future earnings expectations. Common factor $F_1$ is therefore the operating income factor. For the second common factor, net cash flow per share, operating profit growth, total asset growth rate, and growth rate of shareholders' equity have large loadings, indicating that these four indicators are highly correlated and reflect the company's development capability in 2021. Common factor $F_2$ is the development potential factor. For the third common factor, the equity multiplier and equity ratio have high loadings, indicating that these two indicators are strongly correlated and reflect a company's assets and equity. Common factor $F_3$ is the asset structure factor. Finally, the fourth common factor has large loadings on the current ratio, quick ratio, and proportion of current assets to total assets, indicating a strong correlation among indicators that reflect a firm's short-term liquidity; $F_4$ is therefore labeled the "Solvency Factor."

Composite score of the company
A matrix of component score coefficients was obtained using SPSS (Table 6).
The standardized values of the initial indicators were substituted into the factor score functions to calculate the factor scores of each sample, enabling a comprehensive evaluation of the observed indicators. A composite evaluation model was then established by weighting each factor score by the variance contribution of the corresponding common factor among the four extracted.

Table 5 Relationship between indicators and common factor
The relationships between indicators and common factors from 2018 to 2021 are reported in this table.
The composite score of each firm was calculated, and the scores for each year were ranked in descending order; only the top 20 stocks with positive composite scores are shown, in Table 7.
The literature reviewed above, including He et al. (2020) and Gupta et al. (2022), indicates that firms across industries broadly faced adverse consequences from the COVID-19 pandemic, leading to economic downturns. As Tables 7 and 8 show, the composite scores of the leading enterprises generally trended upward: the top four enterprises' composite scores were 2.04, 1.84, 1.8, and 1.44, respectively, in 2018 before the outbreak, and 2.45, 1.85, 1.73, and 1.64 in 2021 after the outbreak. The positive impact of the pandemic was partly offset by the adverse effects of the economic environment, resulting in only a slight increase in the overall scores of the leading companies. However, because small and medium-sized enterprises are less able to withstand economic downturns, their overall scores still declined slightly, even though the pandemic boosted their growth to a certain extent. This result is consistent with the findings of Thukral (2021).
In 2021, the composite scores of the top seven companies rose by approximately 10% compared with 2020, reflecting sustained governmental support for biotechnological innovation and bio-industry development, coupled with increased consumer demand for medical supplies such as alcohol and masks. This support has propelled leading biopharmaceutical stocks upward during the pandemic.
Compared with 2018 and 2019, the category composition and score rankings of stocks in 2020 and 2021 were substantially reshuffled. The development of the different types of pharmaceutical stocks therefore warrants further discussion. Table 9 presents the scores of the top 20 stocks of 2021 over the last four years (2018 to 2021).
The results in this table will be used to calculate the stock score in Tables 7 and 8.

Table 9 Stock score comparison
The top 20 scoring stocks and their categories in 2021 are reported in this table.
The top 20 stocks by composite score in 2021 are distributed more evenly across categories, covering all six categories of biopharmaceutical companies. Of these, 80% of the companies in four categories (medical devices, medical services, biological products, and chemical pharmaceuticals) showed an upward trend in their composite scores after the outbreak. As the response to the pandemic required support from medical personnel and equipment, the medical market received substantial capital inflows. Additionally, advances in vaccine research and development prompted a re-evaluation of the biopharmaceutical industry's considerable growth potential. Several stocks moved upward more than once. For companies in the other two categories, Chinese medicine and pharmaceutical commerce, the composite score declined slightly or remained stable after the outbreak. The widespread use of disinfection products after the outbreak reduced the incidence of other infectious diseases such as influenza, which in turn weighed on pharmaceutical commerce and Chinese medicine.

Cluster analysis
This study used K-means clustering to group the sampled stocks, refine the similarities among them, and analyze the characteristics and commonalities of the 165 listed stocks. The scree plot (Fig. 3) shows that the curve remains flat at K = 7 but increases sharply at K = 8; therefore, K is set to 7, that is, the sample is divided into seven categories. Clustering results for six and seven clusters are reported in Annexed Tables 27 and 28 in the Appendix. Based on the scores derived from the factor analysis, the stocks in each of the seven categories in 2021 are ranked in descending order of composite score. The ten top-scoring stocks in each category were screened and analyzed in terms of their four common factor scores, as shown in Tables 10, 11, 12, 13, 14, 15, and 16. Baker and Haslem (1974) divided investors into two types according to a decision-orientation criterion, which refers to investors' confidence in their own decision-making abilities. Following this criterion, we divide investors into confident and cautious groups.
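A curve for choosing K can be produced by recording the within-cluster sum of squares (WCSS) for each candidate K. The sketch below is a minimal NumPy illustration under assumptions: rows of `X` are stocks and columns their factor scores, and the plotted quantity is WCSS, since the paper does not state exactly what its Fig. 3 curve measures. Each K keeps the best of several random starts because a single K-means run can end in a poor local optimum. The function names are hypothetical.

```python
import numpy as np

def wcss_for_k(X, k, n_init=10, max_iter=300, seed=0):
    """Lowest within-cluster sum of squares found over n_init random K-means starts."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # random initial centers drawn from the data points
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(max_iter):
            labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        best = min(best, float(((X - centers[labels]) ** 2).sum()))
    return best

def elbow_curve(X, k_max=10, **kwargs):
    """WCSS for K = 1..k_max; plot it against K and look for the bend."""
    return {k: wcss_for_k(X, k, **kwargs) for k in range(1, k_max + 1)}
```

WCSS always falls as K grows, so the choice rests on where additional clusters stop producing large reductions rather than on the minimum of the curve.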
In the first category, the stocks score exceptionally well on the operating income factor, while their other factor scores are at average levels. Operating income represents a business's existing earnings, suggesting that this category suits investors aiming for short-term profits. However, owing to its moderate growth potential, long-term investment is less advisable. This category is recommended for short-term holding, particularly for investors with lower risk tolerance.
In the second category, the stocks stand out on the solvency factor, the scores of the other three factors are relatively stable, and the stocks are concentrated in biopharmaceuticals. The pandemic led to rapid and steady growth in new-product R&D expenditure, and the production and sales of new products by large- and medium-sized enterprises in China's biopharmaceutical industry are growing faster. Therefore, this category is an ideal option for investors seeking medium- to long-term holdings, particularly those with lower risk tolerance.
In the third category, the stocks have stable scores on the operating income, solvency, and development potential factors, with approximately 60% of these scores positive. However, their asset structure factor scores are all negative. This finding indicates that the pandemic did not significantly affect the development of these A-share listed companies. They share relatively unstable funding sources, making high short-term returns difficult to achieve. This category suits cautious investors with a medium- or long-term horizon.
In the fourth category, the stocks have positive scores on the operating income and development potential factors and negative scores on the asset structure and solvency factors. Operating income represents a business's existing earnings, suggesting that this category suits confident investors seeking short-term profitability. Given the strong development potential of these stocks, investors may also consider holding them over a longer horizon.
In the fifth category, the stocks score particularly well on the development potential factor, whereas their other factor scores are relatively flat. Because the coronavirus pandemic has had a lasting impact on China's biopharmaceutical industry, the potential of this category is stable in the long term. This category suits cautious investors seeking long-term holdings.
In the sixth category, the stocks score moderately on all four factors (operating income, solvency, asset structure, and development potential) without standing out on any. The stocks in this category are relatively evenly distributed, and their risk profile is medium. Although share prices vary somewhat, these stocks appear suitable for cautious investors.
In the seventh category, the stocks stand out with positive scores on the asset structure factor, whereas their other three factor scores are relatively flat. The asset structure factor reflects the relative proportions of funding provided by creditors and by investors, and hence the stability of the firm's underlying financial structure. Stocks in this category therefore suit confident investors with a short-term horizon.

Sector investment analysis
We have offered stock-selection guidance for various investor profiles. However, translating these empirical findings into tangible economic benefits requires specific portfolio recommendations. We therefore devise optimized strategies for both long- and short-term investments, integrating the insights from Section "3.3" with return-risk analyses. We address the optimization problem using Markowitz's model, with the Sharpe ratio (Sharpe 1998) as a performance metric. Markowitz's model is designed to help rational investors maximize returns for a given level of risk or minimize risk for a given level of return.

Long and short term portfolio
Suppose that portfolio P consists of N risky assets in the market. Let $r_p$ be its expected rate of return and $\delta_p$ its risk. The expected return is

$$r_p = \sum_{i=1}^{N} w_i r_i,$$

where $w_i \in [0, 1]$ is the weight of asset i and $r_i$ is its expected return, and the variance is

$$\delta_p^2(w) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \delta_{ij} = w^{\top} \Delta w,$$

where $w_i$ and $w_j$ are the weights assigned to stocks i and j, respectively, $\Delta$ is the covariance matrix of the stocks, and $\delta_{ij}$ is the covariance between stocks i and j. We then define the Sharpe ratio

$$S_p = \frac{r_p - r_f}{\delta_p},$$

where $r_f$ is the risk-free interest rate. We use the benchmark one-year and five-year deposit rates as proxies. The optimization problem can be described as follows:

$$\max_{w} \; S_p \quad \text{s.t.} \quad \sum_{i=1}^{N} w_i = 1, \quad 0 \le w_i \le 1, \quad r_p \ge \alpha,$$

where $\alpha$ is the minimum required expected return. We also consider a mean-variance investor and define the quadratic utility function for investing in this portfolio (Wang et al. 2016) as

$$U = E(r_p) - \frac{\gamma}{2} \operatorname{var}(r_p),$$

where U is the mean-variance utility, $E(r_p)$ is the portfolio's mean return, and $\operatorname{var}(r_p)$ is the portfolio's variance, a proxy for portfolio risk. $\gamma$ is the risk aversion coefficient; we set $\gamma$ to 3 for the short-term investment and 6 for the long-term investment.
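The quantities in this subsection can be evaluated directly once mean returns and a covariance matrix have been estimated from price data. The following Python sketch is illustrative only: the function names are hypothetical, the inputs are made-up two-stock values rather than data from this study, and the utility assumes the quadratic form U = E(r_p) - (γ/2) var(r_p).

```python
import numpy as np


def portfolio_stats(w, mean_returns, cov):
    """Expected return r_p = sum_i w_i r_i and risk delta_p = sqrt(w^T Delta w)."""
    w = np.asarray(w, dtype=float)
    r_p = float(w @ np.asarray(mean_returns, dtype=float))
    var_p = float(w @ np.asarray(cov, dtype=float) @ w)
    return r_p, np.sqrt(var_p)


def sharpe_ratio(w, mean_returns, cov, r_f):
    """Excess return per unit of total risk: (r_p - r_f) / delta_p."""
    r_p, delta_p = portfolio_stats(w, mean_returns, cov)
    return (r_p - r_f) / delta_p


def mv_utility(w, mean_returns, cov, gamma):
    """Assumed quadratic mean-variance utility: U = E(r_p) - (gamma / 2) * var(r_p)."""
    r_p, delta_p = portfolio_stats(w, mean_returns, cov)
    return r_p - 0.5 * gamma * delta_p ** 2


# Hypothetical two-stock inputs (not data from this study)
mu = [0.08, 0.05]
sigma = [[0.04, 0.01],
         [0.01, 0.02]]
w = [0.5, 0.5]
print(portfolio_stats(w, mu, sigma))
print(sharpe_ratio(w, mu, sigma, r_f=0.0275))
print(mv_utility(w, mu, sigma, gamma=3))
```

Returns, the risk-free rate, and variances must be expressed in the same units and over the same horizon (for example, all annualized) for the Sharpe ratio and utility to be comparable across portfolios.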
For long-term investment, we selected stocks from Categories 3 and 5, which include five stocks. Owing to extensive missing data, stock 832735.BJ (listed on the Beijing Stock Exchange) was excluded from the portfolio analysis. We ran the optimization on the remaining four stocks: 300573.SZ, 000538.SZ, 300760.SZ, and 603127.SH. We set $r_f$ to the benchmark five-year deposit rate for 2021 (2.75%). The optimization simulations used stock prices from the start to the end of 2021, and the optimization was performed in R according to the above formulas. Figure 4a shows the Sharpe ratio curve of this portfolio against the expected return, subject to a minimum expected return α. For demonstration, we display the ten results nearest the maximum point. As shown in Table 17, the largest Sharpe ratio is 0.03452; in this portfolio, the weights of 000538.SZ and 300760.SZ are 0, and the weights of 300573.SZ and 603127.SH are 24.8% and 75.2%, respectively, with an expected return of 7.179% and a standard deviation of 1.283. Table 17 also reports the mean-variance utility of each portfolio. The optimal utility of 4.7134 is achieved when the weights of 300573.SZ and 603127.SH are 26.99% and 73.01%, respectively, with an expected return of 7.148% and a standard deviation of 1.274.

For short-term investment, we selected the best stocks from Categories 1, 4, and 7, 12 stocks in total. In this case, we set α to 1.75%, the one-year risk-free interest rate in 2021. Similarly, Fig. 4b shows the Sharpe ratio curve for the 12 stocks against the expected return, and, as in Table 17, the optimization results for the ten nearest points are listed in Table 18. This table omits 688399.SH, 601607.SH, 000661.SZ, 603368.SH, 600829.SH, 002524.SZ, 000028.SZ, 000411.SZ, and 002462.SZ because their weights are approximately 0.00% when a minimum expected return of 0.0175 is required. The remaining three stocks are 002821.SZ, 600713.SH, and 600129.SH. From Table 18, the largest Sharpe ratio is 0.05123; for this portfolio, the weights of 600713.SH and 601607.SH are both 0, and the weights of 002821.SZ and 600129.SH are 23.88% and 76.12%, respectively, with an expected return of 8.133% and a standard deviation of 1.051. Table 18 also reports the mean-variance utility of each portfolio. The optimal utility of 4.8425 is achieved when the weights of 002821.SZ and 600129.SH are 28.57% and 71.43%, respectively, with an expected return of 8.019% and a standard deviation of 1.029.
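The search behind the Sharpe ratio curves can be illustrated with a brute-force grid over long-only weights that sum to one, keeping only portfolios whose expected return is at least α. This is a hypothetical Python sketch, not the R routine used in this study (whose implementation is not described here); the function names and the grid step of 0.01 are assumptions.

```python
import numpy as np


def simplex_grid(n_assets, n_steps=100):
    """Yield long-only weight vectors (0 <= w_i <= 1, sum w_i = 1) on a 1/n_steps grid."""
    def compositions(remaining, k):
        # All ways to split `remaining` grid units among k assets
        if k == 1:
            yield (remaining,)
            return
        for i in range(remaining + 1):
            for tail in compositions(remaining - i, k - 1):
                yield (i,) + tail

    for counts in compositions(n_steps, n_assets):
        yield np.array(counts, dtype=float) / n_steps


def max_sharpe_grid(mean_returns, cov, r_f, alpha, n_steps=100):
    """Grid search for the largest Sharpe ratio subject to r_p >= alpha.

    Returns (sharpe, weights, expected_return, std_dev), or None if no grid
    point meets the minimum expected return.
    """
    mu = np.asarray(mean_returns, dtype=float)
    cov = np.asarray(cov, dtype=float)
    best = None
    for w in simplex_grid(len(mu), n_steps):
        r_p = float(w @ mu)
        if r_p < alpha:            # minimum expected-return constraint
            continue
        sd = float(np.sqrt(w @ cov @ w))
        if sd == 0.0:              # Sharpe ratio undefined for a riskless mix
            continue
        sr = (r_p - r_f) / sd
        if best is None or sr > best[0]:
            best = (sr, w, r_p, sd)
    return best


# Hypothetical three-stock example (annualized inputs, not this study's data)
mu = [0.07, 0.05, 0.09]
cov = [[0.040, 0.006, 0.010],
       [0.006, 0.020, 0.004],
       [0.010, 0.004, 0.060]]
print(max_sharpe_grid(mu, cov, r_f=0.0275, alpha=0.0175))
```

Because the number of grid points grows combinatorially with the number of stocks, this approach is practical only for a handful of assets, such as the four long-term candidates. For the 12-stock short-term universe, a constrained numerical optimizer (for example, scipy.optimize.minimize with method="SLSQP") would be a more realistic choice. Replacing the Sharpe ratio computation with the mean-variance utility yields the utility-maximizing weights instead.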
To evaluate the portfolios from another risk-return perspective, we calculated the Treynor Ratio (TR) and Jensen Alpha (JA) for the portfolios whose returns exceed the risk-free rate; the optimal investment strategies under these two metrics are shown in Tables 19 and 20. The Treynor Ratio measures the risk premium earned per unit of systematic risk, whereas Jensen Alpha measures the deviation of a portfolio's average return from its expected return under the Capital Asset Pricing Model (CAPM). The formulae for these performance metrics are as follows:

$$TR = \frac{r_p - r_f}{\beta_p}, \qquad JA = r_p - \left[ r_f + \beta_p \left( r_m - r_f \right) \right],$$

where $\beta_p$ is the beta of the holding and $r_m$ is the average expected return of the market.
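Both metrics require a portfolio beta, which is usually estimated as the covariance of portfolio and market returns divided by the variance of market returns. The Python sketch below illustrates the two formulas under that assumption; the market series, the function names, and all numbers are hypothetical.

```python
import numpy as np


def beta(portfolio_returns, market_returns):
    """CAPM beta estimated as cov(r_p, r_m) / var(r_m) from sample moments."""
    c = np.cov(np.asarray(portfolio_returns, dtype=float),
               np.asarray(market_returns, dtype=float), ddof=1)
    return c[0, 1] / c[1, 1]


def treynor_ratio(r_p, r_f, beta_p):
    """Risk premium earned per unit of systematic risk."""
    return (r_p - r_f) / beta_p


def jensen_alpha(r_p, r_f, beta_p, r_m):
    """Realized return minus the CAPM-implied return r_f + beta_p * (r_m - r_f)."""
    return r_p - (r_f + beta_p * (r_m - r_f))


# Hypothetical series: the portfolio is built to have a beta of exactly 2
market = np.array([0.010, -0.020, 0.030, 0.000, 0.015])
portfolio = 2.0 * market + 0.001
b = beta(portfolio, market)
print(b, treynor_ratio(0.08, 0.0275, b), jensen_alpha(0.08, 0.0275, b, r_m=0.06))
```

A portfolio with positive Jensen Alpha earned more than CAPM predicts for its beta, whereas the Treynor Ratio ranks portfolios by excess return per unit of beta and ignores diversifiable risk.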
According to Table 19, the short-term optimal portfolios determined by the Treynor Ratio and Jensen Alpha differ considerably from those based on the Sharpe ratio and mean-variance utility in Table 18. In Table 20, the long-term optimal portfolios determined by the Treynor Ratio and Jensen Alpha are consistent with the results in Table 17. In summary, the optimal long-term portfolio does not vary significantly across metrics, whereas the optimal short-term portfolio can differ substantially depending on the evaluation method.

Robustness check
In addition to constructing long- and short-term portfolios, we conducted a robustness check by comparing our results in Section "Long and short term portfolio" with traditional equally weighted portfolios (EWP). Specifically, for the short-term portfolio, we divided 2021 into four quarters and made the comparison for each quarter. Tables 21 and 22 report the robustness check results for the long- and short-term portfolios, respectively. The tables show that our portfolios from Section "Long and short term portfolio" outperform the EWP in portfolio return and utility, although the EWP achieved a higher Sharpe ratio.
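A quarter-by-quarter comparison of this kind can be sketched by evaluating a fixed weight vector and the 1/N equally weighted vector on each quarter's returns. The Python sketch below assumes a matrix of per-period returns (rows are dates, columns are stocks) and a matching array of quarter labels; the data layout, function names, and example values are hypothetical, and the risk-free rate must be expressed per period.

```python
import numpy as np


def evaluate(weights, returns, r_f, gamma):
    """Mean return, SD, Sharpe ratio and quadratic utility of a fixed-weight portfolio."""
    port = np.asarray(returns, dtype=float) @ np.asarray(weights, dtype=float)
    mean, sd = port.mean(), port.std(ddof=1)
    return {"ER": mean, "SD": sd,
            "Sharpe": (mean - r_f) / sd,
            "U": mean - 0.5 * gamma * sd ** 2}


def compare_with_ewp(weights, returns, quarters, r_f, gamma):
    """Evaluate the given weights and the 1/N portfolio separately for each quarter."""
    returns = np.asarray(returns, dtype=float)
    quarters = np.asarray(quarters)
    n_stocks = returns.shape[1]
    ewp = np.full(n_stocks, 1.0 / n_stocks)   # equally weighted benchmark
    results = {}
    for q in np.unique(quarters):
        block = returns[quarters == q]        # rows belonging to quarter q
        results[str(q)] = {"optimized": evaluate(weights, block, r_f, gamma),
                           "EWP": evaluate(ewp, block, r_f, gamma)}
    return results


# Hypothetical daily returns for two stocks over two quarters
rets = [[0.01, 0.03], [0.02, 0.01], [0.00, 0.02], [0.03, 0.00]]
qtrs = ["Q1", "Q1", "Q2", "Q2"]
report = compare_with_ewp([1.0, 0.0], rets, qtrs, r_f=0.0001, gamma=6)
for q, res in report.items():
    print(q, res["optimized"]["ER"], res["EWP"]["ER"])
```

Reporting all four metrics side by side makes trade-offs such as the one described above visible: one portfolio can lead on Sharpe ratio while another leads on return and utility.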

Conclusion
This study employed factor and cluster analyses to examine 165 listed companies from 2018 to 2021. Based on the results, we assessed the overall performance of various biopharmaceutical sector stocks, estimated the impact of COVID-19 on different stock types, and offered recommendations to investors. The main results are as follows. First, biopharmaceutical industry stocks maintained upward momentum during the pandemic. Specifically, medical devices, medical services, biological products, and chemical pharmaceuticals increased, whereas Chinese medicine and pharmaceutical commerce declined. Unlike previous studies, we explored the dynamics of stocks at the second-level classification during the COVID-19 pandemic. Second, the clustering results lead to the following conclusions. Category 1 stocks are suitable for short-term holding by unconfident investors. Category 2 and 3 stocks are medium- to long-term holdings for cautious investors. Category 4 is ideal for confident investors trading forward contracts. Category 5 is suitable for long-term holding by cautious investors. Category 6 is suitable for cautious investors. Category 7 is ideal for short-term holding by confident investors. These recommendations consider both the degree of risk aversion and the holding period of stock investors. Finally, we developed optimal investment strategies using the Sharpe ratio and mean-variance utility, and provided alternative performance metrics, the Treynor Ratio and Jensen Alpha, for comparison. For long-term investment, we recommend allocating 24.8% of wealth to 300573.SZ and 75.2% to 603127.SH to achieve the highest Sharpe ratio, and 26.99% to 300573.SZ and 73.01% to 603127.SH for the best mean-variance utility. For short-term investment, we recommend allocating 23.88% to 002821.SZ and 76.12% to 600129.SH for the highest Sharpe ratio, and 28.57% to 002821.SZ and 71.43% to 600129.SH for the best mean-variance utility. These suggestions are tailored to investors with different preferences among portfolio evaluation metrics.
This study has some limitations. First, it focused only on the stock market during the pandemic; other financial assets, such as bonds and futures, were not included. Second, the analysis was restricted to data from China; given the global impact of COVID-19, examining financial markets in other countries may also be relevant.
Future research could broaden the scope to include additional financial assets as the pandemic progresses in China. In addition, given the significant effects of the pandemic on countries such as the US and India, a comparative study of their biopharmaceutical sectors relative to China's may yield valuable insights. Furthermore, advances in research methodology may allow the application of sophisticated machine learning and deep learning techniques, such as deep belief networks (DBN), long short-term memory (LSTM) networks, and support vector machines, to explore nonlinear relationships in the Chinese stock market. While some studies, such as Leippold et al. (2022), have applied nonlinear machine learning methods to the Chinese stock market, few have specifically addressed nonlinear dynamics within sectors such as the biopharmaceutical industry.

Table 23 Correlation coefficient matrix of financial indicators in 2018
The table shows the correlation coefficient matrix for the 12 financial indicators in 2018. Correlation coefficients above 0.7 are highlighted.

Fig. 1 Flowchart of research methodology. Note: This figure shows the methodological steps of our study. Specifically, we perform factor analysis and K-means clustering on the financial indicators, and then combine the results to select stocks and construct long-term and short-term portfolios for investors. Details are given in Sections "Factor Analysis" and "Cluster analysis(K-means)".

Fig. 2 Correlation heat map plots of financial indicators. Note: The correlation heat maps of the 12 financial indicators for 2018 to 2021 are plotted. A red box indicates a strong positive correlation, and a purple box indicates a strong negative correlation. In each year, there are high positive correlations between X4 (equity multiplier) and X7 (equity ratio) and between X5 (current ratio) and X6 (quick ratio), and clear negative correlations between X7 (equity ratio) and X5 (current ratio) and between X7 (equity ratio) and X6 (quick ratio).
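The clustering step in the Fig. 1 workflow groups stocks with K-means. As an illustration only, the Python sketch below implements the standard Lloyd's algorithm with k-means++ seeding on a matrix with one row per stock and one column per variable (for example, factor scores); the function name, the seeding scheme, and the iteration limit are assumptions, not the settings used in this study.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # k-means++ seeding: later centers are drawn with probability
    # proportional to the squared distance from the nearest chosen center
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        # Update step: move each center to the mean of its members
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

With seven clusters, as in this study, the call would be kmeans(scores, k=7) for a hypothetical scores matrix. Because K-means depends on its starting centers, running it with several seeds and keeping the solution with the smallest within-cluster sum of squares is a common safeguard.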

Fig. 3 Scree plot. Note: This figure shows the scree plot of the common factors. The fourth point is the inflection point, and the exact variance contribution rates are reported in Table 4.

Fig. 4 Sharpe ratio curves for (a) long-term and (b) short-term investment

Table 2
Summary statistics of financial indicators. This table reports the summary statistics of the 12 financial indicators.

Table 3
KMO and Bartlett tests. The KMO and Bartlett test results are used to check whether factor analysis is appropriate for the dataset. With a KMO measure greater than 0.6 and a Bartlett test p-value less than 0.05, factor analysis can be adopted.
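Both statistics can be computed from the correlation matrix R of the indicators. The Python sketch below uses the usual definitions: the overall KMO compares squared correlations with squared partial correlations (obtained from the inverse correlation matrix), and Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The function names and the synthetic example data are hypothetical; the p-value can then be obtained from the chi-square survival function, for example scipy.stats.chi2.sf(stat, df).

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (0 to 1)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations: -P_ij / sqrt(P_ii * P_jj), with P the inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + a2))


def bartlett_sphericity(data):
    """Bartlett's test of sphericity: returns (chi-square statistic, degrees of freedom)."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)          # ln|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return float(stat), df


# Synthetic example: four indicators driven by one shared latent factor
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
X = common + 0.5 * rng.normal(size=(500, 4))
print(kmo(X), bartlett_sphericity(X))
```

Strongly inter-correlated indicators push the KMO measure toward 1 and make the Bartlett statistic large, which is the situation in which extracting common factors is worthwhile.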

Table 4
Total variance explained. The variance contribution rates and cumulative contribution rates of the four common factors from 2018 to 2021 are reported in this table. The final cumulative contribution rates for the four years are 73.172%, 74.857%, 72.463%, and 77.789%, respectively.
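Assuming the common factors are extracted by principal components from the correlation matrix of the indicators (the usual setting for a "total variance explained" table), each factor's variance contribution rate is its eigenvalue divided by the number of indicators. The Python sketch below computes these rates; the function name and the synthetic example data are hypothetical.

```python
import numpy as np


def variance_contribution(data, n_factors):
    """Per-component and cumulative variance contribution rates (%) of the leading components."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]     # eigvalsh is ascending; reverse to descending
    rates = 100.0 * eigvals / eigvals.sum()      # the trace equals the number of indicators
    cumulative = np.cumsum(rates)
    return rates[:n_factors], cumulative[:n_factors]


# Synthetic example: four indicators sharing one latent factor
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1)) + 0.5 * rng.normal(size=(500, 4))
rates, cumulative = variance_contribution(X, n_factors=2)
print(rates, cumulative)
```

An orthogonal rotation such as varimax redistributes variance among the retained factors but leaves their cumulative contribution unchanged.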

Table 5
The most related indicators for each common factor are listed. A larger absolute value in the table indicates a closer relationship between an indicator and a factor.

Table 6
Component score coefficient matrix

Table
Top 20 scoring stocks and scores for 2018 and 2019. The top-scoring stocks and their scores in 2018 and 2019 are reported, with ranks and stock codes. 000028.SZ and 000661.SZ are the best stocks in 2018 and 2019, with scores of 2.04 and 2.74, respectively. The top three stocks in both 2018 and 2019 are all listed on the Shenzhen Stock Exchange.

Table
Top 20 scoring stocks and scores for 2020 and 2021. The top-scoring stocks and their scores in 2020 and 2021 are reported, with ranks and stock codes. 688399.SH is the best stock in both 2020 and 2021, with scores of 2.26 and 2.45, respectively. In addition, 300347.SZ also scores well, at 2.23, in 2020.

Table 8
. Moreover, their scores in 2018, 2019, and 2020 are also provided for comparison. The score of the top stock, 688399.SH, increased year by year, starting from 0.44 in 2018. Among the top 20 stocks, 8 come from pharmaceutical commerce and 5 from medical services.

Table 10
Ranking and factor scores for Category 1 stocks

Table 11
Ranking and factor scores for Category 2 stocks

Table 12
Ranking and factor scores for Category 3 stocks

Table 13
Ranking and factor scores for Category 4 stocks

Table 14
Ranking and factor scores for Category 5 stocks

Table 15
Ranking and factor scores for Category 6 stocks

Table 16
Ranking and factor scores for Category 7 stocks

Table 17
Long-term investment of stocks. This table reports the ten points nearest the maximum Sharpe ratio and mean-variance utility for the long-term portfolio strategy. In the header row, ER and SD denote the expected return and standard deviation of an investment, respectively, and the values in each stock column represent the weight of that stock in the investment. The largest Sharpe ratio and utility value are highlighted in bold.

Table 18
Short-term investment of stocks. This table reports the ten points nearest the maximum Sharpe ratio and mean-variance utility for the short-term investment strategy. In the header row, ER and SD denote the expected return and standard deviation of an investment, respectively, and the values in each stock column represent the weight of that stock in the investment. The largest Sharpe ratio (0.05123) is highlighted in bold.

Table 19
Short-term investment with optimal TR and JA. This table reports the two short-term portfolios with the optimal Treynor Ratio and Jensen Alpha, respectively.

Table 20
Long-term investment with optimal TR and JA. This table reports the two long-term portfolios with the optimal Treynor Ratio and Jensen Alpha, respectively.

Table 21
Robustness check results for the long-term portfolio

Table 22
Robustness check results for the short-term portfolio

Table
Correlation coefficient matrix of financial indicators in 2019. The table shows the correlation coefficient matrix for the 12 financial indicators in 2019. Correlation coefficients above 0.7 are highlighted.

Table
Correlation coefficient matrix of financial indicators in 2020. The table shows the correlation coefficient matrix for the 12 financial indicators in 2020. Correlation coefficients above 0.7 are highlighted.

Table
Correlation coefficient matrix of financial indicators in 2021. The table shows the correlation coefficient matrix for the 12 financial indicators in 2021. Correlation coefficients above 0.7 are highlighted.