
A factor score clustering approach to analyze the biopharmaceutical sector in the Chinese market during COVID-19

Abstract

The biopharmaceutical sector is of considerable interest during the COVID-19 pandemic. This study aims to investigate the biopharmaceutical sector using the Shenwan Industry Classification and provides insights into investment strategies. We combine factor and cluster analyses to reduce data dimensions and detect their latent similarities. Specifically, the biopharmaceutical sector is divided into six categories based on second-level industry classification. It is observed that medical devices, medical services, biological products, and chemical pharmaceuticals maintained their upward tendency, while Chinese medicine and pharmaceutical commerce declined slightly. We also develop optimal investment strategies using various metrics for different investor types.

Introduction

Since its initial announcement in early 2020, the coronavirus disease 2019 (COVID-19) outbreak has precipitated recessions across major global economies such as China, the European Union, the United Kingdom, and the United States. This period also saw some countries experience significant political turmoil, contributing to considerable instability in financial markets. Notably, the global stock market has exhibited several phenomena indicative of these disruptions. AlAli (2020) identified the negative impact of COVID-19 on abnormal market returns, highlighting marked differences in returns before and after the World Health Organization's declaration in major Asian stock markets, as determined through event study methodologies. Additionally, using a novel time series approach based on a Bayesian structure, Takyi and Bentum-Ennin (2021) estimated that the stock performance of 13 countries (Ghana, Nigeria, South Africa, Kenya, Tanzania, Tunisia, Mauritius, Morocco, Zambia, Namibia, Botswana, Côte d’Ivoire, and Uganda) was rarely positively affected by COVID-19. Specifically, since January 2020, the Chinese stock market has experienced significant downturns, with the Shanghai stock index dropping by 8.7%, reaching a low of 2646 in March.

Esparcia and López (2022) reported that COVID-19 provided the biopharmaceutical industry with unprecedented revenue sources. The sector has been extensively analyzed in the literature (Robke et al. 2020; Ayati et al. 2020; Esparcia and López 2022; Ho et al. 2022). According to the Shenwan Industry Classification, China's biopharmaceutical sector includes medical services, medical devices, biological products, pharmaceutical commerce, Chinese medicine, and chemical pharmaceuticals. This raises critical questions regarding the uniformity of growth across these categories and their investment potential. This article focuses on two main inquiries. First, how have the six categories within China’s biopharmaceutical sector changed during the pandemic? We explore this using factor scores before and after the onset of the pandemic. Second, which investment strategies are optimal for various investor profiles within this sector? We apply factor and cluster analyses to guide stock selection and offer investment portfolio recommendations based on standard investment metrics.

As financial markets have rapidly evolved in recent decades, multivariate analysis has become integral to managing large, high-dimensional datasets in financial research (Wagenvoort et al. 2011; Vats and Samdani 2019; Seong and Nam 2021). This study combines factor analysis and clustering techniques for stock market analysis. As noted by Gorman and Primavera (1983), the primary purpose of factor analysis is to reduce the number of variables by grouping them into factors based on their correlations. Unlike factor analysis, cluster analysis can homogeneously group variables based on one or more multivariate similarity criteria. The goal is to segment data into clusters that reflect similarities and differences. The strength of factor analysis is the detection of a set of common and underlying dimensions of variables. However, factor analysis is unsuitable for investigating the latent similarities of data, which motivates us to consider using cluster analysis because the strength of cluster analysis is in identifying the different profiles or categories of the data and respondents. Based on the characteristic of the different techniques, we first consider using factor analysis to reduce the number of financial indicators to several components. Subsequently, we apply cluster analysis to homogeneously group the data according to the components.

To examine the impact of COVID-19 on the biopharmaceutical sector, this study analyzes 165 listed A-share companies in China according to the Shenwan Industry Classification. Our findings highlight several key trends. First, biopharmaceutical industry stocks maintained an upward momentum during the pandemic. Specifically, medical devices, medical services, biological products, and chemical pharmaceuticals have increased, whereas Chinese medicine and pharmaceutical commerce have declined slightly. Second, based on the results of the cluster analysis, we allocate stock categories to different types of investors. Specifically, we classify investors based on their long- and short-term investments and degree of risk aversion. Finally, we translate the empirical results into economic gains, develop an optimal investment strategy based on the Sharpe ratio and mean–variance utility, and offer additional performance assessments using Treynor’s ratio and Jensen’s alpha. For short-term investments, the results vary because the short-term volatility of stocks is much higher, whereas for long-term investments, the results are similar.

This study contributes to the literature in three ways. First, it integrates factor and cluster analyses within the context of the Chinese financial market, a combination seldom used in financial research. We illustrate the effectiveness of these combined techniques in analyzing the Chinese biopharmaceutical sector. Second, we contribute to second-level industry classification research. Most previous related literature focuses on first-level industry classification; in this article, we also discuss second-level industries (six categories) in the biopharmaceutical sector. In the context of the pandemic, investors are confident in the biopharmaceutical industries, but not all categories have been boosted by COVID-19. Therefore, research on second-level industry classifications is useful for practical investments. Finally, we extend studies on the effects of COVID-19 on financial markets.

The remainder of this paper is organized as follows. In the next section, we review related literature and background. Section “Methodology” presents the methodologies used, including factor analysis and K-means clustering. Section “Empirical study” introduces the data and empirical research. We then provide an investment analysis in Section “Sector investment analysis” based on the categories derived in the previous section. Finally, we conclude the paper in Section “Conclusion”.

Literature review

Recent studies have advanced our understanding of stock market dynamics during the COVID-19 pandemic. Cox et al. (2020) found that the Federal Reserve played a role in stock market fluctuations in the early weeks of the pandemic. Daglis et al. (2022) presented another interesting conclusion: a decreasing impact of COVID-19 on the Italian stock market index and increasing volatility, both of which are statistically significant. Cevik et al. (2022) employed multiple analytical methods, including a panel regression with fixed effects, panel quantile regressions, a panel vector autoregression model, and country-specific regressions, to explore the relationship between investor sentiments and stock market returns and volatility across 20 countries. They observed that rising positive investor sentiment boosts stock returns, whereas negative sentiment dampens returns at the lower quantiles. Esparcia and López (2022) developed a global and dynamic ratio to summarize different investor profiles according to their attitudes toward risk and to consider the dynamic nature of the economy and financial markets. They find that Principal Component Analysis enables the first principal component to summarize the information contained in the initial performance rankings. In the context of China, Liu et al. (2020) identified a downturn in Chinese stock markets since the onset of the pandemic, suggesting that stock price movements can serve as indicators of economic performance and future economic trends (Wagner 2020).

Global market sectors exhibited varying dynamics amid the pandemic. Gupta et al. (2022) reported significant adversity inflicted by COVID-19 on India's manufacturing, agriculture, and service sectors. In contrast, He et al. (2020) observed a negative impact on China's transportation, mining, and utilities sectors, while noting resilience in manufacturing, IT, education, and healthcare. Piñeiro-Chousa et al. (2022) assessed the stock market responses of two pioneering US biopharmaceutical firms in mRNA vaccine development, highlighting the distinct volatility influences on Pfizer and Moderna’s returns before and during the pandemic. Zou and Wang (2023) revealed an undervaluation of the medical sector, pinpointing China's medical sector as being significantly valuable for its stable risk premium. In the realm of pharmaceuticals, Ho et al. (2022) employed the Fama–French five-factor model to determine the influence of medical reform announcements and COVID-19 vaccine approvals on Chinese pharmaceutical and healthcare company returns. Their findings indicated negative effects on younger and smaller firms due to medical reforms, whereas vaccine approvals generally boosted stock returns, except for smaller entities. Robke et al. (2020) reviewed the impact of the pandemic on pharmaceutical innovation investment and speculated on future changes in the innovation-sourcing landscape. Ayati et al. (2020) delved into the pandemic’s short- and long-term effects on the pharmaceutical sector, ranging from immediate demand shifts and regulatory changes to longer-term industry growth deceleration and supply chain shifts toward self-sufficiency. Despite the focus of existing literature on China’s biopharmaceutical sector, there is a gap in second-level industry classification studies. This study aims to address this gap by examining the transformations within the biopharmaceutical sector and its six subdivisions in China throughout the COVID-19 timeline, culminating in tailored investment recommendations based on an innovative factor-score clustering method.

Factor analysis is a statistical tool used to identify a relatively small number of factors that represent the relationships between many interrelated variables. Early studies have used this tool for financial analyses (Morelli 1999; Jones 2006; Bai and Ng 2006; Ludvigson and Ng 2007). Wagenvoort et al. (2011) used factor analysis to reveal that the rates on large and small loans with long fixation periods converged weakly. Through factor analysis, they introduced a new measure to reassess whether retail bank market integration must be present, ongoing, or complete. Few studies have used factor analysis to score stocks; hence, little investment advice is provided. We provide portfolio suggestions by ranking stocks according to their factor scores. Clustering is an unsupervised technique that is used to generate groups of similar objects. K-means clustering has been used in several financial analyses. For instance, Nanda et al. (2010) compared several clustering methods, including K-means, self-organizing maps (SOM), and Fuzzy C-means, to perform stock classification in India. The results showed that K-means clustering can help to build the most compact clusters. Seong and Nam (2021) used K-means clustering and multiple kernel learning techniques to predict stock price movements. Expanding on these methodologies, our study employs K-means clustering to classify 165 Chinese listed companies into distinct categories, from which we devise corresponding investment strategies. Notably, this study merges factor and cluster analyses, which is a rare approach in financial studies, to enhance investment decision-making.

The impetus for our research emerges from the scant literature on second-level industry classifications. Few studies have employed both factor and cluster analyses in tandem. Our study unites these methodologies to reduce the variables and categorizes them based on their intrinsic traits.

Methodology

Factor analysis

The factor analysis method (Kim et al. 1978) recombines the original multidimensional variable indices to identify common factors, namely, the main factors. This reflects the primary statistical information of the multidimensional index used to achieve the dimension reduction. Factor analysis can be used to analyze multiple statistical variables. After dimension reduction, the main factors were the primary information of the original variables, making the research process simple, effective, and objective. In this study, each stock had 12 financial indicators with obvious multicollinearity (see Fig. 1). To reduce them to fewer variables, we adopted factor analysis to extract the common factors. The main steps of the factor analysis are as follows:

Fig. 1 Flowchart of research methodology. Note: This figure shows the methodology steps of our study. Specifically, we perform factor analysis and K-means clustering on the financial indicators. We then combine the results to select stocks and construct long-term and short-term portfolios for investors. Detailed descriptions are in Sections “Factor analysis” and “Cluster analysis (K-means)”

Main Steps:

  • Standardization of data and applicability test

    First, the indicators must be standardized. Here, Xij is the value of the ith financial indicator in year j, X̄i is the mean of the ith indicator, and Si is the standard deviation of the ith indicator. Zij is the standardized variable, which has zero mean and unit variance and is treated as following a standard normal distribution.

    $$Z_{ij}=\frac{X_{ij}-\bar{X}_{i}}{S_{i}}, \quad Z_{ij} \sim N\left(0,1\right)$$
  • Factor extraction and naming

    Common factors were extracted in alignment with the methodology of Hao et al. (2019) using the cumulative contribution rate. To minimize data loss from common factors and enhance factor analysis utility, factors were chosen when their cumulative variance contribution rates exceeded 80%. Following the factor extraction predicated on the eigenvalues, we computed the variance contribution rate and the corresponding cumulative rates.

  • Factor scores and composite scores calculation

    The score for each factor was based on the factor coefficient and standardized variables. The common factors were calculated as follows:

    $$F_{i} = \beta_{i1}X_{1} + \beta_{i2}X_{2} + \beta_{i3}X_{3} + \cdots + \beta_{in}X_{n}$$

    where βip (p = 1, 2, …, n) is the score coefficient of factor Fi on the standardized variable Xp.

    The overall scores can be computed by multiplying the score of each main factor by its contribution rate and summing the results, as follows:

    $$F = \alpha_{1} F_{1} + \, \alpha_{2} F_{2} + \, \alpha_{3} F_{3} + \cdots + \, \alpha_{m} F_{m}$$

    where Fi (i = 1,2…m) is the score of each factor, and αi (i = 1,2…m) is the contribution rate of each factor. Factor analysis can efficiently deal with several intercorrelated variables and identify common factors that contain most of the information in the data.
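The three steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the SPSS procedure used in the study: the function names, the toy data, the eigenvalues, and the score coefficients are all hypothetical, and the contribution rates are normalised to sum to one before weighting.

```python
from statistics import mean, stdev

def standardize(column):
    """Z-score one indicator: (x - mean) / sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def n_factors(eigenvalues, threshold=0.80):
    """Smallest number of factors whose cumulative variance contribution reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)

def factor_scores(z_row, coefficients):
    """F_i = sum over p of beta_ip * Z_p, one value per factor i."""
    return [sum(b * z for b, z in zip(beta_i, z_row)) for beta_i in coefficients]

def composite_score(scores, contributions):
    """F = sum over i of alpha_i * F_i, with alpha_i normalised to sum to one."""
    total = sum(contributions)
    return sum(c / total * f for c, f in zip(contributions, scores))

# Hypothetical toy data: 3 indicators observed for 4 firms.
raw = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 15.0, 5.0], [0.5, 0.1, 0.3, 0.7]]
z_cols = [standardize(col) for col in raw]
firm0 = [col[0] for col in z_cols]            # standardized indicators of the first firm
beta = [[0.5, 0.3, 0.0], [0.0, 0.2, 0.8]]     # hypothetical score coefficients, 2 factors
print(composite_score(factor_scores(firm0, beta), [0.6, 0.2]))
```

Ranking firms by the value returned from `composite_score` mirrors the ranking step described later in the empirical study.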

Cluster analysis (K-means)

Clustering is a data-mining technique that divides a dataset into multiple categories by calculating the similarity between the data. K-means clustering (Hartigan and Wong 1979) is a technique in which data are divided into preset K categories, making the data characteristics in the same category more similar. The cluster centers were iteratively updated to optimize the results. We adopted the K-means clustering to classify 165 stocks into several categories, and the steps are as follows (Seong and Nam 2021).

Main steps:

  • Step 1: Select K = 7 initial centers for classification. (K = 7 is the value determined from the empirical results.)

  • Step 2: Calculate the distances between each point and the centers.

  • Step 3: Classify the points according to distance using the Euclidean distance metric.

  • Step 4: Update the centers by calculating the centroid of different categories.

  • Step 5: Repeat Steps 2, 3, and 4 until the data points around each center remain constant.

In addition to its fast convergence and strong clustering performance, K-means clustering requires tuning only one parameter (K) and is easier to interpret than many other clustering techniques. Our methodology steps are plotted in the flowchart shown in Fig. 1.
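The five steps listed above can be written as a short standard-library Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `kmeans`, the random initialisation, the fixed seed, and the toy points are hypothetical, and the toy call uses K = 2 rather than the K = 7 selected empirically.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means with Euclidean distance; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # Step 1: pick K initial centers
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Steps 2-3: compute distances and assign each point to its nearest center
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:                          # Step 5: assignments are stable
            break
        labels = new_labels
        # Step 4: move each center to the centroid of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical two-dimensional factor scores forming two obvious groups.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(toy, k=2)
print(labels, centers)
```

In the study's setting, each point would be a firm's vector of four factor scores, so the same routine applies with four-dimensional points.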

Empirical study

Data description

We employed factor and cluster analyses in our empirical study. We chose companies in the biopharmaceutical sector according to the latest 2021 Shenwan Industry Classification. Six categories were selected, based on the second-level industry classification of the Shenwan biopharmaceutical sector. We selected annual data from the audited financial statements of six categories of listed companies in the biopharmaceutical industry from 2018 to 2021. All data were obtained from the Wind Database. We eliminated companies with incomplete data, leaving 165 listed companies for the empirical analysis. Information on the 165 listed companies is presented in Table 1. Data preprocessing included truncation and cleansing to derive 12 evaluative indicators selected for their comprehensiveness and relevance. These indicators were chosen to maintain the objectivity, accuracy, and referential integrity of the assessments, ensuring that they encapsulated the development trajectory of the companies’ stocks. Following Wang and Lee (2008), who advocated clustering based on financial ratio variability, we detail the chosen indicators and their summary statistics in Tables 1 and 2.

Table 1 Information of companies and financial indicators
Table 2 Summary statistics of financial indicators

Factor analysis

The data were first analyzed using SPSS software to measure the company’s financial performance at different levels. The correlation heat maps of the economic indicators for 2018 to 2021 are in Fig. 2. The tables of correlation coefficients of the financial indicators for 2018, 2019, 2020, and 2021 are reported in Annexed Tables 23, 24, 25, 26 in the Appendix.

Fig. 2 Correlation heat map plots of financial indicators. Note: The correlation heat maps of the 12 financial indicators for 2018–2021 are plotted. A red box stands for a high positive correlation and a purple box stands for a low negative correlation. In each year, there is a high positive correlation between X4 (equity multiplier) and X7 (equity ratio) and between X5 (current ratio) and X6 (quick ratio), and there is an obvious negative correlation between X7 (equity ratio) and X5 (current ratio) and between X7 (equity ratio) and X6 (quick ratio)

Common factor formation

According to the requirements of the factor analysis method for the correlation degree of variables, we adopted Bartlett’s spherical test and the Kaiser–Meyer–Olkin (KMO) test to verify the suitability of the factor analysis. Bartlett’s spherical test was used to compare the correlation matrix of the data with the identity matrix. A significant Bartlett’s test (p-value below 0.05) indicated the suitability of the data for factor analysis. The KMO test was used to compare the simple and partial correlation coefficients. If the KMO value approaches 1, the correlation among the variables is significant, and factor analysis is suitable for the variables. The results obtained after processing using SPSS are presented in Table 3.

Table 3 KMO and Bartlett test

From the test results above, all KMO values were larger than 0.6, which indicates that the variables share common factors; therefore, the selected variables were suitable for factor analysis. The p-values of Bartlett’s spherical test were all reported as 0.00, below the threshold of 0.05, signifying strong interrelations among the financial indicators and confirming their suitability for factor analysis.
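For readers who want to see what the two suitability tests compute, the following standard-library Python sketch evaluates the usual textbook forms of Bartlett's statistic, χ² = −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO measure from a correlation matrix R. It is an illustration only: the study used SPSS, and the helper names and the 3 × 3 correlation matrix below are hypothetical.

```python
import math

def inverse_and_det(matrix):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(matrix)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign of the determinant
        pv = a[col][col]
        det *= pv
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a], det

def bartlett_statistic(corr, n_obs):
    """Chi-square statistic and degrees of freedom of Bartlett's test of sphericity."""
    p = len(corr)
    _, det = inverse_and_det(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2

def kmo(corr):
    """Overall KMO: squared correlations relative to squared correlations plus squared partials."""
    p = len(corr)
    inv, _ = inverse_and_det(corr)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)

# Hypothetical correlation matrix of three indicators observed for 165 firms.
R = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
print(bartlett_statistic(R, n_obs=165), kmo(R))
```

The p-value follows from comparing the statistic with a chi-square distribution with the returned degrees of freedom; that step is left out here because the standard library has no chi-square distribution function.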

After extracting common factors from the 12 financial indicators, we obtained the total variance explained, the scree plot, the common factor variance (communality) results, and the component score coefficient matrix, which describe the relationships among the financial indicators and reveal the primary information contained in the common factors.

As the loadings of the common factors F1, F2, F3, and F4 on some of the initial variables did not differ significantly, an explanatory relationship between the variables and the common factors could not be observed; therefore, rotation of the component matrix was required.

From the results in Table 4, four common factors were extracted from the 12 selected indicators. The cumulative variance contributions of F1, F2, F3, and F4 were 73.172%, 74.857%, 72.463%, and 77.789% for 2018, 2019, 2020, and 2021, respectively. The scree plot in Fig. 3 also shows that the inflection point occurred at the fourth eigenvalue; therefore, the first four factors were retained.

Table 4 Total variance explained

Since we aimed to examine the attribution of each variable, the original component matrix was rotated for ease of naming. This rotation ensures that each variable has a more extensive loading on one common factor and a smaller loading on the remaining common factors, and the results are shown in Table 5.

Table 5 Relationship between indicators and common factor

The rotated factor-loading matrix is presented in Table 5. In the first common factor, diluted earnings per share, basic earnings per share, and net assets per share have larger loadings, indicating that these three indicators are more correlated with each other and reveal the enterprise’s current profits, losses, and future earnings expectations. Common factor F1 is therefore named the operating income factor. In the second common factor, net cash flow per share, operating profit growth, total assets growth rate, and growth rate of shareholders’ equity have large loadings, indicating that these four indicators are highly correlated and reveal the company’s development capability in 2021. Common factor F2 is therefore named the development potential factor. In the third common factor, equity multiplier and equity ratio have high loadings, indicating that these two indicators are strongly correlated and reveal a company’s assets and equity. Common factor F3 is therefore named the asset structure factor. Finally, the fourth common factor has substantial loadings on the current ratio, quick ratio, and proportion of current assets to total assets, which indicates a strong correlation among indicators that reflect a firm's short-term liquidity; F4 is therefore named the solvency factor.

Composite score of the company

A matrix of component score coefficients was obtained using SPSS (Table 6).

Table 6 Component score coefficient matrix

The standardized values of the initial indicators were substituted into the factor score function to calculate the factor scores of each sample, and a further comprehensive evaluation of the observed indicators was performed. A complete evaluation model was established using the variance contribution of the four common factors extracted as weights, combined with each factor score.

$$\text{F}=\frac{23.522\%}{77.789\%}{F}_{1}+\frac{18.678\%}{77.789\%}{F}_{2}+\frac{18.283\%}{77.789\%}{F}_{3}+\frac{17.306\%}{77.789\%}{F}_{4}$$
$$\text{i.e., } F = 0.302F_{1} + 0.240F_{2} + 0.235F_{3} + 0.222F_{4}$$
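The weights in the composite score equation can be reproduced from the variance contributions quoted above. The short Python check below uses only the percentages that appear in that equation; the variable names are illustrative.

```python
# Variance contributions (%) of the four rotated factors, as quoted in the equation above.
contributions = {"F1": 23.522, "F2": 18.678, "F3": 18.283, "F4": 17.306}
cumulative = 77.789  # cumulative variance contribution (%)

# Each weight is the factor's share of the cumulative contribution.
weights = {name: round(value / cumulative, 3) for name, value in contributions.items()}
print(weights)
```

Because each weight is rounded to three decimals, the rounded weights add up to 0.999 rather than exactly 1.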

The composite score for each firm was calculated, and the scores were ranked in descending order for each year. Only the top 20 stocks with positive composite scores are reported (Table 7).

Table 7 Top 20 scoring stocks and scores for 2018 and 2019

The literature review, including studies by He et al. (2020) and Gupta et al. (2022), indicated that industry firms broadly confronted the adverse consequences of the COVID-19 pandemic, leading to economic downturns. From Tables 7 and 8, the composite scores of the top three leading enterprises uniformly show upward trends, with the top four enterprises’ composite scores being 2.04, 1.84, 1.8, and 1.44, respectively, in 2018 before the outbreak, and 2.45, 1.85, 1.73, and 1.64 in 2021 after the outbreak. The positive impact of the pandemic was offset by the adverse effects of the economic environment, resulting in a slight increase in the overall scores of the leading companies. However, owing to the weaker capability of small and medium-sized enterprises to withstand economic downturns, even though the pandemic boosted their growth to a certain extent, their overall scores still show a slight decline. This result is consistent with the findings of Thukral (2021).

Table 8 Top 20 scoring stocks and scores for 2020 and 2021

In 2021, the composite scores of the top seven companies rose by approximately 10% compared to 2020, reflecting sustained governmental support for biotechnological innovation and bio-industry development, coupled with increased consumer demand for medical supplies like alcohol and masks. This support has propelled leading biopharmaceutical stocks upward during the pandemic.

The classification and indicator score ranking of stocks in 2020 and 2021 have undergone a significant reshuffle compared to 2018 and 2019. Therefore, the specific development of the different types of pharmaceutical stocks must be discussed further. Table 9 presents the scores of the top 20 stocks in 2021 over the last 4 years.

Table 9 Stock score comparison

The results in this table will be used to calculate the stock score in Tables 7 and 8.

The top 20 stock categories in the composite score for 2021 are more evenly dispersed, with biopharmaceutical companies in all six categories. Of these, 80% of the companies in the four categories—medical devices, medical services, biological products, and chemical pharmaceuticals—showed an upward trend in their composite scores after the outbreak. As the response to the pandemic necessitated medical personnel and equipment support, the medical market experienced a significant boost in capital inflows. Additionally, advancements in vaccine research and development prompted a re-evaluation of the biopharmaceutical industry's significant growth potential. Several stocks exhibited multiple upward movements. For companies in the two categories of Chinese medicine and pharmaceutical commerce, the composite score tended to decline slightly or remained stable after the outbreak. The increased popularity of disinfection products post-pandemic led to a decline in the incidence of other infectious diseases such as influenza, resulting in a reverse impact on commercial businesses and Chinese medicine.

Cluster analysis

This study used the K-means clustering method to cluster the stocks in the sample and refine the similarities among them, analyzing the characteristics and commonalities of the 165 listed stocks. From the scree plot (Fig. 3), it can be seen that the curve remains flat at K = 7, while it suddenly increases at K = 8; therefore, K is taken as 7, that is, the sample is divided into seven categories. We also show the clustering results when the number of clusters is six and seven in Annexed Tables 27 and 28 in the Appendix.

Fig. 3 Scree plot. Note: This figure shows the scree plot of the common factors. The fourth point is an inflection point, and the exact variance contribution rates are reported in Table 4

Based on the scores derived from the factor analysis, the stocks in each of the seven categories in 2021 are ranked in descending order of total scores. The top ten scoring stocks were screened and analyzed for their four common factor scores, as shown in Tables 10, 11, 12, 13, 14, 15, and 16. Baker and Haslem (1974) divided investors into two types according to a decision-orientation criterion, which refers to investors’ confidence in their decision-making abilities. This study divided investors into confident and cautious groups based on this criterion.

Table 10 Ranking and factor scores for Category 1 stocks
Table 11 Ranking and factor scores for Category 2 stocks
Table 12 Ranking and factor scores for Category 3 stocks
Table 13 Ranking and factor scores for Category 4 stocks
Table 14 Ranking and factor scores for Category 5 stocks
Table 15 Ranking and factor scores for Category 6 stocks
Table 16 Ranking and factor scores for Category 7 stocks

In the first category, the stocks score exceptionally well on the operating income factor, while the other factors are at average levels. Operating earnings represent the existing earnings of a business, suggesting that this category is suitable for investors aiming for short-term profits. However, due to their moderate growth potential, long-term investments are less advisable. This category is recommended for short-term holdings, particularly for investors with a lower risk tolerance.

In the second category, the stocks are more prominent in the solvency factor score. The scores of the other three factors are relatively stable, and the stock types are concentrated in biopharmaceuticals. The outbreak of the pandemic has led to a rapid and steady growth in new product R&D expenditures, and the scale of new product production and sales of large- and medium-sized enterprises in China’s biopharmaceutical industry is on a faster growth trend. Therefore, this category represents an ideal option for investors seeking medium- to long-term holdings, particularly for those with lower risk tolerance.

In the third category, the stocks have stable scores in operating income, solvency, and development potential, reaching a positive value of approximately 60%. However, the asset structure factor scores are all negative. This finding indicates that the pandemic did not significantly affect the development of these A-share listed companies. There is a commonality of weak stability in funding sources, making it challenging to achieve high short-term returns on stocks. It is suitable for cautious investors to hold in the medium- or long-term.

In the fourth category, the stocks have positive scores for the operating income and development potential factors and negative scores for the asset structure and solvency factors. Operating income represents the existing earnings of a business, suggesting that this category is suitable for confident investors seeking short-term profitability. Given the strong growth potential of these stocks, investors are also advised to hold forward contracts for this category.

In the fifth category, the stocks score particularly well on the development potential factor, whereas the other factors are relatively flat. Owing to the long duration of the impact of the coronavirus pandemic on China’s biopharmaceutical industry, the potential of this category is stable in the long term. This category is suitable for long-term holdings by cautious investors.

In the sixth category, the stocks do not score well on the operating income, solvency, asset structure, and development potential factors; all are relatively “medium.” The distribution of stocks in this category is relatively even, and the risk profile is medium. Although there are some variations in share prices, stocks appear to be suitable for cautious investors.

In the seventh category, stocks stand out, with positive scores on the asset structure factor, whereas the other three factors have relatively flat scores. The leading asset structure indicator indicates the relative relationship between the funding sources provided by creditors and those offered by investors, reflecting the stability of the underlying financial structure of the business. Hence, stocks in this category are suitable for confident investors with a short-term horizon.

Sector investment analysis

We have offered stock-selection guidance for various investor profiles. However, translating these empirical findings into tangible economic benefits requires specific investment portfolio recommendations. Moving forward, we aim to devise optimized strategies for both long- and short-term investments, integrating the insights from Section “3.3” with return-risk analyses. In this study, we address the optimization challenge by employing Markowitz's model, using the Sharpe ratio (Sharpe 1998) as a performance metric. Markowitz’s model is designed to assist rational investors in maximizing returns for a given level of risk, or minimizing risk for a specified return level.

Long and short term portfolio

Suppose that portfolio P consists of N assets in the market. Let \({r}_{p}\) be its expected rate of return and \({\updelta }_{p}\) its risk. The expected return is \({r}_{p}={\sum }_{i=1}^{N}{w}_{i}{r}_{i}\), where \({r}_{i}\) is the expected return of stock i and each weight \({w}_{i}\) takes a value between 0 and 1, and the variance is \({\updelta }_{p}^{2}\left(w\right)={\sum }_{i=1}^{N}{\sum }_{j=1}^{N}{w}_{i}{w}_{j}{\updelta }_{ij}={w}^{T}\Delta w\), where \({w}_{i}\) and \({w}_{j}\) are the weights assigned to stocks i and j, respectively; ∆ is the covariance matrix of the stocks, and \({\updelta }_{ij}\) is the covariance between the prices of stocks i and j. Then we define:

$$\text{risk}: {\updelta }_{p}=\sqrt{{w}^{T}\Delta w}$$
$$\text{Sharpe Ratio}:\frac{{r}_{p}-{r}_{f}}{{\updelta }_{p}}$$

where rf is the risk-free interest rate. We choose benchmark 1-year and 5-year deposit rates as proxies. The optimization problem can be described as follows:

$$\text{Max }\frac{{r}_{p}-{r}_{f}}{{\delta }_{p}}$$
$$\text{s}.\text{t}.{\sum }_{i=1}^{N}{w}_{i}=1$$
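As a minimal illustration of the optimization problem above (not the R code used in this study), the following Python sketch evaluates the portfolio return, risk, and Sharpe ratio, and searches a grid of long-only weights for a two-asset portfolio. The mean returns, covariance matrix, and risk-free rate are hypothetical values chosen for illustration only, and the grid search is an assumption that stands in for a proper solver.

```python
import math

def portfolio_return(weights, mean_returns):
    """r_p = sum_i w_i * r_i."""
    return sum(w * r for w, r in zip(weights, mean_returns))

def portfolio_risk(weights, cov):
    """delta_p = sqrt(w^T * Delta * w)."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

def sharpe_ratio(weights, mean_returns, cov, risk_free):
    """(r_p - r_f) / delta_p."""
    excess = portfolio_return(weights, mean_returns) - risk_free
    return excess / portfolio_risk(weights, cov)

def max_sharpe_two_assets(mean_returns, cov, risk_free, steps=1000):
    """Grid search over w_1 in [0, 1] with w_2 = 1 - w_1 (weights sum to 1)."""
    best_weights, best_sharpe = None, float("-inf")
    for k in range(steps + 1):
        w1 = k / steps
        weights = (w1, 1.0 - w1)
        s = sharpe_ratio(weights, mean_returns, cov, risk_free)
        if s > best_sharpe:
            best_weights, best_sharpe = weights, s
    return best_weights, best_sharpe

# Hypothetical inputs (not taken from the study's data).
mean_returns = [0.08, 0.05]
cov = [[0.04, 0.00],
       [0.00, 0.01]]
risk_free = 0.02

best_weights, best_sharpe = max_sharpe_two_assets(mean_returns, cov, risk_free)
print(best_weights, best_sharpe)
```

For more than two assets, a grid becomes impractical and a constrained optimizer would normally be used instead; the objective and the constraint remain the same.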

We consider a mean–variance investor and define the quadratic utility function for investing in this portfolio (Wang et al. 2016) as:

$$U=E\left({r}_{p}\right)-\frac{1}{2}\upgamma var\left({r}_{p}\right)$$

where U is the mean–variance utility, \(E\left({r}_{p}\right)\) is the portfolio’s mean return, and \(var\left({r}_{p}\right)\) is the portfolio’s variance, which is a proxy for the portfolio risk. γ is the risk aversion coefficient, and we set γ to be 3 for the short-term investment and 6 for the long-term investment.
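The role of the risk aversion coefficient can be sketched as follows. This is an illustrative Python example under assumed inputs: the two-asset mean returns and covariance matrix are hypothetical, and `best_weight_by_utility` is a helper name introduced here, not part of the study. It shows that a larger γ shifts the utility-maximizing weight away from the riskier asset.

```python
def mean_variance_utility(expected_return, variance, gamma):
    """U = E(r_p) - 0.5 * gamma * var(r_p)."""
    return expected_return - 0.5 * gamma * variance

def best_weight_by_utility(mean_returns, cov, gamma, steps=100):
    """Weight of asset 1 that maximizes U in a two-asset, long-only portfolio."""
    best_w1, best_u = 0.0, float("-inf")
    for k in range(steps + 1):
        w1 = k / steps
        w = (w1, 1.0 - w1)
        expected = w[0] * mean_returns[0] + w[1] * mean_returns[1]
        variance = sum(w[i] * w[j] * cov[i][j]
                       for i in range(2) for j in range(2))
        u = mean_variance_utility(expected, variance, gamma)
        if u > best_u:
            best_w1, best_u = w1, u
    return best_w1

# Hypothetical inputs: asset 1 has the higher return and the higher variance.
mean_returns = [0.08, 0.05]
cov = [[0.04, 0.00],
       [0.00, 0.01]]

for gamma in (3, 6):
    w1 = best_weight_by_utility(mean_returns, cov, gamma)
    print(f"gamma={gamma}: weight of asset 1 = {w1:.2f}")
```

With these assumed inputs, the more risk-averse investor holds less of the higher-variance asset.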

For long-term investment, we selected stocks from Categories 3 and 5, which include five stocks. Due to extensive missing data, stock 832735.BJ (a stock from the Beijing Exchange) was excluded from the portfolio analysis. We ran the optimization on the remaining four stocks: 300573.SZ, 000538.SZ, 300760.SZ, and 603127.SH. We set rf as the benchmark five-year deposit rate for 2021 (2.75%). Optimization simulations used stock prices from the start of 2021 to the end of 2021. The optimization was performed in R according to the formulas above. Figure 4a shows the Sharpe ratio curve of this portfolio against the expected return with a minimum α. For demonstration, we display the nearest ten results around the maximum point. As shown in Table 17, the largest Sharpe ratio was 0.03452; in this portfolio, the weights of 000538.SZ and 300760.SZ were 0, and the weights of 300573.SZ and 603127.SH were 24.8% and 75.2%, respectively, with an expected return of 7.179% and a standard deviation of 1.283. Table 17 also reports the mean–variance utility values for each portfolio. An optimal value of 4.7134 was achieved when the weights of 300573.SZ and 603127.SH were 26.99% and 73.01%, respectively, with an expected return of 7.148% and a standard deviation of 1.274.

Fig. 4 Sharpe Ratio curves of long- and short-term investment

Table 17 Long-term investment of stocks

For short-term investment, we selected the best stocks from Categories 1, 4, and 7, which comprise 12 stocks in total. In this situation, we set α to be 1.75%, the one-year risk-free interest rate in 2021. Similarly, Fig. 4b shows the Sharpe ratio curve for the 12 stocks against the expected returns. As with Table 17, the optimization results for the nearest ten points are listed in Table 18. This table excludes the weights of 688399.SH, 601607.SH, 000661.SZ, 603368.SH, 600829.SH, 002524.SZ, 000028.SZ, 000411.SZ, and 002462.SZ because their weights are approximately 0.00% when a minimum expected return of 0.0175 is required. The remaining three stocks are 002821.SZ, 600713.SH, and 600129.SH. From Table 18, the largest Sharpe ratio was 0.05123; for this investment, the weights of 600713.SH and 601607.SH were both 0, and the weights of 002821.SZ and 600129.SH were 23.88% and 76.12%, respectively, with an expected return of 8.133% and a standard deviation of 1.051. Table 18 also reports the mean–variance utility values for each portfolio. The optimal value of 4.8425 was achieved when the weights of 002821.SZ and 600129.SH were 28.57% and 71.43%, respectively, with an expected return of 8.019% and a standard deviation of 1.029.

Table 18 Short-term investment of stocks

To provide investment suggestions from another risk-return perspective, we calculated the Treynor Ratio (TR) and Jensen Alpha (JA) for the portfolios whose returns exceed the risk-free rate; the optimal investment strategies under these two performance evaluation metrics are shown in Tables 19 and 20. The Treynor Ratio measures the risk premium earned per unit of systematic risk, whereas Jensen Alpha assesses investment performance by quantifying the deviation of a portfolio’s average return from its expected return, based on the Capital Asset Pricing Model. The formulae for these performance metrics are as follows:

$$\text{investment beta}: {\upbeta }_{p}={w}_{1}{\upbeta }_{1}+{w}_{2}{\upbeta }_{2}+\dots +{w}_{N}{\upbeta }_{N}$$
$$\text{Treynor Ratio}:\frac{{r}_{p}-{r}_{f}}{{\upbeta }_{p}}$$
$$\text{Jensen Alpha}: {r}_{p}-\left({r}_{f}+{\upbeta }_{p}\left({r}_{m}-{r}_{f}\right)\right)$$

where \({\upbeta }_{i}\) is the beta of holding i and rm is the average expected return of the market. From Table 19, the short-term optimal portfolios determined by the Treynor Ratio and Jensen Alpha differ considerably from those based on the Sharpe Ratio and mean–variance utility in Table 18. In Table 20, the long-term optimal portfolios determined by the Treynor Ratio and Jensen Alpha are consistent with the results in Table 17. In summary, the optimal long-term portfolio does not vary significantly across metrics, whereas the optimal short-term portfolio can differ completely depending on the performance evaluation method applied.
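The three formulae above translate directly into code. The Python sketch below is illustrative only: the weights, betas, and return figures are hypothetical and are not taken from Tables 19 and 20.

```python
def portfolio_beta(weights, betas):
    """beta_p = w_1 * beta_1 + ... + w_N * beta_N."""
    return sum(w * b for w, b in zip(weights, betas))

def treynor_ratio(portfolio_return, risk_free, beta_p):
    """Risk premium earned per unit of systematic risk."""
    return (portfolio_return - risk_free) / beta_p

def jensen_alpha(portfolio_return, risk_free, beta_p, market_return):
    """Portfolio return minus the return predicted by the CAPM."""
    capm_return = risk_free + beta_p * (market_return - risk_free)
    return portfolio_return - capm_return

# Hypothetical two-stock portfolio.
weights = [0.25, 0.75]
betas = [1.2, 0.8]
r_p, r_f, r_m = 0.08, 0.02, 0.06

beta_p = portfolio_beta(weights, betas)
print("beta:", beta_p)
print("Treynor Ratio:", treynor_ratio(r_p, r_f, beta_p))
print("Jensen Alpha:", jensen_alpha(r_p, r_f, beta_p, r_m))
```

Because both metrics depend on beta rather than on total risk, they can rank portfolios differently from the Sharpe ratio when a large share of a portfolio's risk is not systematic.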

Table 19 Short-term investment with optimal TR and JA
Table 20 Long-term investment with optimal TR and JA

Robustness check

In addition to constructing long- and short-term portfolios, we conducted a robustness check by comparing our results in Section “Long and short term portfolio” with traditional equally weighted portfolios (EWP). Specifically, for the short-term portfolio, we partitioned 2021 into four quarters and made the comparison for each quarter. Tables 21 and 22 display the robustness check results for the long- and short-term portfolios, respectively. The tables show that our portfolios from Section “Long and short term portfolio” outperform the EWP in terms of portfolio return and utility, although the EWP had a higher Sharpe ratio.
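A comparison of this kind can be sketched as follows. The Python example below is a simplified illustration under assumed inputs: the mean returns, covariance matrix, risk-free rate, risk aversion coefficient, and candidate weights are all hypothetical, and the quarterly partitioning is omitted.

```python
import math

def evaluate(weights, mean_returns, cov, risk_free, gamma):
    """Return, standard deviation, Sharpe ratio, and mean-variance utility."""
    n = len(weights)
    expected = sum(w * r for w, r in zip(weights, mean_returns))
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    sd = math.sqrt(variance)
    return {
        "return": expected,
        "sd": sd,
        "sharpe": (expected - risk_free) / sd,
        "utility": expected - 0.5 * gamma * variance,
    }

def equal_weights(n):
    """Equally weighted portfolio: every asset receives 1/N."""
    return [1.0 / n] * n

# Hypothetical inputs (not the study's data).
mean_returns = [0.08, 0.05]
cov = [[0.04, 0.00],
       [0.00, 0.01]]
risk_free, gamma = 0.02, 3

candidate = evaluate([0.4, 0.6], mean_returns, cov, risk_free, gamma)
ewp = evaluate(equal_weights(2), mean_returns, cov, risk_free, gamma)

for name, result in (("candidate", candidate), ("EWP", ewp)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

Reporting several metrics side by side matters because a portfolio can lead on one metric and trail on another.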

Table 21 Robustness check results for the long-term portfolio
Table 22 Robustness check results for the short-term portfolio

Conclusion

This study employed factor and cluster analyses to examine 165 listed companies from 2018 to 2021. Based on these results, we assessed the overall performance of various biopharmaceutical sector stocks, estimated the impact of COVID-19 on different stock types, and offered recommendations to investors.

The main results of this study are as follows. First, the biopharmaceutical industry stocks maintained an upward momentum during the pandemic. Specifically, medical devices, medical services, biological products, and chemical pharmaceuticals increased, whereas Chinese medicine and pharmaceutical commerce declined. Unlike previous studies, we explored the dynamics of second-level classification stocks during the COVID-19 pandemic. Second, regarding the clustering results, we conclude the following: Category 1 stocks are suitable for short-term holdings by unconfident investors. Categories 2 and 3 stocks are medium- to long-term holdings for cautious investors. Category 4 is ideal for confident investors investing in forward contracts. Category 5 is suitable for long-term holdings by cautious investors. Category 6 is suitable for cautious stockholders. Category 7 is ideal for short-term holdings by confident investors. We considered both the degree of risk aversion and the holding period of stock investors. Finally, we developed an optimal investment strategy using the Sharpe ratio and mean–variance utility, and provided alternative performance metrics, including the Treynor Ratio and Jensen Alpha, for comparison. For long-term investments, we recommend that investors allocate 24.8% of their wealth to 300573.SZ and 75.2% to 603127.SH to achieve the highest Sharpe ratio. We suggest a portfolio allocation of 26.99% in 300573.SZ and 73.01% in 603127.SH for the best mean–variance utility. For short-term investments, we recommend a portfolio allocation of 23.88% in 002821.SZ and 76.12% in 600129.SH for the highest Sharpe ratio, and 28.57% in 002821.SZ and 71.43% in 600129.SH for the best mean–variance utility. We provide investment suggestions tailored to investors with different preferences for portfolio analysis metrics.

This study had some limitations. First, it focused only on the stock market during the pandemic; other financial assets such as bonds and futures were not included. Second, the analysis was restricted to data from China, despite the global impact of COVID-19, suggesting the potential relevance of examining financial markets in other countries.

Future research could broaden the scope to include additional financial assets as the pandemic progresses in China. In addition, given the significant effects of the pandemic on countries such as the US and India, a comparative study of their biopharmaceutical sectors relative to China's may yield valuable insights. Furthermore, advances in research methodologies may allow the application of sophisticated machine learning and deep learning techniques, such as DBN (Deep Belief Network), LSTM (Long Short-Term Memory), and support vector machines, to explore nonlinear relationships in the Chinese stock market. While some studies, such as Leippold et al. (2022), have applied nonlinear machine learning methods to the Chinese stock market, few have specifically addressed nonlinear dynamics within sectors such as the biopharmaceutical industry.

Availability of data and materials

The datasets generated and/or analysed during the current study are available in Wind Database.

Abbreviations

COVID-19: Coronavirus disease 2019
SOM: Self-organizing maps
KMO test: Kaiser–Meyer–Olkin test
R&D expenditure: Research and development expenditure
xxxxxx.SH: Stock from Shanghai Exchange
xxxxxx.SZ: Stock from Shenzhen Exchange
TR: Treynor ratio
JA: Jensen Alpha
EWP: Equally weighted portfolio
ER: Expected return
SD: Standard deviation
DBN: Deep belief network
LSTM: Long short-term memory

References

  • AlAli MS (2020) The effect of WHO COVID-19 announcement on Asian Stock Markets returns: an event study analysis. J Econ Business 3(3):1051–1054

  • Ayati N, Saiyarsarai P, Nikfar S (2020) Short and long term impacts of COVID-19 on the pharmaceutical sector. DARU J Pharm Sci 28:799–805

  • Baker HK, Haslem JA (1974) The impact of investor socioeconomic characteristics on risk and return preferences. J Bus Res 2(4):469–476

  • Bai J, Ng S (2006) Evaluating latent and observed factors in macroeconomics and finance. J Econ 131(1–2):507–537

  • Cevik E, Kirci Altinkeski B, Cevik EI, Dibooglu S (2022) Investor sentiments and stock markets during the COVID-19 pandemic. Financ Innov 8(1):69

  • Cox J, Greenwald DL, Ludvigson SC (2020) What explains the COVID-19 stock market? (No. w27784). National Bureau of Economic Research

  • Daglis T, Melissaropoulos IG, Konstantakis KN, Michaelides PG (2022) The impact of COVID-19 on global stock markets: early linear and non-linear evidence for Italy. Evol Inst Econ Rev 19(1):485–495

  • Esparcia C, López R (2022) Outperformance of the pharmaceutical sector during the COVID-19 pandemic: global time-varying screening rule development. Inf Sci 609:1181–1203

  • Gorman BS, Primavera LH (1983) The complementary use of cluster and factor analysis methods. J Exp Educ 51(4):165–168

  • Gupta V, Santosh KC, Arora R, Ciano T, Kalid KS, Mohan S (2022) Socioeconomic impact due to COVID-19: an empirical assessment. Inf Process Manag 59(2):102810

  • Hao Y, Liu H, Chen H, Sha Y, Ji H, Fan J (2019) What affect consumers’ willingness to pay for green packaging? Evidence from China. Resour Conserv Recycl 141:21–29

  • Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C (appl Stat) 28(1):100–108

  • He P, Sun Y, Zhang Y, Li T (2020) COVID–19’s impact on stock prices across different sectors—an event study based on the Chinese stock market. Emerg Mark Financ Trade 56(10):2198–2212

  • Ho KC, Chen C, Yang D, Gao Y (2022) Medical reform, Covid-19 vaccine and stock returns: the case of Chinese listed pharmaceutical and healthcare companies. Appl Econ Lett 31:832–839

  • Jones CS (2006) A nonlinear factor analysis of S&P 500 index option returns. J Financ 61(5):2325–2363

  • Kim JO, Mueller CW (1978) Introduction to factor analysis: What it is and how to do it, vol 13. Sage, England

  • Leippold M, Wang Q, Zhou W (2022) Machine learning in the Chinese stock market. J Financ Econ 145(2):64–82

  • Liu H, Wang Y, He D, Wang C (2020) Short term response of Chinese stock markets to the outbreak of COVID-19. Appl Econ 52(53):5859–5872

  • Ludvigson SC, Ng S (2007) The empirical risk–return relation: a factor analysis approach. J Financ Econ 83(1):171–222

  • Morelli D (1999) Tests of structural change using factor analysis in equity returns. Appl Econ Lett 6(4):203–207

  • Nanda SR, Mahanty B, Tiwari MK (2010) Clustering Indian stock market data for portfolio management. Expert Syst Appl 37(12):8793–8798

  • Piñeiro-Chousa J, López-Cabarcos MÁ, Quiñoá-Piñeiro L, Pérez-Pico AM (2022) US biopharmaceutical companies’ stock market reaction to the COVID-19 pandemic. Understanding the concept of the ‘paradoxical spiral’ from a sustainability perspective. Technol Forecast Soc Chang 175:121365

  • Robke L, Pont LB, Bongard J, Wurzer S, Smietana K, Moss R (2020) Impact of COVID-19 on pharmaceutical external innovation sourcing. Nat Rev Drug Discov 19(12):829–830

  • Seong N, Nam K (2021) Predicting stock movements based on financial news with segmentation. Expert Syst Appl 164:113988

  • Sharpe WF (1998) The sharpe ratio. Streetwise Best J Portfolio Manag 3:169–185

  • Takyi PO, Bentum-Ennin I (2021) The impact of COVID-19 on stock market performance in Africa: a Bayesian structural time series approach. J Econ Bus 115:105968

  • Thukral E (2021) COVID-19: small and medium enterprises challenges and responses with creativity, innovation, and entrepreneurship. Strateg Chang 30(2):153–158

  • Vats P, Samdani K (2019) Study on machine learning techniques in financial markets. In: 2019 IEEE international conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–5

  • Wagenvoort RJ, Ebner A, Borys MM (2011) A factor analysis approach to measuring European loan and bond market integration. J Bank Financ 35(4):1011–1025

  • Wagner AF (2020) What the stock market tells us about the post-COVID-19 world. Nat Hum Behav 4(5):440–440

  • Wang Y, Ma F, Wei Y, Wu C (2016) Forecasting realized volatility in a changing world: a dynamic model averaging approach. J Bank Financ 64:136–149

  • Wang YJ, Lee HS (2008) A clustering method to identify representative financial ratios. Inf Sci 178(4):1087–1097

  • Zou Z, Wang X (2023) Research on the investment value of China’s medical sector in the context of COVID-19. Econ Res-Ekonomska Istraživanja 36(1):614–633


Funding

XJTLU Postgraduate Research Scholarship-PGRS2012016.

Author information

Contributions

Study conception and design: Conghua Wen; data collection: Yifang Tang, Feifan Zhao; analysis and interpretation of results: Jiahui Xi, Conghua Wen; draft manuscript preparation: Jiahui Xi, Yifan Tang, Conghua Wen. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Conghua Wen.

Ethics declarations

Competing interests

No competing interests exist for the publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Correlation matrix of financial indicators

In this section, we show the correlation coefficient matrix of financial indicators in 2018, 2019, 2020 and 2021.

See Tables 23, 24, 25, 26

Table 23 Correlation coefficient matrix of financial indicators in 2018
Table 24 Correlation coefficient matrix of financial indicators in 2019
Table 25 Correlation coefficient matrix of financial indicators in 2020
Table 26 Correlation Coefficient Matrix of Financial Indicators in 2021

Appendix B: Determination of the number of clusters

When performing cluster analysis, the number of clusters must first be determined. The inflection point is often used as the basis for selecting the number of clusters: as the number of clusters K increases, the sample is divided more finely and the degree of aggregation within each cluster gradually increases.

With K less than 4, the classification is relatively coarse and of limited reference value. With K = 6, 141 of the 165 stocks belong to the same category, and four subgroups contain only a small number of stocks, so this classification is also of limited reference value. Therefore, K = 7 is taken.
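The two diagnostics described above, the inflection point of the within-cluster dispersion and the balance of cluster sizes, can be sketched in Python. This is an illustrative example on hypothetical two-dimensional toy data, not the study's factor scores; it uses a plain Lloyd-style k-means with a simple deterministic initialization, which is an assumption made for reproducibility and not necessarily the procedure used in the study.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Lloyd-style k-means with evenly spaced sample points as initial centroids."""
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Total within-cluster sum of squares, used to look for the inflection point."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

wcss_by_k, sizes_by_k = {}, {}
for k in range(1, 5):
    labels, centroids = kmeans(points, k)
    wcss_by_k[k] = within_cluster_ss(points, labels, centroids)
    sizes_by_k[k] = [labels.count(c) for c in range(k)]
    largest_share = max(sizes_by_k[k]) / len(points)
    print(k, round(wcss_by_k[k], 3), sizes_by_k[k], round(largest_share, 2))
```

In this toy setting the dispersion drops sharply up to K = 3 and only marginally afterwards, and the cluster sizes are balanced at K = 3. A very large share for a single cluster would signal a classification of limited reference value.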

The tables below show the clustering data when the number of clusters is 6 and 7.

See Tables 27, 28.

Table 27 Classification with a clustering number of 6
Table 28 Classification with a clustering number of 7

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Xi, J., Wen, C., Tang, Y. et al. A factor score clustering approach to analyze the biopharmaceutical sector in the Chinese market during COVID-19. Financ Innov 10, 135 (2024). https://doi.org/10.1186/s40854-024-00654-y

  • DOI: https://doi.org/10.1186/s40854-024-00654-y
