 Research
 Open Access
Online risk-based portfolio allocation on subsets of crypto assets applying a prototype-based clustering algorithm
Financial Innovation volume 9, Article number: 25 (2023)
Abstract
Mean-variance portfolio optimization models are sensitive to uncertainty in risk-return estimates, which may result in poor out-of-sample performance. In particular, the estimates may suffer when the number of assets considered is high and the length of the return time series is not sufficiently long. This is precisely the case in the cryptocurrency market, where there are hundreds of crypto assets that have been traded for only a few years. We propose enhancing the mean-variance (MV) model with a pre-selection stage that uses a prototype-based clustering algorithm to reduce the number of crypto assets considered at each investment period. In the pre-selection stage, we run a prototype-based clustering algorithm where the assets are described by variables representing the profit-risk duality. The prototypes of the clustering partition are automatically examined and the one that best suits our risk-aversion preference is selected. We then run the MV portfolio optimization with the crypto assets of the selected cluster. The proposed approach is tested for a period of 17 months in the whole cryptocurrency market and two selections of the cryptocurrencies with the highest market capitalization (175 and 250 cryptos). We compare the results against three methods applied to the whole market: classic MV, risk parity, and hierarchical risk parity methods. We also compare our results with those from investing in the market index CCI30. The simulation results generally favor our proposal in terms of profit and risk-profit financial indicators. This result reaffirms the convenience of using machine learning methods to guide financial investments in complex and highly volatile environments such as the cryptocurrency market.
Introduction
Blockchain technology is one of the most disruptive technologies of the last 30 years, with applications in many different domains where transactions are processed. As a result, blockchain has changed our view of contracts, logistics, and shipping, and has sparked academic research (Zhou et al. 2021). Finance (Zhao et al. 2016; Xu et al. 2019) is one of the fields where the impact of blockchain has been greatest, in particular through the use of cryptocurrencies as trading assets, which in turn has raised considerable interest from academia (Fang et al. 2022).
Since the appearance of Bitcoin in 2008, the cryptocurrency market capitalization has grown to $1676bn in February 2022, where Bitcoin and Ethereum represent 41.8% and 18.1% of the market, respectively.^{Footnote 1}
While the market is only in its infancy, some studies highlight a Compound Annual Growth Rate (CAGR) higher than 21% for the next 5 years,^{Footnote 2} which makes it extremely attractive for investors. At the same time, its volatility is also extremely high, and the number of crypto assets traded is also very high at an estimated total of around 10,000 as of February 2022.^{Footnote 3} Most of those crypto assets have been traded for just a few years or even less.
While the market is attractive to investors, the aforementioned characteristics render well-known portfolio optimization models, such as the Mean-Variance (MV) methods (Markowitz 1952a, 1959), unreliable. In particular, the high number of potential crypto assets and their short time in the market may hinder the estimation of the covariance matrix.
Some studies use a predefined criterion for reducing the number of considered crypto assets, such as focusing on those with higher capitalization. However, such a strategy may leave aside potentially interesting assets for the investor. Thus, we propose the use of a clustering method to partition the cryptocurrency space and the automatic selection of the partition that best suits the risk-aversion preference of the investor. In particular, we propose the use of a prototype-based clustering algorithm, such as K-Means or K-Medoids, as the prototypes will be used as representative elements of the partitions. The usefulness of such methods for characterizing the crypto-asset market in a meaningful way has previously been demonstrated in Lorenzo and Arroyo (2022). Following this work, we use the bivariate representation of the average and standard deviation of the returns together with a partitional clustering algorithm as a preliminary step before the portfolio optimization. After clustering, we select the cluster that best fits one of the predefined risk-aversion profiles using the prototype as an adequate summary of the cluster. By doing so, our method reduces the number of crypto assets that will be considered by the portfolio optimization model, focusing only on those that best suit the investor strategy. Furthermore, our approach acknowledges the changing nature of the cryptocurrency market, as we repeat the process (clustering analysis, selection of the cluster, and portfolio optimization) at each investment time, so the resulting number of clusters and their composition may change completely. In this way, our approach can be considered an online portfolio selection methodology, where decisions are made sequentially incorporating the new information, and it goes beyond the static or cross-sectional use of clustering methods in other portfolio selection approaches.
We compare the results of our clustered MV model with those of the standard MV model applied to the whole market. In addition, we also compare it with those from more sophisticated methods such as the Risk Parity (Qian 2016) and the Hierarchical Risk Parity (HRP) methods (de Prado 2016). Furthermore, we also compare it against a buy-and-hold strategy of the CCI30 cryptocurrency index that represents the overall market behavior.
The simulation entails a test period of 17 months. We perform three experiments with different sets of cryptocurrencies: the whole cryptocurrency market with data available (over 500 cryptocurrencies), and two selections of the cryptocurrencies with the highest market capitalization (175 and 250 cryptocurrencies, respectively). For each experiment and each method, we repeat the simulation 1500 times using different investment paths, each of which represents the simulation days on which an investor considers entering the market. If an investment is made on a given day, the position is held for the next 30 days. The different methods are compared using standard profit and risk-profit financial indicators.
The rest of this paper is organized as follows. A literature review is presented in the “Literature review” section, in which we support our investigation on well-established portfolio allocation models and different approaches for clustering of financial markets, with more detail for the cryptocurrency domain. The “Data and methods” section includes data processing, the methods followed, and the simulation carried out. Finally, we discuss our results and present concluding remarks in the “Results” and “Conclusions” sections, respectively.
Literature review
Portfolio selection
An investment portfolio is a basket of tradable assets, and portfolio optimization models are concerned with finding the best combination of assets according to given objectives. There are two main schools of principles and theories for portfolio selection: (i) Markowitz’s Mean-Variance models and (ii) Capital Growth Theory (Kelly jr 1956; Breiman 1960; Thorp 1975; Finkelstein and Whitley 1981). This research focuses on the first type of model. Modern Portfolio Theory (MPT), also referred to as Mean-Variance Optimization (MVO) models, was first posited in the 1950s (Markowitz 1952a; Sharpe 1964; Lintner 1965) and considers the diversification of assets as the most effective way to obtain low risk-reward ratios while maximizing the expected utility of the returns. Diversification of capital helps to neutralize idiosyncratic risks. In this way, MVO links with the theory of rational behavior under uncertainty (Markowitz 1952b), and MVO models are included in what are known as risk-based models. The portfolio allocation theory framework has been extensively developed in many different works (Steinbach 2001; Kolm et al. 2014) to propose solutions to existing constraints on the different models, mostly considering practical applicability to the markets. The models now include transaction costs, tax effects, estimation errors on the risk and return forecast, and intertemporal effects as edging conditions, and the inclusion of specific features required by financial planners. We can find an exhaustive taxonomy of MVO methods in Kalayci et al. (2019), all of which are focused on reducing risk while increasing diversification. One of the weaknesses of MVO models is that it is necessary to provide an estimation of the expected returns and covariances of all the securities in the investment universe; more details on criticism of MVO can be found in Michaud (1989) and Leland (1999). We use the acronyms MPT, MV, and MVO to refer to the same portfolio allocation model.
Risk Parity Portfolio (RPP) (Qian 2016; Roncalli 2013), also known as the Equally weighted Risk Contribution (ERC) portfolio, belongs, together with MVO, to the family of risk-based models. It is an approach to portfolio management that focuses on risk allocation rather than capital allocation. While the MVO methods minimize the variance, RPP models try to constrain each asset to contribute equally to the portfolio’s overall volatility (Maillard et al. 2010). In other words, RPP balances the risk so that the risk contribution of every asset is equal, and in this sense it is also considered a risk-based model.
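To make the idea of equal risk contributions concrete, the following minimal sketch computes the contribution of each asset to portfolio volatility, using the standard decomposition \(RC_i = w_i (\Sigma w)_i / \sqrt{w^\top \Sigma w}\). The function name and the two-asset covariance matrix are hypothetical and chosen only for illustration; with uncorrelated assets, inverse-volatility weights happen to equalize the contributions.

```python
import math

def risk_contributions(weights, cov):
    """Contribution of each asset to portfolio volatility.

    RC_i = w_i * (cov @ w)_i / sqrt(w' cov w); the contributions sum to
    the portfolio volatility.
    """
    n = len(weights)
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    variance = sum(weights[i] * marginal[i] for i in range(n))
    volatility = math.sqrt(variance)
    return [weights[i] * marginal[i] / volatility for i in range(n)]

# Hypothetical example: two uncorrelated assets with volatilities 0.2 and 0.1.
cov = [[0.04, 0.0], [0.0, 0.01]]
inverse_vol = [1.0 / math.sqrt(cov[i][i]) for i in range(2)]
weights = [v / sum(inverse_vol) for v in inverse_vol]
print(weights, risk_contributions(weights, cov))
```

For correlated assets, the equal-contribution weights generally have no closed form and are found numerically.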
Merging classical portfolio optimization models and hierarchical methods from unsupervised learning techniques detailed in the next subsection, the Hierarchical Risk Parity (HRP) model introduced by de Prado (2016) addresses the problem that traditional risk-based portfolios face when computing a portfolio on an ill-conditioned or even singular covariance matrix. It conducts the optimization process by a top-down recursive bisection using graph theory and machine learning techniques. Based on the same idea, Raffinot (2017) proposed Hierarchical Clustering Based Asset Allocation (HCAA), which allocates capital within and across clusters of assets in multiple hierarchical models. The Hierarchical Equal Risk Contribution Portfolio (HERC) (Raffinot 2018) merges HRP and HCAA. Several variations to this approach have also been proposed (Lohre et al. 2020; Molyboga 2020). In particular, we use the HRP model as a benchmarking method to compare our proposal, as explained below. Moreover, our approach also tackles the problem of the covariance matrix. However, it focuses only on one cluster and in doing so, reduces the number of crypto assets and makes the computation and the inversion process of the covariance matrix easier.
An exhaustive study of the latest risk-based portfolio optimization strategies applied to the 30 cryptos with the highest market capitalization can be found in Burggraf (2019). An important strand of research is focused on the effects of adding cryptocurrencies to traditional asset portfolios (Eisl et al. 2015; Chuen et al. 2017; Petukhina et al. 2021; Culjak et al. 2022). Others, as in our proposal, are devoted exclusively to crypto markets. For example, Liu (2019) analyzes the investibility of selected top-10 major cryptocurrencies, demonstrating the benefits of diversification for different portfolio optimization models such as ERC, MV, RPP, maximum Sharpe ratio, and maximum utility.
We test the performance of our proposal in cryptocurrency markets using MV, RPP, and HRP models as benchmarks, which are among the most relevant in the portfolio allocation literature.
Clustering techniques in portfolio selection
The application of unsupervised models, and in particular clustering techniques, to find groups of assets characterized by their financial behavior arises with the seminal paper of Mantegna (1999), which applies a hierarchical Minimum Spanning Tree (MST) that takes the linkage between stocks from the New York Stock Exchange market into account. While Mantegna’s methodology has been extensively applied, it has some drawbacks. A criticism of the initial Mantegna approach was related to the employed distance, based on a simple static correlation among the returns’ time series. Alternative approaches have been developed considering autocorrelation structure (Piccolo 1990), distances based on GARCH parameters (Otranto 2008), frequency domain features, higher moments of time series, and so on. For example, Onnela et al. (2003) investigated the distribution and dynamics of correlation coefficients. Bonanno et al. (2004) compared the return and volatility networks considering different time horizons. Tumminello et al. (2006) proposed the Planar Maximally Filtered Graph (PMFG) instead of MST. Brida and Risso (2009) combined symbolic time series analysis with MST. Musmeci et al. (2014) applied a new linkage method known as a Directed Bubble Hierarchical Tree (DBHT) to financial markets. From our viewpoint, Mantegna’s approach is a powerful methodology to determine the structures of the market in the context of a cross-sectional analysis. However, we consider it difficult to adapt to streaming data and online portfolios due to different constraints, for instance, the sampling frequency (Bonanno et al. 2001) (e.g., intraday, daily, weekly), the length of the rolling window T (Onnela et al. 2003), and the number of assets N under study (Borysov et al. 2014). We later tackle the same challenge when N approaches the T value in risk-based portfolio optimization models. Marti et al. (2017) present an exhaustive review of hierarchical clustering in financial markets.
Another important strand of clustering in finance applies partitional prototype-based clustering to financial markets. D’Urso et al. (2013) and D’Urso et al. (2016) used a model-based approach with different variations of fuzzy clusters and different distance metrics (autoregressive, Caiado). Iorio et al. (2018) proposed a clustering based on the computation of the spline coefficients of the time series and directly measured the performance within MV, Equally Weighted (EW), and ERC portfolio allocation models. Similarly, D’Urso et al. (2020) proposed a fuzzy clustering method based on cepstral representation, using the daily Sharpe ratio as a variable of clustering. Soleymani and Vasighi (2020) adapted K-means to cluster NYSE stocks based on Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) measures. Gubu et al. (2020) present a robust portfolio selection using the KAMILA algorithm on a combination of continuous and categorical variables with a robust covariance estimation. Cerqueti et al. (2021) propose clustering time series according to their estimated conditional moments via the Autocorrelation-based fuzzy C-means (AFCM) algorithm; the proposal is enhanced in Cerqueti et al. (2022), in which they compute an optimal weight for each moment. Both proposals are tested directly on different time series as empirical experiments.
Regarding the partitional clustering family, Nanda et al. (2010) applied K-means, fuzzy C-means, and Self-Organizing Maps (SOM) to returns and financial ratios from Indian stocks to classify them into different clusters and subsequently develop portfolios. The analysis considers a set of stocks with fixed-weight allocation along the investing period. However, the approach does not use out-of-sample data and the study is not replicated over time to investigate how the clusters and the results evolve. Nguyen Cong et al. (2014) proposed another precedent, which combines a stage of clustering using return and standard deviation variables and a multi-objective portfolio optimization allocating stocks from the different clusters using a genetic algorithm. The simulation is carried out using 570 stocks from the Stock Exchange of Thailand (SET) and identifies four clusters. Again, the number of clusters does not change over time. Datta and Ghosh (2015) propose an approach that groups the daily Indian market volatility by comparing Kernel K-means, SOM, and Gaussian clustering models to achieve the right volatility prediction using the clusters as predictors. Luca and Zuccolotto (2017) propose a dynamic clustering procedure for time-series returns using the time-varying tail dependency as a dissimilarity measure. The aim is to provide a criterion for portfolio selection focusing on the lower tails of the returns distributions that are sensitive to the contagion phenomena between stocks for the FTSE MIB index. Duarte and De Castro (2020) segment the assets of the Brazilian Stock Exchange (B3) into partitional clusters of correlated assets, taking as initial medoids those assets with the lowest standard deviation of the past series of prices; the clusters feed an MV model, whose performance is compared with that of the RPP model. In contrast, our approach is not based on any correlation measure but on a Euclidean distance defined on the volatility-return space.
Clustering techniques applied to the cryptocurrency market
Regarding the application of clustering methods to the cryptocurrency market, Song et al. (2012) applied Mantegna’s initial ideas based on hierarchical clustering but renewed for the emerging crypto market. Similarly, Stosic et al. (2018) use clustering to characterize the cryptocurrency market using the correlations of 110 cryptocurrencies and detect hierarchical structures using the MST. Song et al. (2019) also applied MST but removed the influence of Bitcoin-Ethereum to avoid a highly correlated matrix. Lorenzo and Arroyo (2022) applied three different prototype-based clustering techniques to conduct a cross-sectional analysis of the cryptocurrency market and identify associations between the clusters and several financial and technological descriptors. Each clustering method deals with the cryptocurrencies represented in a different way: as two variables (the average and standard deviation of the daily returns), as the distribution of daily returns, and as the daily return time series.
Our approach uses clustering, but contrary to other works, it uses a sliding window, allowing for the number of clusters and their composition to change over time and automatically deciding the number of clusters by combining several validity indexes using a voting mechanism.
Online portfolio selection
We adopt an approach that fits into the Online Portfolio Selection (OLPS) models. The main characteristic of such models is that they sequentially select a portfolio over a set of assets to achieve certain targets. In OLPS, market information arrives sequentially and the allocation decision must be made immediately. An exhaustive survey can be found in Li and Hoi (2014), which is complemented in Li et al. (2016) with an open-source MATLAB library to apply different online algorithms.
There are two types of methodologies in the OLPS literature: (i) batch learning, where the model is trained from a batch of training instances, and (ii) online learning, where the model is successively trained from a single instance taking the price change \((x_{t,i}=\frac{p_{t,i}}{p_{t-1,i}})\) as an input vector. Our research is focused on the continuous-time MV model developed for multiple period (batch) portfolio selection for the control part and it is analytically resolved in Li and Ng (2000) and Dai et al. (2010). Our proposal suits the batch approach because it is based on the deterministic management of historical data for the portfolio selection, where there is no dependency on the allocation decisions between different time frames. In addition, the target in each iteration for every investing window is a mean reversion formula inspired by the Online Moving Average Reversion (OLMAR) methods (Li and Hoi 2012; Li et al. 2015; Umino et al. 2022).
Jiang and Liang (2017) propose an online portfolio approach in cryptocurrency markets. In particular, they propose a deterministic deep reinforcement learning method based on a Convolutional Neural Network (CNN) applied to a training window of the historic price changes. The weight on the allocation is based on a reward function that maximizes the portfolio value, although only for 12 cryptocurrencies. There are some similarities between this work and our approach since both use parameter tuning based on backtesting trading, and both combine machine learning and portfolio allocation. However, we first apply a clustering technique instead of the more complex CNN. Second, we use the classical portfolio allocation model MV instead of a reward function that does not consider any aspect of the risk aversion of the investor. Third, we apply our method to 534 cryptos instead of just 12.
Another relevant reference is Khedmati and Azin (2020), who present an online selection algorithm based on the pattern matching principle that uses K-means, K-medoids, spectral, and hierarchical clustering for the selection of the best investing time window.
Market efficiency
Market efficiency is a key financial subject that researchers try to transpose from the classical to the cryptocurrency domain. Starting with the seminal works of Fama (1965) and Samuelson (1965) on traditional financial markets, there have been different attempts to understand the applicability of efficiency to the new markets. Kyriazis (2019) conducts a systematic survey on the predictability of cryptocurrency prices and concludes that the Efficient Market Hypothesis (EMH) is rejected, opening a door to speculation, a conclusion that we partially confirm in our investigation. Makarov and Schoar (2020) come to the same conclusion but analyze arbitrage opportunities arising from price deviations across the different exchanges. One of the major implications of the inefficiencies of the cryptocurrency markets is that they offer portfolio managers opportunities to make excess returns by outperforming the market (Palamalai et al. 2021). We find different examples of how to take advantage of such inefficiencies by applying different machine learning techniques. For instance, Alessandretti et al. (2018) applied different forecasting models, achieving profits over the investing period and performing better than a baseline strategy. The parameter optimization based on the Sharpe ratio achieves the best results, which is one of the strategies that we analyze herein. Livieris et al. (2020) ensemble different learning strategies that exhibit high efficiency and reliability, mainly for low-frequency applications. Fang et al. (2021) analyze a data-driven approach with a retraining method to predict successful mid-price movements in cryptocurrency markets. The disadvantage of the learning algorithms that take advantage of the market inefficiency is that the methods are data-hungry (Marcus 2018) and the forecasting benefit decays in non-stationary time series.
Finally, Sebastião and Godinho (2021) analyze the predictability of three important cryptocurrencies, Bitcoin, Ethereum, and Litecoin, using several machine learning methods and compare their profitability incorporating trading costs. The results indicate that it is possible to propose profitable trading strategies in the cryptocurrency market, even under adverse market conditions, providing an example of the market inefficiencies.
Data and methods
Dataset and preprocessing
From Cryptocompare exchange,^{Footnote 4} we retrieve the daily closing price for all the cryptocurrencies traded from January 1, 2018, to May 31, 2021, for a total of 1,999,953 market observations along 1247 trading days.
For each cryptocurrency, we transform the price time series into the arithmetic return time series, whose use is extended and consolidated due to its more suitable statistical properties and better comparability (Gilli et al. 2019). The arithmetic returns for the cryptocurrency i at time t are:
$$R_{t,i} = \frac{P_i(t) - P_i(t-1)}{P_i(t-1)}, \quad t = 1, \ldots, T$$ (1)
where \(P_i(t)\) is the daily cryptocurrency price for crypto-asset i at day t and T is the time series sampling.
Regarding data cleaning, we remove the observations with duplicated rows and NaN or Inf values for \(R_{t,i}\).
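As an illustration of the return computation and cleaning rule described above, here is a minimal sketch using only the Python standard library. The function name and the sample prices are hypothetical; the arithmetic return is taken to be the relative change between consecutive daily closing prices, and observations whose return would be NaN or infinite are dropped.

```python
import math

def arithmetic_returns(prices):
    """Arithmetic returns (P_t - P_{t-1}) / P_{t-1} for consecutive prices.

    Observations whose return is undefined (missing or zero previous price)
    or not finite (NaN, Inf) are dropped.
    """
    returns = []
    for previous, current in zip(prices, prices[1:]):
        if previous is None or current is None or previous == 0:
            continue  # the ratio is undefined, so skip this observation
        value = (current - previous) / previous
        if math.isfinite(value):
            returns.append(value)
    return returns

# Hypothetical daily closing prices, including a zero price to be cleaned out.
closing = [100.0, 110.0, 99.0, 0.0, 50.0]
print(arithmetic_returns(closing))
```

In this example, the return computed from the zero price is skipped rather than reported as infinite.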
Furthermore, we filter out the cryptocurrencies with heavier tails in the return distribution because heavy tails imply relatively frequent extreme price fluctuations and affect the consistency of the results, particularly the estimation of the returns and covariance matrix for the portfolio optimization.
Heavy-tail behavior in a return distribution is related to the finite-size effects in the number of active agents linked to the liquidity and volume of the market (Watorek et al. 2020). According to Newman (2005), a distribution has a heavy-tail behavior if the tail index is lower than 2. In our case, we discard cryptocurrencies with heavier tails, that is, those with a tail index lower than 2.3. In this way, we ensure the existence of the first two moments (expectation and covariance matrix) required for risk-based models. We apply this filter only to the first two years of data, that is, before the simulation starts. In this way, we avoid look-ahead bias. The results are reported in Table 1.
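The tail index used by this filter has to be estimated from the data, and the text does not state which estimator was used. As a minimal sketch, assuming the Hill estimator applied to the largest absolute returns (the function name, the 10% tail fraction, and the synthetic Pareto sample are all hypothetical choices for illustration), the estimate could be obtained as follows. Note that the Hill estimator targets the exponent of the survival function, which may differ by a convention from the index used in the cited references.

```python
import math
import random

def hill_tail_index(returns, tail_fraction=0.1):
    """Hill estimate of the tail index from the k largest absolute returns."""
    magnitudes = sorted((abs(r) for r in returns if r != 0), reverse=True)
    k = max(1, int(len(magnitudes) * tail_fraction))
    if len(magnitudes) <= k:
        raise ValueError("not enough non-zero observations")
    threshold = magnitudes[k]  # the (k + 1)-th largest magnitude
    log_excess = sum(math.log(x / threshold) for x in magnitudes[:k])
    if log_excess == 0:
        raise ValueError("tail observations are all equal to the threshold")
    return k / log_excess

# Synthetic check on a Pareto sample whose true tail index is 3.
rng = random.Random(42)
sample = [rng.paretovariate(3.0) for _ in range(20000)]
print(hill_tail_index(sample))
```

The estimate depends on the chosen tail fraction, so in practice it is worth checking its stability over a range of values.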
Stationarity is another important property of the underlying process in a return time series, especially if we are interested in forecasting or pricing. Random walk theory allows us to test the weak form of the EMH (Samuelson 1965; Fama 1965) within a series of asset returns. The key principle of EMH is that asset prices reflect all information, making it impossible for investors to derive benefits through trying to predict their behavior. The weak form of efficiency can be expressed as an autoregressive random walk model of stock returns with error term \(\varepsilon_t\):
$$R_t = \rho R_{t-1} + \varepsilon_t$$ (2)
From Eq. 2, the stock return series, \(R_t\), is considered a random walk only if \(\rho = 1\), whereas if \(|\rho| < 1\), then the series is a stationary and predictable process, which violates the weak-form EMH. We apply the Kapetanios, Shin, and Snell (KSS) nonlinear unit root test (Kapetanios et al. 2003), which is more robust when there are market frictions (i.e., transaction costs), as proposed in Apopo and Phiri (2021) for cryptocurrency markets. We use the test implementation in the R package by Guris (2021). The null hypothesis is that the raw time series of log returns is of random-walk type, against the alternative of a stationary process. Results for the 2018-19 window are reported in Table 1: we cannot reject the null hypothesis (p-value higher than 0.01) in approximately 50% of the cryptos. These results are aligned with others in the cryptocurrency market (Kyriazis 2019).
The resulting number of eligible cryptocurrencies for portfolios is 534. Additionally, we also consider two smaller sets with 250 and 175 cryptocurrencies with the highest market capitalization among the eligible ones.
Simulation
In this section, we describe our methodology. We perform a Monte Carlo experiment repeating each simulation 1500 times to better assess the outcome of the different investment methods considered. The simulation period is 17 months, from January 2020 to May 2021. In each simulation, we run a sequence of investments known as a Random Investment Path (RIP), which consists of a sequence of days \(t_w\) on which entering the market is considered.
The RIPs are the same for all the investment methods under analysis: the ones proposed and those used for benchmarking. For each investment at time \(t_w\), we consider a 2-year estimation window (730 days) from \(t_{w-729}\) to \(t_{w}\), which is used to estimate the portfolio. If an investment is made, the position will always be held for 30 days, that is, from \(t_{w+1}\) to \(t_{w+30}\).
The investment path is created as follows. For each of the 17 months of the simulation period, there is a 50% probability of investing in that month. If the month is selected, we randomly select the day \(t_w\) when the investment will be made. Since the holding period of an investment is 30 days, we ensure that there is no overlap between the holding period of \(t_w\) and the next investment day \(t_w+1\).
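The path construction just described can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the way overlap is avoided (restricting the candidate days of a selected month to those falling after the previous holding period, and skipping the month if none remain) is an assumption, since the text does not specify the mechanism.

```python
import calendar
import random
from datetime import date, timedelta

HOLD_DAYS = 30  # holding period of each investment

def random_investment_path(start_year, start_month, n_months, rng, p_invest=0.5):
    """Draw one random investment path as a sorted list of investment dates."""
    path = []
    year, month = start_year, start_month
    for _ in range(n_months):
        if rng.random() < p_invest:  # invest this month with probability p_invest
            days_in_month = calendar.monthrange(year, month)[1]
            candidates = [date(year, month, d) for d in range(1, days_in_month + 1)]
            if path:
                # Assumption: the next investment day must fall strictly after
                # the previous holding period (days t+1 .. t+30).
                earliest = path[-1] + timedelta(days=HOLD_DAYS + 1)
                candidates = [d for d in candidates if d >= earliest]
            if candidates:
                path.append(rng.choice(candidates))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return path

# One path over the 17-month simulation period starting in January 2020.
print(random_investment_path(2020, 1, 17, random.Random(0)))
```

Repeating the draw with 1500 different seeds would give one path per simulation, shared across all methods being compared.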
The flowchart in Fig. 1 summarizes the investment process, which consists of the following steps:

1. Data selection. As explained, for each investment time \(t_w\), we use a two-year sliding window for the estimation of the optimal portfolio, and 30 days as the holding period for the investment.

2. Market segmentation. At this stage, we use a prototype-based clustering algorithm to segment the market. In particular, we use a k-medoids algorithm and a quality index to automatically set the k value. We repeat the process 50 times to remove the uncertainty of the initialization of the algorithm. Each partition is denoted as \(P_n\) in the chart.

3. Prototype selection strategy. Given the prototypes of the 50 segmentations of the previous step, we apply a heuristic to select the most suitable cluster for later portfolio optimization. In particular, we consider four different strategies, three of which are related to risk measures and one of which is based on the well-known Sharpe ratio. If no prototype matches the strategy requirements, then no cluster is selected and no investment will be made at \(t_w\). For a given strategy and an investment time \(t_w\), we select a cluster i of the partition \(P_n\) denoted as \(C_{w,i}^{P_n}\).

4. Portfolio allocation. The MVO method is applied to the cryptocurrencies that belong to the cluster \(C_{w,i}^{P_n}\) selected for each strategy, once the risk filter has been applied to remove extremely volatile cryptocurrencies. The portfolio optimization may produce no result, in which case no investment is made. This can happen because the covariance matrix is not symmetric positive definite and hence not invertible, or because the optimization problem is ill-conditioned and the solver does not find a solution. It should be noted that the quadratic function to minimize in our model, as defined in Eq. 8, is convex and reaches a global optimum if and only if the covariance matrix is positive semidefinite. Before applying the portfolio optimization method, we apply a risk filter to remove the cryptocurrencies with extreme volatility in the last month.

5. Performance assessment. This last step is carried out once the investment path is executed. We then measure the performance of the MV model with each of the four proposed strategies and of the benchmarks. The performance is measured using profit, risk, and profit-risk indicators. As benchmarks, we use well-known portfolio optimization models over all the cryptocurrencies, that is, with no market segmentation and prototype selection. In particular, we use the mean-variance (MV), the risk parity (RPP), and the hierarchical risk parity (HRP) models. In addition, we also apply the random investment paths on the market index CCI30 to compare the strategies against the main market trend.
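The portfolio allocation step above notes that the optimization can fail when the covariance matrix is not symmetric positive definite. One common way to test this condition before optimizing is to attempt a Cholesky factorization, which succeeds only for symmetric positive definite matrices. The sketch below is an illustration with a hypothetical function name and tolerance; it is not taken from the paper, and the check actually performed by the authors' solver is not described in the text.

```python
import math

def is_symmetric_positive_definite(matrix, tol=1e-12):
    """Return True if the square matrix is symmetric positive definite.

    The test attempts a Cholesky factorization and fails as soon as a
    pivot is not strictly positive.
    """
    n = len(matrix)
    for i in range(n):
        for j in range(i):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False  # not symmetric
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False  # singular or indefinite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True

print(is_symmetric_positive_definite([[2.0, 1.0], [1.0, 2.0]]))  # well-behaved
print(is_symmetric_positive_definite([[1.0, 1.0], [1.0, 1.0]]))  # singular
```

A singular matrix such as the second one arises, for example, when two assets have perfectly correlated returns over the estimation window.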
Sampling strategy and data selection
Our dataset ranges from 1st January, 2018 to 30th May, 2021. For each investment at time \(t_w\), we consider a 2-year estimation window (730 days) from \(t_{w-729}\) to \(t_{w}\); this data is used to estimate the portfolios. The investing period is 30 days, that is, from \(t_{w+1}\) to \(t_{w+30}\). At this stage, we apply the so-called risk filter to remove cryptocurrencies with extreme volatility in the last month of the estimation window from those considered. In particular, we remove those with \(\sigma >1\) in the last 30 daily returns. We also remove cryptocurrencies with missing data or that were no longer traded during the estimation period.
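The risk filter can be sketched in a few lines. The function name is hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text only states the threshold \(\sigma > 1\) over the last 30 daily returns.

```python
import statistics

def passes_risk_filter(returns, lookback=30, max_sigma=1.0):
    """Keep an asset only if the standard deviation of its most recent
    `lookback` daily returns does not exceed `max_sigma`."""
    recent = returns[-lookback:]
    if len(recent) < 2:
        return False  # not enough data to estimate volatility
    return statistics.stdev(recent) <= max_sigma

# Hypothetical return series: one calm asset and one with two extreme days.
calm = [0.01, -0.02] * 15
wild = [0.0] * 28 + [5.0, -5.0]
print(passes_risk_filter(calm), passes_risk_filter(wild))
```

Only the assets that pass this filter would then be handed to the portfolio optimization.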
Market segmentation
First, we describe each cryptocurrency in our dataset using the standard deviation and average (\(\sigma , \mu\)) of the daily returns (Eq. 1) computed along the estimation window. This representation is used later for the automatic selection of the cluster, according to the investment strategy, and is also consistent with the MV portfolio optimization. In addition, it succinctly summarizes the profitability and volatility of each asset and has been successfully used for clustering cryptocurrencies (Lorenzo and Arroyo 2022). While more sophisticated representations are considered in the previous reference, the (\(\sigma , \mu\)) variables make a faster computation possible, which is crucial due to the intensive calculations of the simulations.
For market segmentation, we use a partitional prototypebased clustering algorithm. We need the algorithm to produce a partition of disjoint subsets of assets (cryptocurrencies in our case) and we need a prototype representing each subset for some of the investment strategies explained below.
In clustering, the prototype represents the cluster elements optimally, that is, it typically minimizes the total distance between all the cluster objects and itself. The process is usually formalized (Henning et al. 2016) as the minimization of \(S({\mathcal {D}}, m_1, ..., m_K)\) by choice of the prototypes \(m_1,...,m_K\), as follows,
$$\begin{aligned} S({\mathcal {D}}, m_1, ..., m_K) = \sum _{i=1}^{n} \min _{c \in \{1,...,K\}} d(x_i, m_c) \end{aligned}$$(3)
where n in Eq. 3 is the total number of objects in the space \({\mathcal {D}}\), d is the dissimilarity measure function, and K is the number of clusters. The prototypes \(m_1,...,m_K\) may be required to be objects in \({\mathcal {D}}\). The function d may be the given distance between the observation \(x_i\) and the cluster centroids or prototypes \(m_c\).
We use the Euclidean Distance (ED), \(d^2({{\textbf {x}}}_i, {{\textbf {x}}}_j) = \Vert {{\textbf {x}}}_i - {{\textbf {x}}}_j \Vert ^2\), where \(\Vert \cdot \Vert\) is the Euclidean norm in \({\mathbb {R}}^n\), because it is both simple and meaningful where the resulting space meets the appropriate mathematical properties. Furthermore, it has also been used in a similar application for prototype-based clustering of cryptocurrencies represented as the average return and volatility (Lorenzo and Arroyo 2022; Mattera et al. 2021; Nguyen Cong et al. 2014), as in our case.
In some clustering algorithms, for example, in K-means, the prototype is the mean of the objects. However, we want the prototype to be an observed object, so we use a K-medoids or Partitioning Around Medoids (PAM) algorithm.
Owing to the intensive use of the clustering algorithm in our simulations, we use a computationally efficient version of the PAM algorithm called CLARA (Clustering LARge Applications). The difference is that CLARA uses only random samples of the dataset (instead of the entire dataset) to compute the medoids. However, it is important to note that the resulting partition includes all the elements of the dataset.
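The idea of computing medoids on random samples and then partitioning the whole dataset can be illustrated with a short sketch. It is written in Python for illustration only and is not the CLARA implementation used in the paper; the greedy swap search is a simplified stand-in for PAM, plain Euclidean distance is used, and the function names and the n_samples and seed parameters are assumptions.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k, rng):
    """Greedy swap search for k medoids (a simplified PAM) on a small point set."""
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, sampsize, n_samples=5, seed=0):
    """CLARA-style clustering: medoids from random samples, partition of all points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sampsize, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)   # evaluated on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = [min(range(k), key=lambda c: math.dist(p, best_medoids[c])) for p in points]
    return best_medoids, labels
```

Each point would be a (sigma, mu) tuple describing one cryptocurrency. Every returned medoid is an observed point, and the labels cover all points, not only the sampled ones.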
The CLARA algorithm belongs to the family of prototype-based clustering algorithms (Kaufman 1986; Kaufman and Rousseeuw 1990). It is a PAM algorithm adapted to large datasets. Readers interested in an in-depth analysis of the PAM, CLARA, and CLARANS algorithms can refer to the work by Schubert and Rousseeuw (2019).
In the CLARA algorithm, we have to determine three parameters: the size of the random sample sampsize, the minimum cluster cardinality, and the number of clusters K.
In our case, sampsize varies in each execution from 50 to 100, so 50 partitions are generated. These are the aforementioned executions of the clustering algorithm. In each execution, the resampling changes and therefore the outcome of the clustering changes slightly.
For each sampsize iteration, we save all the clusters with a cardinality of at least 10, ensuring that each cluster is sufficiently large to run the portfolio allocation algorithms efficiently.
There has been extensive research on cluster detection and evaluation (Kou et al. 2014; Li et al. 2021). In this case, we rely on an automatic process that consists of computing different Cluster Validity Indices (CVIs) for crisp partitions (Arbelaitz et al. 2013), including Silhouette, Dunn, COP, Davies-Bouldin, Calinski-Harabasz, and the score function, and then applying the majority rule to select the number of clusters K that is best according to the largest number of CVIs.
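The majority rule over the CVIs amounts to a vote count. A minimal Python sketch follows, assuming each index has already proposed its own best K; the function name and the tie-breaking rule (smaller K wins) are assumptions, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_k(best_k_by_index):
    """Pick the K proposed by the largest number of validity indices.

    Ties are broken in favour of the smaller K (an assumption of this sketch).
    """
    votes = Counter(best_k_by_index.values())
    top = max(votes.values())
    return min(k for k, n in votes.items() if n == top)


# Hypothetical votes, for illustration only.
example_votes = {
    "Silhouette": 3,
    "Dunn": 4,
    "COP": 3,
    "Davies-Bouldin": 3,
    "Calinski-Harabasz": 5,
    "Score function": 4,
}
chosen_k = majority_k(example_votes)
```

With these made-up votes, three of the six indices prefer K = 3, so that value is selected.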
The outcomes of clustering algorithms for different runs are groups or clusters of cryptocurrencies, each represented by a prototype. In the next subsection, we explain how we select the cluster that is used to optimize our portfolio.
Prototype selection strategies
Our method proposes clustering as a way to partition the cryptocurrency market according to the financial behavior of the cryptocurrencies. Once we have the partitions, we need a criterion to select the portion of the market that most interests us. At this point, the medoids or prototypes of the clusters are the inputs for a simple heuristic algorithm that selects a cluster according to the different investing strategies. Only clusters with cardinality equal to or higher than 10 are considered interesting for the investor. This cardinality criterion is applied to ensure that the portfolio allocation algorithms work more efficiently. The strategies described next are summarized in Table 2.
Strategy 1: Sharpe ratio. The Sharpe ratio is the average return in excess of the risk-free rate per unit of volatility or total risk. The ratio relates the risk of the investment to the return of an investment with zero risk:
$$\begin{aligned} SR = \frac{r_P - r_f}{\sigma _P} \end{aligned}$$(4)
where \(r_P\) in Eq. 4 is the portfolio return, \(r_f\) is the risk-free rate and \(\sigma _P\) is the portfolio risk (standard deviation or the volatility of the portfolio). As the \(r_f\) reference, we consider the daily equivalent of the annualized 90-day T-Bill rate obtained from the Federal Reserve Economic Database (FRED) hosted by the Federal Reserve Bank of St. Louis. The greater the value of the Sharpe ratio, the more attractive the risk-adjusted return of the portfolio.
In this strategy, we compute the annualized Sharpe ratio of the MV portfolios considering the cryptocurrencies during the estimation window. This strategy is represented with the Sharpe ratio (SR) label in the following tables and charts.
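As an illustration of the annualized Sharpe ratio, the following Python sketch works from a series of daily returns. The conversion of the annualized risk-free rate to a daily rate by compounding, the square-root-of-time scaling, and the 365 periods per year (crypto markets trade every day) are assumptions of the sketch, not details stated in the text.

```python
import math
import statistics


def annualized_sharpe(daily_returns, annual_rf=0.0, periods_per_year=365):
    """Annualised Sharpe ratio from daily returns and an annualised risk-free rate."""
    daily_rf = (1.0 + annual_rf) ** (1.0 / periods_per_year) - 1.0
    excess = [r - daily_rf for r in daily_returns]
    sigma = statistics.stdev(excess)
    if sigma == 0:
        raise ValueError("zero volatility: Sharpe ratio undefined")
    return statistics.mean(excess) / sigma * math.sqrt(periods_per_year)
```

Under this strategy, the cluster whose MV portfolio yields the highest value over the estimation window would be the one retained.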
Strategy 2: Prototype selection by a risk-aversion criterion. First, we compute the volatility (\(\sigma\)) of all cryptocurrencies traded in the estimation window and compute the quartiles of its distribution, which serve as a reference for the volatility of the period and allow us to classify the cluster prototypes. Accordingly, we consider three different risk-aversion profiles for investors following Goetzmann et al. (2014). In particular, the profiles are:

1. Strategy 2a represents the utility function of a risk-averse investor as it chooses the prototypes whose volatility is within the \(1^{st}\) quartile. Represented by the Low-Risk (LR) label in the following tables and charts.

2. Strategy 2b represents the utility function of a risk-neutral investor as it chooses the prototypes that are between the \(2^{nd}\) and the \(3^{rd}\) quartile of the volatility distribution. Represented by the Mean-Risk (MR) label.

3. Strategy 2c represents the utility function of a risk-seeking investor as it chooses the prototypes above the \(3^{rd}\) quartile of the volatility distribution. Represented by the High-Risk (HR) label.
Since the prototype is described by two variables of average return and volatility, if more than one prototype is selected, we choose the one with the highest average return.
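The quartile-based selection can be sketched in Python as below. Several points are assumptions of the sketch rather than statements of the text: the tuple layout of a prototype, the reading of the MR band as the interval between the first and third quartile cut points, and the quantile method (Python's default, which may differ from the one used by the authors).

```python
import statistics


def select_prototype(prototypes, market_sigmas, profile):
    """Pick a cluster prototype for a risk profile: 'LR', 'MR' or 'HR'.

    prototypes: list of (cluster_id, sigma, mu, cardinality) tuples.
    market_sigmas: volatilities of all assets in the estimation window.
    Returns the chosen cluster_id, or None when no prototype qualifies.
    """
    q1, _, q3 = statistics.quantiles(market_sigmas, n=4)
    in_band = {
        "LR": lambda s: s <= q1,        # risk-averse: lowest quartile
        "MR": lambda s: q1 < s <= q3,   # risk-neutral: middle band (assumed reading)
        "HR": lambda s: s > q3,         # risk-seeking: above the third quartile
    }[profile]
    eligible = [p for p in prototypes if p[3] >= 10 and in_band(p[1])]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p[2])[0]   # highest average return wins
```

Returning None corresponds to a period in which the strategy finds no suitable cluster and therefore makes no investment.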
Portfolio allocation
In our proposal, once we select the most suitable partition according to our strategy, we run portfolio allocation using the well-known MV optimization. One of its drawbacks is its tendency to maximize the effects of errors in the input assumptions on return and volatility estimations. This means that small changes in the expected returns or the computed covariance matrix can produce very different results.
Meanvariance model assumptions
We summarize some of its main assumptions below for a single-period MV model (Steinbach 2001):

1. The first two moments exist, namely the expectation (\(\bar{{{\textbf {r}}}}\)) and the covariance matrix (\(\Sigma\)), where the apostrophe (\('\)) denotes the transpose of a vector:
$$\begin{aligned} \bar{{{\textbf {r}}}}:={\textbf{E}}({{\textbf {r}}}); \qquad \Sigma :={\textbf{E}}[({{\textbf {r}}}-\bar{{{\textbf {r}}}})({{\textbf {r}}}-\bar{{{\textbf {r}}}})']={\textbf{E}}[{{\textbf {r}}}{{\textbf {r}}}']-\bar{{{\textbf {r}}}}\bar{{{\textbf {r}}}}' \end{aligned}$$(5)

2. The returns (r) are assumed to follow a normal distribution

3. The investors have MV preferences and thus ignore skewness
where r is a return vector. We also consider the following definitions:

Definition 1.1 (reward). The reward (\(\gamma\)) of a portfolio is the mean of its returns (r) weighted by the corresponding weights (w)
$$\begin{aligned} \gamma ({{\textbf {w}}}) := {\textbf{E}}({{\textbf {r}}}'{{\textbf {w}}})=\bar{{{\textbf {r}}}}' {{\textbf {w}}} \end{aligned}$$(6) 
Definition 1.2 (risk). The risk (R) of a portfolio is the variance of the returns
$$\begin{aligned} R({{\textbf {w}}})&:=\sigma ^2({{\textbf {r}}}'{{\textbf {w}}})\\&={\textbf{E}}[({{\textbf {r}}}'{{\textbf {w}}}-{\textbf{E}}({{\textbf {r}}}'{{\textbf {w}}}))^2]\\&={\textbf{E}}[{{\textbf {w}}}'({{\textbf {r}}}-\bar{{{\textbf {r}}}})({{\textbf {r}}}-\bar{{{\textbf {r}}}})'{{\textbf {w}}}]\\&={{\textbf {w}}}' \Sigma {{\textbf {w}}}\end{aligned}$$(7)
MV models require a risk measure, computed as a covariance matrix (Eq. 7), that must be inverted at some point in the optimization process. This requires the matrix to have certain properties; otherwise, it may not be invertible and the solution obtained may carry too much error. The presence of noise in the series (Pafka and Kondor 2003) and the requirements of the covariance estimator itself force us to take care when optimizing the portfolio (Ledoit and Wolf 2003). In general, the sample covariance matrix is considered suitable for applications where its inverse is not required (Gatheral 2008), which is a problem for risk-based portfolio selection: the sample covariance matrix contains substantial statistical noise that is amplified when the matrix is inverted (Ledoit and Wolf 2004). Likewise, since the return matrix contains noise, the sample covariance matrix may not estimate the true covariance matrix well. An understanding of the estimation error of the covariance is important if we want to ensure better out-of-sample performance of the optimization model. The problem arises because the covariance matrix is calculated over a finite window of length T, the number of observations in each time series, which inevitably introduces noise (measurement error) into the estimator itself; this effect grows as T approaches N, the number of time series. This must be taken into consideration by applying appropriate estimators (we use the cov.trob function in the R MASS package (Venables and Ripley 2002)), given the narrowness of the time window (T) and the high number of cryptocurrencies (N). From matrix theory, the condition number of a matrix A provides a measure of the sensitivity of the solution x of the system \(Ax=b\) to perturbations in b. In many situations with time series, the matrix is ill-conditioned because \(N>T\).
Even when \(T > N\), the eigenstructure tends to be systematically distorted unless \(T \gg N\), resulting in a numerically ill-conditioned estimator for \(\Sigma\). From a different perspective, a strong correlation between some time series corresponds to a rank deficiency as well as to non-unique solutions in MV optimization. Hence, for the stability of the solution of a risk-based model, we need invertible and well-conditioned covariance matrices.
By selecting partitions according to investor goals, we reduce the cryptocurrency space, and we expect the resulting better estimation of the covariance matrices to enhance the performance of the portfolios.
Meanvariance (MV) model
The optimization goal for the MV model is to determine the best trade-off between return and risk, subject to a set of constraints, assuming that the investor knows the value of the expected return vector \(\mu\) and covariance matrix \(\Sigma\). Rational investors always pursue the lowest risk under a specific expected return or the highest return under a particular risk. The risk measure developed by Markowitz is an asset-weighted covariance matrix, \({{\textbf {w}}}' \Sigma {{\textbf {w}}}\), where \(\Sigma\) in Eq. 8 is the covariance matrix and w is the portfolio weights vector. The optimization solution is obtained by setting a target portfolio return \({\bar{r}}\), discounting transaction costs in line with the model proposed by Wang et al. (2014) but considering, in our case, a fixed amount per portfolio, and allowing only long positions under a fully invested condition with minimum and maximum holding sizes \(\omega \in [0.001, 0.5]\), such that:
where \({\hat{\mu }}\) is the estimated mean return vector of the cryptocurrencies computed on historical return values and \({\bar{\mu }}\) is the target required return. After portfolio optimization, we invest the remaining budget not allocated to cryptos due to the portfolio weight constraints in a risk-free security (\(x_f\)).
The vectors \({{\textbf {w}}}_{min}\) and \({{\textbf {w}}}_{max}\) in Eq. 8 are the lower and upper bounds on the holding positions, where \({{\textbf {w}}}=[\omega _0 , \omega _1, ..., \omega _N]'\). Under a basic constraint, the weights of the allocated assets in the portfolio model (Eq. 8) lie between 0 and 1 (long positions) and sum to 1 (fully invested portfolio). TC in Eq. 8 is the transaction cost defined in Eq. 9.
In the presence of market friction, there are transaction costs paid by the investor to trade on the market.^{Footnote 5} For the MV model, transaction costs are computed as follows,
where N is the number of cryptocurrencies allocated to the portfolio and \(C_i\) is the cost expressed in basis points (\(1 \, bps = 0.01\% = 1/10000\)). In our case, for the computation of Eq. 9, we simply consider 5 bps per portfolio, which turns Eq. 9 into a constant \(\gamma\).
The following is an explanation of the key terms in Eq. 8:

The targeted portfolio return (\({\bar{\mu }}_{min}\)) is computed based on the MAR of the last 30 days of historical returns for the \(H \times n\) matrix \({{\textbf {R}}}\), where n is the variable number of crypto-assets allocated to the portfolio during the holding period H. The assumption is that the market will behave during the holding period at least as well as during the last 30 days of the estimation window
$$\begin{aligned} {\bar{\mu }}_{min} = {\mathbb {E}}({{\textbf {R}}}_w)_{30d}= {{\textbf {w}}}' {\mathbb {E}}({{\textbf {R}}})_{30d} = ({{\textbf {w}}}' \mu )_{30d} \end{aligned}$$(10) 
The portfolio variance (\(\sigma\)) for the objective function
$$\begin{aligned} \sigma _w = Var({{\textbf {R}}}_w)=\sum _{i,j}Cov({{\textbf {r}}}_i,{{\textbf {r}}}_j)w_i w_j = {{\textbf {w}}}' \Sigma {{\textbf {w}}} \end{aligned}$$(11)
A Quadratic Program (QP) is an optimization problem whose objective is to minimize or maximize a quadratic function subject to a finite set of linear equality and inequality constraints. QP models are applied to solve many problems, including most Markowitz mean-variance models (Cornuejols and Tütüncü 2006), where \(\Sigma\) is part of the objective function as in Eq. 8. The R package selected to solve the quadratic programming problem is quadprog, which implements the dual method of Goldfarb and Idnani (1983).
The optimal solution to Eq. 8 is a weight vector \({{\textbf {w}}}\) that produces the optimal portfolio financial return (\(r_p\)) at time t when applied to the allocated cryptocurrencies:
$$\begin{aligned} r_p = {{\textbf {w}}}'{{\textbf {r}}} = \sum _{i=1}^{n} w_i r_i \end{aligned}$$(12)
where r is an \(n \times 1\) vector of crypto-asset returns for the n crypto assets allocated to the portfolio, and \(w_i\) is the weight of crypto-asset i with return \(r_i\). In our case, considering the aforementioned maximum and minimum holding sizes, we only consider those cryptocurrencies with weights strictly higher than 0.001 and lower than 0.5, and assign the remainder to a risk-free asset until 100% of the investment is allocated. In this way, we aim to obtain portfolios with a reasonable cardinality.
The cumulative portfolio return (Rp) at the end of the holding period H days applying Eq. 12 is
We evaluate the performance of the investment methods using different return indicators based on Eq. 13 throughout the investing period. Regarding profit indicators, we use the arithmetic cumulative return (\(r_A=\sum _{t=1}^{T}Rp\)) and the geometric compounding return (\(r_G = ( \prod _{t=1}^{T}(1+Rp) ) - 1\)); for the arithmetic average or average return per period (e.g., one year), we have (\({\bar{r}}_A=\frac{1}{T}\sum _{t=1}^{T}Rp\)) and the geometric average return (\({\bar{r}}_G = ( \prod _{t=1}^{T}(1+Rp) )^{\frac{1}{T}} - 1\)), the compound annual return or annualized return (\(r^{ann}_G = ( \prod _{t=1}^{T}(1+Rp) )^{\frac{n}{T}} - 1\)), and the annualized arithmetic average return (\(r^{ann}_A=\frac{n}{T}\sum _{t=1}^{T}Rp\)). In all cases, Rp is computed based on Eq. 13, T is the number of periods under analysis and n is the number of periods within the year (monthly \(n=12\)). Arithmetic ratios reflect additive relationships and geometric ones reflect compounding relationships. Compounding rates apply to investors that reallocate the funds obtained after one investment period into the next one.
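The six profit indicators can be computed together from a list of per-period portfolio returns, as in this Python sketch. The function and key names are assumptions; the formulas follow the definitions above, with periods_per_year playing the role of n. The geometric measures assume every return is greater than -1 so that the compounded growth stays positive.

```python
import math


def return_indicators(period_returns, periods_per_year=12):
    """Profit indicators from a list of per-period portfolio returns Rp."""
    T = len(period_returns)
    growth = math.prod(1.0 + r for r in period_returns)
    total = sum(period_returns)
    return {
        "r_A": total,                                      # arithmetic cumulative return
        "r_G": growth - 1.0,                               # geometric compounding return
        "mean_A": total / T,                               # arithmetic average per period
        "mean_G": growth ** (1.0 / T) - 1.0,               # geometric average per period
        "ann_A": periods_per_year / T * total,             # annualised arithmetic average
        "ann_G": growth ** (periods_per_year / T) - 1.0,   # annualised geometric return
    }
```

For three monthly returns of 10%, -5%, and 20%, the arithmetic cumulative return is 0.25, while the compounded return is 1.1 × 0.95 × 1.2 - 1 = 0.254, which shows the gap between the additive and the compounding view.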
Benchmarking of different portfolio allocation models
We benchmark the proposed investing strategies against other portfolio allocation models that are well established in the financial literature, applied to the whole market.
In addition, we compare the performance of the methods against investing in the market index CCI30, a weighted market cap index launched on January 1, 2017. This price index is a weighted average of the 30 largest cryptocurrencies by market capitalization, and it is a good representative of the market’s overall growth and of its daily and long-term movement. We briefly present below the HRP and RPP models, which are risk-based portfolio methods.
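Risk Parity Portfolio (RPP). A key concept in the RPP model is the Marginal Risk Contribution (MRC), defined as follows:
$$\begin{aligned} MRC_i = \frac{\partial \sigma _P}{\partial \omega _i} = \frac{\sum _{j} \omega _j \sigma _{i,j}}{\sigma _P} \end{aligned}$$(14)
where \(\omega _i\) in Eq. 14 is the weight of cryptocurrency i, \(\sigma _P\) is the portfolio volatility, and \(\sigma _{i,j}\) is the covariance between cryptos i and j. The Total Risk Contribution (TRC) of the i-th cryptocurrency to portfolio risk is
$$\begin{aligned} TRC_i = \omega _i \, MRC_i \end{aligned}$$(15)
Hence, the portfolio risk is computed as follows:
$$\begin{aligned} \sigma _P = \sum _{i} TRC_i \end{aligned}$$(16)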
For RPP model implementation, we apply the R package riskParityPortfolio (Cardoso and Palomar 2021; Feng and Palomar 2015).
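To make the risk contributions concrete, the following Python sketch computes the marginal and total risk contributions for a given weight vector and covariance matrix, using the standard definitions. It does not reproduce the riskParityPortfolio package, and the function name and the list-of-lists matrix layout are assumptions.

```python
import math


def risk_contributions(weights, cov):
    """Return (sigma_P, MRC list, TRC list) for weights w and covariance matrix cov."""
    n = len(weights)
    # (Sigma w)_i: covariance of asset i with the portfolio
    sigma_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    sigma_p = math.sqrt(sum(weights[i] * sigma_w[i] for i in range(n)))
    mrc = [s / sigma_p for s in sigma_w]            # marginal risk contribution
    trc = [w * m for w, m in zip(weights, mrc)]     # total risk contribution
    return sigma_p, mrc, trc


# Two uncorrelated assets with volatilities 0.2 and 0.1 and inverse-volatility weights.
sigma_p, mrc, trc = risk_contributions([1 / 3, 2 / 3], [[0.04, 0.0], [0.0, 0.01]])
```

In this example the two total risk contributions coincide and add up to the portfolio volatility, which is the condition a risk parity portfolio aims for.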
Hierarchical Risk Parity (HRP). This model merges hierarchical clustering and portfolio allocation procedures and is based on three main steps:

Step 1: It determines the hierarchical relationships between the assets using the recursive cluster formation scheme of the Hierarchical Tree Clustering algorithm. Specifically, the algorithm calculates tree clusters based on the \(T \times N\) matrix of asset returns, where T represents the number of samples for a given time frame and N is the number of assets. The correlation-distance matrix D, where \(\rho (i,j)\) is the correlation between time series i and j, is as follows
$$\begin{aligned} D(i,j) = \sqrt{0.5 \times (1-\rho (i,j))} \end{aligned}$$(17)
and Eq. 17 is transformed into \({\hat{D}}\) by taking the ED between all the columns in a pairwise manner as follows
$$\begin{aligned} {\hat{D}}(i,j)=\sqrt{\sum _{k=1}^{N}(D(k,i)-D(k,j))^2} \end{aligned}$$(18)
Step 2: Quasi-diagonalization, which is a seriation algorithm that rearranges the data to show the inherent clusters more clearly. The algorithm rearranges the rows and columns of the covariance matrix of assets so that similar investments are placed together and dissimilar investments are placed apart.

Step 3: Recursive bisection is a top-down approach that splits portfolio weights between the subgroups obtained by recursively bisecting the rearranged covariance matrix from the second step, in inverse proportion to their aggregated variances.
For HRP model implementation, we apply the R functions available in https://rdrr.io/github/jackylauu/hierarchicalPortfolios/src/R/HRP.R.
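The two distance matrices of Step 1 (Eqs. 17 and 18) can be sketched in Python as follows. This covers only the distance computation, not the tree clustering, quasi-diagonalization, or recursive bisection; the function names are assumptions, and the max(0.0, ...) guard against tiny negative values from rounding is an addition of the sketch.

```python
import math


def correlation_distance(corr):
    """D(i, j) = sqrt(0.5 * (1 - rho(i, j))) for a correlation matrix."""
    n = len(corr)
    return [[math.sqrt(max(0.0, 0.5 * (1.0 - corr[i][j]))) for j in range(n)]
            for i in range(n)]


def distance_of_distances(D):
    """D_hat(i, j): Euclidean distance between columns i and j of D."""
    n = len(D)
    return [[math.sqrt(sum((D[k][i] - D[k][j]) ** 2 for k in range(n)))
             for j in range(n)]
            for i in range(n)]
```

Two perfectly correlated assets are at distance 0 under both measures, while perfectly anti-correlated assets are at distance 1 under D.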
Performance assessment
In addition to the different return indicators, we apply several evaluation measures to assess different aspects of the financial performance of the investment, namely:

Value-at-risk (VaR) measures the worst expected loss over a given interval under normal market conditions at a given confidence level (the lower the better).

Conditional VaR (CVaR), also called Expected Shortfall or Expected Tail Loss (ETL), is the expected loss beyond the VaR; it takes the shape of the tail into account (the lower the better).

Maximum drawdown (\(D_{max}\) or MDD): the greatest percentage fall from peak to valley in the return series (the lower the better). Drawdowns are measured as a percentage of the maximum cumulative return.

Annualized Sharpe ratio (\(SR_{ann}\)) is a reward-to-variability ratio already defined by Eq. 4; for benchmarking between strategies, \(\sigma _P\) is taken as the standard deviation of the annualized series (the higher the better).

Calmar ratio (CAL) is the annualized return over the absolute value of the maximum drawdown of an investment. It is a Sharpe-type measure that uses maximum drawdown rather than standard deviation to reflect the investor’s risk:
$$\begin{aligned} CR = \frac{r_p - r_T}{D_{max}} \end{aligned}$$(19)
where \(r_T\) is the minimum target return, which we set equal to zero.

Omega ratio (OME) is a weighted risk-return ratio for a given level of expected return, set to zero in our case, which helps us to identify the chances of winning in comparison to losing (the higher the better):
$$\begin{aligned} \Omega = \frac{\frac{1}{n}\sum _{i=1}^{n}\max (r_i-r_T,0)}{\frac{1}{n}\sum _{i=1}^{n}\max (r_T-r_i,0)}. \end{aligned}$$(20)
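The risk and risk-return measures listed above can be illustrated with a short Python sketch. VaR and ETL are computed here with a simple historical (empirical) estimator and reported as positive losses; this estimator, the default confidence level, and the function names are assumptions, as the text does not specify them. The drawdown is measured on the compounded wealth curve.

```python
def historical_var(returns, alpha=0.95):
    """Historical VaR as a positive loss, read from the empirical loss distribution."""
    losses = sorted(-r for r in returns)
    index = min(len(losses) - 1, int(alpha * len(losses)))
    return losses[index]


def expected_tail_loss(returns, alpha=0.95):
    """Average of the losses at or beyond the historical VaR (CVaR / ETL)."""
    var = historical_var(returns, alpha)
    tail = [-r for r in returns if -r >= var]
    return sum(tail) / len(tail)


def max_drawdown(returns):
    """Largest peak-to-valley fall of the compounded wealth curve, as a fraction of the peak."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst


def calmar_ratio(annual_return, mdd, target=0.0):
    """Eq. 19: excess annualised return over the maximum drawdown."""
    return (annual_return - target) / mdd


def omega_ratio(returns, target=0.0):
    """Eq. 20: gains above the target over losses below it (the 1/n factors cancel)."""
    gains = sum(max(r - target, 0.0) for r in returns)
    losses = sum(max(target - r, 0.0) for r in returns)
    return gains / losses
```

The Calmar and Omega functions are undefined when the drawdown or the sum of losses is zero, so a caller would need to handle periods with no losses separately.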
Results
We analyze the performance of the different strategies based on some descriptive statistics and from a financial perspective applied to different cryptocurrency spaces, the full market with 534 cryptocurrencies, and the top 250 and 175 ones according to market capitalization. Additionally, we apply different visual representations to highlight some of our findings. We complement our analysis with an exhaustive study of the outcomes of MC simulations.
Performance of monthly portfolios
In this section, we compare the investment strategies and benchmark methods by aggregating the daily results of all the investments at day \(t_w\) during the simulation window for each method or strategy. Descriptive analysis is a standard way to illustrate the performance of asset allocation models, where each method is compared with the others. In Table 3, we compute the basic descriptive statistics of the portfolio returns (\(Rp_t\)) obtained every day for each strategy and method, broken down by market size. Figures 2, 3, and 4 represent the portfolio cumulative return (\(r_A\)). In the same way, Table 4 presents the most important ratios for each model, also broken down by market size. From these, we can draw the main conclusions:

Considering the Mean Return values in Table 3, the SR (0.315, 0.300, and 0.931) and MR (0.304, 0.339, and 1.296) strategies outperform the others in terms of average values regardless of the market size considered. However, as stated in the financial literature, the median return is considered a more representative indicator of the performance of the model when the return distribution is heavy-tailed. For median values, SR and MR perform slightly worse than RPP for the smaller market size (0.295).

There are differences in the result of the benchmark approaches (MV, HRP, and RPP) depending on the market size. The impact on the performance of the models indicates that the advantage of the clustered MV model increases as the market size becomes larger. This supports our hypothesis that covariance misspecification is more likely as market size increases.

In terms of the financial ratios in Table 4, we observe a similar result. As we increase the market size, the SR and MR strategies more clearly outperform all of the benchmarks in terms of CumRet, AnnRet, and AnnSR; for instance, the SR strategy outperforms the others in terms of the Sharpe ratio (4.089).

The HR strategy exhibits no investments (zero values in the HR columns of Table 3 for the lower market sizes, 175 and 250, and a flat red line in Figs. 2 and 3). This means that the HR strategy for the cluster prototype allocation has not found a suitable cluster in the highest risk quartile of the market, or that the cardinality of the cluster is lower than 10 (any cluster with fewer than 10 cryptos is not considered by any strategy). HR centroids are only selected when we consider the whole market.

Examining the cumulative return figures, the MV strategy outperforms the others in the first half of the investing period for market size 175 (Fig. 2) and only at the very beginning for market size 250 (Fig. 3). In these cases, however, the superiority of the MV is due to one or two very profitable periods that occur on consecutive investment days. In contrast, the MR and SR strategies have more sustained slopes, which suggests better financial behavior. In addition, the MR and SR strategies are better when considering the whole market (Fig. 4).

In general, the MV strategy outperforms the others in the risk indicators (MDD, ETL, and VaR) for all market sizes. However, SR and MR present a better trade-off between risk and returns, as we can see in the combined indicators (AnnSR, CAL, and OME) for the market size 250 and the whole market.

The flat lines on the cumulative return curves of the MV, RPP, and HRP models for the whole market indicate that the allocation models do not work properly when the market size is large. The optimization algorithms probably do not converge to a feasible solution due to covariance misspecification, which results in a zero portfolio return for that holding period.
Financial comparison of the simulations
We now compare the aggregated results of the random investment paths for all the methods considered and the benchmarks in the three market sizes (175, 250, and the whole market).
In Figs. 5, 6, and 7, we present the aggregation graphically by means of the Cumulative Distribution Function of the Annualized Sharpe Ratio. In these figures, we present the distributions of the 1500 values that correspond to the realization of the simulation.
For the smaller market size (175 cryptos), Table 5 and Fig. 5 reveal that, according to the central tendency measures, the RPP method outperforms the rest in terms of annualized return and annualized Sharpe ratio, followed by the SR and MR strategies.
In terms of drawdown, the classical MV obtains the lowest median value (0.043), followed by RPP (Table 6). The other methods obtain similar drawdown values between 0.219 and 0.301.
In terms of ETL, MV again clearly outperforms the others with an ETL value of 0.039 in Table 7. LR and IdX are the riskiest models (0.274 and 0.270), with RPP again performing better than the others but worse than the MV model.
Regarding the Sharpe ratio, a measure that balances risk and returns, RPP clearly outperforms the other models with a Sharpe ratio of 2.238 in Table 8. The MR and SR models keep a good trade-off between risk and returns with values of 1.637 and 1.601, respectively, which is better than the others except for RPP.
When the market size is increased from 175 to 250 cryptos, we observe a clear impact on the annualized return: the median for MR improves to 3.815, while that of the MV falls to 0.826 (see Table 5). For the 250 market size, MR outperforms the other models, followed by SR and RPP. We observe no relevant differences in terms of drawdown when the market size increases from 175 to 250. Similarly, we find no relevant differences for ETL when comparing the 175 and 250 market sizes (see Table 7). However, for the Sharpe ratio indicator, the MV and MR models clearly improve, with values of 1.449 and 1.847, respectively, in Table 8 and Fig. 6 when the market size is greater. The RPP method outperforms the others with a value of 2.062, followed by MR, according to the statistics presented.
If we investigate the financial performance for the whole market, we can see that the annualized returns of the MR and SR strategies increase dramatically, while those of the benchmarking approaches are slightly reduced. The proposed clustered methods seem to profit from the cryptocurrencies with smaller market capitalization, while the benchmarks seem to struggle to compute the estimators correctly when the market size increases. For annualized returns, MR outperforms the other models, reaching a value of 88.360 in Table 5, followed by SR with 40.747. In terms of drawdown, MV outperforms the others with the lowest value, followed by MR, SR, and HR. Regarding drawdown, we have to take into account that, for many iterations, the MV and RPP models are not able to converge on the corresponding training window; the outcome of the model is then zero, which explains the median value of zero. The worst drawdown performance corresponds to IdX, HRP, and LR, with median values of 0.281, 0.207, and 0.172, respectively. The riskiest model according to ETL is the IdX benchmark and the least risky is the MV model, with a value of 0.0 in Table 7. In terms of the Sharpe ratio, the SR strategy again outperforms the others with a median value of 2.469 in Table 8, followed by MR with a value of 2.078. The worst Sharpe ratio performance corresponds to LR and RPP, with values of 0.680 and 0.979, respectively.
In general, independently of the method or strategy, we observe a change in the performance ratios when we compare 175/250 market sizes with the whole market in Tables 5, 6, 7, and 8. The portfolio models behave depending on the market size and we see that the standard methods misbehave for the higher market size where we find room for our proposal. If we exclusively take simulation results for \(SR_{ann}\) for the whole market represented in Fig. 7, we find that the SR strategy outperforms the others followed by MR.
In terms of the centroid strategies, we observe that the MR and SR strategies outperform the others (LR, HR). In terms of returns alone, MR outperforms SR for a similar maximum drawdown and CVaR. However, in terms of the annualized Sharpe ratio, the SR strategy (median value 2.469) beats all the other strategies and models.
For the HR strategy, the simulations confirm the same result as for the monthly portfolios, that is, a lack of centroids for the lower market sizes and hence no simulations either. In general, when centroids exist, HR performs better than LR.
Conclusion
We propose a methodology that, combined with the assumptions required for a good performance of MV models, allows us to extend the use of portfolio optimization models to the cryptocurrency market regardless of its size and volatility. Its usefulness extends to any portfolio management model that requires a covariance-based measure of market risk, and it is particularly suitable for managing streaming market data flows.
Our methodology proposes a clustering stage to reduce the dimensionality problem. It reveals how the performance of the model can be improved by reducing the number of assets considered and focusing on those that best fit the investor criteria. The results are similar to those obtained by an appropriate feature selection in prediction problems (Kou et al. 2021). Clustering reduces the space where the optimization models work, creating more accurate estimations of the different factors and mitigating possible errors. This methodology can be applied to other financial markets and other portfolio optimization models. It is especially indicated when both a large number of assets and a long time window are considered.
Based on the results, we draw the following conclusions:

First, we present a smart way to use the cluster prototypes to automate the selection of the most suitable partition of the market. The proposed methodology works dynamically with streaming price data of cryptos that, in our case, change on a daily basis, although it can be easily adapted to other data periodicities. In this way, market partitioning and portfolio generation run concurrently and autonomously once we set the criteria for cluster preselection based on the risk-aversion profiles of the investors.

The range of performance values obtained for cryptocurrency portfolios is beyond comparison with traditional markets, and this becomes more evident as the size of the market grows. Cumulative returns, risk, and drawdowns are significantly higher in cryptocurrency markets. For example, the steep upward trend of the cumulative yield curves is not comparable to any growth in traditional financial markets.

We show that the performance of the standard mean-variance (MV) model applied to the whole market with no partitions is similar in magnitude to the results reported in other studies (Petukhina et al. 2021; Liu 2019; Culjak et al. 2022), which gives us confidence in our results.
In general,

Sharpe ratio (Strategy 1) and Mean-Risk (Strategy 2b) outperform all the other strategies in terms of cumulative returns, annualized returns, and annualized Sharpe ratio as the market size increases. At this point, we have to discard High-Risk (Strategy 2c) for standard investors because, most of the time, no centroids are available in the upper range of volatility, which makes this strategy poorly suited to an investor who frequently needs to take a position in the market.

We observe that strategies based on extreme-value centroids (LR and HR) underperform the others over the out-of-sample window. In other words, centroids located in the interquartile range of the distribution during the estimation windows behave much better over the holding periods. The explanation is aligned with the 2nd and 3rd MV assumptions, which means that, in general, risk-based models perform better when we choose cryptos located in the mean region, closer to the center of a normal distribution.

One drawback of the proposed strategies, and their most evident weakness, is their higher drawdown and risk compared with the classical MV model; in this respect they are comparable to the HRP model. We consider that the SR and MR strategies, regardless of market size, offer a good trade-off between returns, risk, and drawdown.
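The prototype selection strategies compared in these conclusions can be illustrated with a short sketch. The exact rules are defined earlier in the paper; the version below is a simplified reading that rests on several assumptions made only for illustration. Each centroid is a (mean return, volatility) pair. The risk bands are the quartiles of the asset-level volatilities in the estimation window. Ties within a band are broken by the highest centroid return. The function name `select_prototype` is hypothetical.

```python
import numpy as np

def select_prototype(centroids, asset_vols, strategy):
    """Pick a centroid index for a strategy in {"SR", "LR", "MR", "HR"}.

    centroids  : (k, 2) array-like of (mean return, volatility) prototypes.
    asset_vols : volatilities of all assets in the estimation window.
    Returns None when no centroid falls in the requested risk band.
    """
    centroids = np.asarray(centroids, dtype=float)
    ret, vol = centroids[:, 0], centroids[:, 1]
    if strategy == "SR":
        # Sharpe-ratio strategy: best return per unit of risk.
        return int(np.argmax(ret / vol))
    # Assumed risk bands: below Q1 (LR), interquartile (MR), above Q3 (HR).
    q1, q3 = np.quantile(asset_vols, [0.25, 0.75])
    bands = {"LR": vol < q1, "MR": (vol >= q1) & (vol <= q3), "HR": vol > q3}
    candidates = np.flatnonzero(bands[strategy])
    if candidates.size == 0:
        return None  # no position can be taken in this period
    return int(candidates[np.argmax(ret[candidates])])
```

A `None` result corresponds to the situation reported for the HR strategy, where frequently no centroid lies in the upper volatility range and the investor cannot take a position.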
As we demonstrate, the results are sensitive to the number of cryptocurrencies considered. The smaller the space of cryptocurrencies and the more restrictive the thresholds applied during data preprocessing, the higher the chance of excluding cryptocurrencies with explosive behavior; the financial performance indicators obtained in these cases are more similar to those of traditional markets.
Finally, we have to consider that not all of the 534 cryptocurrencies considered in this research can be directly traded on the market. Depending on the selected exchange, some can be traded and others must be sought on other exchanges; for instance, we can trade up to 124 crypto assets (we use the terms crypto asset and cryptocurrency interchangeably) on Coinbase^{Footnote 6} or 380 crypto assets on Binance.^{Footnote 7} In other cases, some cryptocurrencies could require some days to reach a consensus^{Footnote 8} before being incorporated into the portfolio, which introduces additional market frictions not considered in our model. Liquidity pools offer a solution to such frictions within decentralized finance (DeFi), using smart contracts to facilitate the conversion of assets into cash and vice versa. The downside is an increase in transaction complexity and associated risk, together with the high turnover of the investors providing that liquidity.^{Footnote 9} In addition, there are different liquidity approaches depending, for instance, on whether exchanges are centralized or decentralized, with different spread mechanisms that affect the performance of the trading models. Further work could consider adding a liquidity criterion to the Prototype Selection Strategy stage in Fig. 1. For instance, this could be based on a spread measure together with the risk criteria that we have already used in the cryptocurrency preselection stage, to improve the performance of the mean-variance portfolios under real-life crypto market conditions.
Data availability
The datasets generated and/or analysed during the current study are available in the OSF repository, https://osf.io/mrkug/?view_only=32982841eca5476bb3e45ed0dc215f70
Abbreviations
AFCM: Autocorrelation-based fuzzy C-means
CAGR: Compound annual growth rate
CAL: Calmar ratio
CLARA: Clustering large applications
CNN: Convolutional neural network
CVaR: Conditional value at risk
CVI: Cluster validity index
DBHT: Directed bubble hierarchical tree
ED: Euclidean distance
EMH: Efficient market hypothesis
ERC: Equally-weighted risk contribution
ETL: Expected shortfall
EW: Equally-weighted
GARCH: Generalized autoregressive conditional heteroskedasticity
HCAA: Hierarchical clustering-based asset allocation
HR: High risk strategy, risk-seeking investor strategy
HRP: Hierarchical risk parity
LR: Low risk strategy, risk-averse investor strategy
MAR: Moving average reversion
MDD: Maximum drawdown
MPT: Modern portfolio theory
MR: Mean risk strategy, risk-neutral investor strategy
MST: Minimum spanning tree
MV: Mean-variance
MVO: Mean-variance optimization
OLPS: Online portfolio selection model
OLMAR: Online portfolio selection with moving average reversion
PAM: Partitioning around medoids
PMFG: Planar maximally filtered graph
QP: Quadratic program
RPP: Risk parity portfolio
RIP: Random investment path
SOM: Self-organizing maps
SR: Sharpe ratio or Sharpe ratio strategy
VaR: Value at risk
References
Alessandretti L, ElBahrawy A, Aiello LM, Baronchelli A (2018) Anticipating cryptocurrency prices using machine learning. Complexity
Apopo N, Phiri A (2021) On the (in) efficiency of cryptocurrencies: Have they taken daily or weekly random walks? Heliyon 7(4):e06685
Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recogn 46:243–256
Bonanno G, Lillo F, Mantegna RN (2001) Highfrequency crosscorrelation in a set of stocks
Bonanno G, Caldarelli G, Lillo F (2004) Networks of equities in financial markets. Eur Phys J B Condens Matter 38(2):363–371. https://doi.org/10.1140/epjb/e2004001296
Borysov P, Hannig J, Marron J (2014) Asymptotics of hierarchical clustering for growing dimension. J Multivar Anal 124:465–479
Breiman L (1960) Investment policies for expanding businesses optimal in a longrun sense. Naval Res Logist Q 7(4):647–651
Brida J, Risso W (2009) Dynamics and structure of the 30 largest north American companies. Soc Comput Econ 35(1):85–99
Burggraf T (2019) Riskbased portfolio optimization in the cryptocurrency world. Inf Syst Econ J. https://ssrn.com/abstract=3454764
Cardoso P (2021) riskParityPortfolio: design of risk parity portfolios. https://CRAN.R-project.org/package=riskParityPortfolio, R package version 0.2.2
Cerqueti R, Giacalone M, Mattera R (2021) Modelbased fuzzy time series clustering of conditional higher moments. Int J Approx Reason 134:34–52. https://doi.org/10.1016/j.ijar.2021.03.011
Cerqueti R, D’Urso P, De Giovanni L, Giacalone M, Mattera R (2022) Weighted score-driven fuzzy clustering of time series with a financial application. Expert Syst Appl 198:116752. https://doi.org/10.1016/j.eswa.2022.116752
Chuen DLK, Guo LM, Wang Y (2017) Cryptocurrency: a new investment opportunity?
Cornuejols G, Tütüncü R (2006) Optimization methods in finance, vol 5. Cambridge University Press
Culjak M, Tomić B, Žiković S (2022) Benefits of sectoral cryptocurrency portfolio optimization. Res Int Bus Financ 60:101615. https://doi.org/10.1016/j.ribaf.2022.101615
Dai M, Xu ZQ, Zhou XY (2010) Continuoustime Markowitz’s model with transaction costs. SIAM J Financ Math 1(1):96–125. https://doi.org/10.1137/080742889
Datta T, Ghosh I (2015) Using clustering method to understand Indian stock market volatility. Commun Appl Electron 2(6):35–44. https://doi.org/10.5120/cae2015651793
De Prado ML (2016) Building diversified portfolios that outperform out of sample. J Portf Manag 42(4):59–69
Duarte FG, De Castro LN (2020) A framework to perform asset allocation based on partitional clustering. IEEE Access 8:110775–110788. https://doi.org/10.1109/ACCESS.2020.3001944
D’Urso P, Cappelli C, Di Lallo D, Massari R (2013) Clustering of financial time series. Physica A Stat Mech Appl 392(9):2114–2129
D’Urso P, De Giovanni L, Massari R (2016) GARCH-based robust clustering of time series. Fuzzy Sets Syst 305:1–28. https://doi.org/10.1016/j.fss.2016.01.010
D’Urso P, Giovanni LD, Massari R, D’Ecclesia RL, Maharaj EA (2020) Cepstral-based clustering of financial time series. Expert Syst Appl 161:113705. https://doi.org/10.1016/j.eswa.2020.113705
Eisl A, Gasser SM, Weinmayer K (2015) Caveat emptor: Does bitcoin improve portfolio diversification? Available at SSRN 2408997
Fama EF (1965) The behavior of stockmarket prices. J Bus 38(1):34–105
Fang F, Chung W, Ventre C, Basios M, Kanthan L, Li L, Wu F (2021) Ascertaining price formation in cryptocurrency markets with machine learning. Eur J Finance. https://doi.org/10.1080/1351847X.2021.1908390
Fang F, Ventre C, Basios M, Kanthan L, MartinezRego D, Wu F, Li L (2022) Cryptocurrency trading: a comprehensive survey. Financ Innov 8(1):1–59
Feng Y, Palomar DP (2015) Scrip: successive convex optimization methods for risk parity portfolio design. IEEE Trans Signal Process 63(19):5285–5300. https://doi.org/10.1109/TSP.2015.2452219
Finkelstein M, Whitley R (1981) Optimal strategies for repeated games. Adv Appl Probab 13(2):415–428
Gatheral J (2008) Random matrix theory and covariance estimation. New York
Gilli M, Maringer D, Schumann E (2019) Numerical methods and optimization in finance. Academic Press, New York
Goetzmann WN, Brown SJ, Gruber MJ, Elton EJ (2014) Modern portfolio theory and investment analysis. Wiley, New York
Goldfarb D, Idnani AU (1983) A numerically stable dual method for solving strictly convex quadratic programs. Math Program 27:1–33
Gubu L, Rosadi D, Abdurakhman YB (2020) Robust mean variance portfolio selection using cluster analysis: a comparison between Kamila and weighted Kmean clustering. Asian Econ Financ Rev 10(10):1169–1186
Guris B (2021) NonlinearTSA: nonlinear time series analysis. R package version 3.5.0
Henning C, Meila M, Murtagh F, Rocci R (2016) Handbook of cluster analysis. CRC Press, Boca Raton
Iorio C, Frasso G, D’Ambrosio A, Siciliano R (2018) A pspline based clustering approach for portfolio selection. Expert Syst Appl 95:88–103. https://doi.org/10.1016/j.eswa.2017.11.031
Jiang Z, Liang J (2017) Cryptocurrency portfolio management with deep reinforcement learning. In: 2017 Intelligent systems conference (IntelliSys), pp 905–913. https://doi.org/10.1109/IntelliSys.2017.8324237
Kalayci CB, Ertenlice O, Akbay MA (2019) A comprehensive review of deterministic models and applications for meanvariance portfolio optimization. Expert Syst Appl 125:345–368. https://doi.org/10.1016/j.eswa.2019.02.011
Kapetanios G, Shin Y, Snell A (2003) Testing for a unit root in the nonlinear star framework. J Econom 112(2):359–379
Kaufman L (1986) Clustering large data sets. Pattern Recognit Practice. https://doi.org/10.1016/B9780444878779.50039X
Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. https://doi.org/10.2307/2532178
Kelly J (1956) A new interpretation of information rate. Bell Syst Tech J
Khedmati M, Azin P (2020) An online portfolio selection algorithm using clustering approaches and considering transaction costs. Expert Syst Appl 159:113546. https://doi.org/10.1016/j.eswa.2020.113546
Kolm PN, Tütüncü R, Fabozzi FJ (2014) 60 Years of portfolio optimization: practical challenges and current trends. Eur J Oper Res 234(2):356–371. https://doi.org/10.1016/j.ejor.2013.10.06
Kou G, Peng Y, Wang G (2014) Evaluation of clustering algorithms for financial risk analysis using MCDM methods. Inf Sci 275:1–12. https://doi.org/10.1016/j.ins.2014.02.137
Kou G, Xu Y, Peng Y, Shen F, Chen Y, Chang K, Kou S (2021) Bankruptcy prediction for SMEs using transactional data and two-stage multiobjective feature selection. Decis Support Syst 140:113429. https://doi.org/10.1016/j.dss.2020.113429
Kyriazis NA (2019) A survey on efficiency and profitable trading opportunities in cryptocurrency markets. J Risk Financ Manag. https://doi.org/10.3390/jrfm12020067
Ledoit O, Wolf M (2003) Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J Empir Financ 10:603–621
Ledoit O, Wolf M (2004) Honey, I shrunk the sample covariance matrix. J Portf Manag 30(4):110–119
Leland HE (1999) Beyond meanvariance: performance measurement in a nonsymmetrical world. Financ Anal J 55(1):27–36
Li B, Hoi SCH (2012) Online portfolio selection with moving average reversion. In: Proceedings of the 29th international conference on international conference on machine learning, Omnipress, Madison, WI, USA, ICML’12, pp 563–570
Li B, Hoi SCH (2014) Online portfolio selection: a survey. ACM Comput Surv. https://doi.org/10.1145/2512962
Li B, Hoi SC, Sahoo D, Liu ZY (2015) Moving average reversion strategy for online portfolio selection. Artif Intell 222:104–123
Li B, Sahoo D, Hoi SC (2016) Olps: a toolbox for online portfolio selection. J Mach Learn Res 17(1):1242–1246
Li D, Ng WL (2000) Optimal dynamic portfolio selection: multiperiod meanvariance formulation. Math Financ 10(3):387–406
Li T, Kou G, Peng Y, Yu PS (2021) An integrated cluster detection, optimization, and interpretation approach for financial data. IEEE Trans Cyberne. https://doi.org/10.1109/TCYB.2021.3109066
Lintner J (1965) The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev Econom Stat 47(1):13–37
Liu W (2019) Portfolio diversification across cryptocurrencies. Financ Res Lett 29:200–205
Livieris IE, Pintelas E, Stavroyiannis S, Pintelas P (2020) Ensemble deep learning models for forecasting cryptocurrency timeseries. Algorithms. https://doi.org/10.3390/a13050121
Lohre H, Rother C, Schäfer KA (2020) Hierarchical risk parity: accounting for tail dependencies in multiasset multifactor allocations. New developments and financial applications, machine learning for asset management, pp 329–368
Lorenzo L, Arroyo J (2022) Analysis of the cryptocurrency market using different prototypebased clustering techniques. Financ Innov. https://doi.org/10.1186/s40854021003109
Luca GD, Zuccolotto P (2017) Dynamic tail dependence clustering of financial time series. Stat Pap 58(3):641–657
Maillard S, Roncalli T, Teïletche J (2010) The properties of equally weighted risk contribution portfolios. J Portf Manag 36(4):60–70
Makarov I, Schoar A (2020) Trading and arbitrage in cryptocurrency markets. J Financ Econ 135(2):293–319
Mantegna R (1999) Hierarchical structure in financial markets. Eur Phys J B 11(1):193–197
Marcus G (2018) Deep learning: a critical appraisal. CoRR abs/1801.00631. arXiv:1801.00631
Markowitz H (1952a) Portfolio selection*. J Financ 7(1):77–91. https://doi.org/10.1111/j.15406261.1952.tb01525.x
Markowitz H (1952b) The utility of wealth. J Polit Econ 60. https://EconPapers.repec.org/RePEc:ucp:jpolec:v:60:y:1952:p:151
Markowitz HM (1959) Portfolio selection. Yale University Press, Yale
Marti G, Nielsen F, Bińkowski M, Donnat P (2017) A review of two decades of correlations, hierarchies, networks and clustering in financial markets. Papers 1703.00485. arXiv:1703.00485
Mattera R, Giacalone M, Gibert K (2021) Distributionbased entropy weighting clustering of skewed and heavy tailed time series. Symmetry 13(6):959
Michaud RO (1989) The Markowitz optimization enigma: Is ‘optimized’ optimal? Financ Anal J 45(1):31–42. https://doi.org/10.2469/faj.v45.n1.31
Molyboga M (2020) A modified hierarchical risk parity framework for portfolio management. J Financ Data Sci 2(3):128–139
Musmeci N, Aste T, di Matteo T (2014) Clustering and hierarchy of financial markets data: advantages of the DBHT. arXiv:qfin.ST/1406.0496v1
Nanda S, Mahanty B, Tiwari M (2010) Clustering Indian stock market data for portfolio management. Expert Syst Appl 37:8793–8798. https://doi.org/10.1016/j.eswa.2010.06.026
Newman M (2005) Power laws, Pareto distributions and Zipf’s law. Contemp Phys 46(5):323–351. https://doi.org/10.1080/00107510500052444
Nguyen Cong L, Wisitpongphan N, Meesad P, Unger H (2014) Clustering stock data for multiobjective portfolio optimization. Int J Comput Intell Appl. https://doi.org/10.1142/S1469026814500114
Onnela JP, Chakraborti A, Kaski K, Kertész J, Kanto A (2003) Dynamics of market correlations: taxonomy and portfolio analysis. Phys Rev E. https://doi.org/10.1103/physreve.68.056110
Otranto E (2008) Clustering heteroskedastic time series by modelbased procedures. Comput Stat Data Anal 52(10):4685–4698. https://doi.org/10.1016/j.csda.2008.03.020
Pafka S, Kondor I (2003) Noisy covariance matrices and portfolio optimization ii. Physica A Stat Mech Appl 319:487–494
Palamalai S, Kumar KK, Maity B (2021) Testing the random walk hypothesis for leading cryptocurrencies. Borsa Istanbul Rev 21(3):256–268
Petukhina A, Trimborn S, Härdle WK, Elendner H (2021) Investing with cryptocurrencies: evaluating their potential for portfolio allocation strategies. Quant Finance 21(11):1825–1853. https://doi.org/10.1080/14697688.2021.1880023
Piccolo D (1990) A distance measure for classifying Arima models. J Time Ser Anal 11(2):153–164
de Prado L (2016) Building diversified portfolios that outperform out of sample. J Portf Manag 42(4):59–69. https://doi.org/10.3905/jpm.2016.42.4.059
Qian EE (2016) Risk parity fundamentals. CRC Press, Boca Raton
Raffinot T (2017) Hierarchical clusteringbased asset allocation. J Portf Manag 44(2):89–99
Raffinot T (2018) The hierarchical equal risk contribution portfolio. Available at SSRN 3237540
Roncalli T (2013) Introduction to risk parity and budgeting. CRC Press, Boca Raton
Samuelson PA (1965) Proof that properly anticipated prices fluctuate randomly. Manag Rev 6(2)
Schubert E, Rousseeuw PJ (2019) Faster kmedoids clustering: improving the pam, CLARA, and CLARANS algorithms. In: Amato G, Gennaro C, Oria V, Radovanović M (eds) Similarity search and applications. Springer International Publishing, Cham, pp 171–187
Sebastião H, Godinho P (2021) Forecasting and trading cryptocurrencies with machine learning under changing market conditions. Financ Innov 7(1):1–30
Sharpe W (1964) Capital asset prices: a theory of market equilibrium under conditions of risk. J Finance 19(3):425–442
Soleymani F, Vasighi M (2020) Efficient portfolio construction by means of CVAR and kmeans++ clustering analysis: evidence from the NYSE. Int J Finance Econ. https://doi.org/10.1002/ijfe.2344
Song JY, Chang W, Song JW (2019) Cluster analysis on the structure of the cryptocurrency market via bitcoin-ethereum filtering. Physica A 527:121339
Song WM, di Matteo T, Aste T (2012) Hierarchical information clustering by means of topologically embedded graphs. PLoS ONE
Steinbach MC (2001) Markowitz revisited: meanvariance models in financial portfolio analysis. SIAM Rev 43(1):31–85
Stosic D, Stosic D, Ludermir TB, Stosic T (2018) Collective behavior of cryptocurrency price changes. Physica A 507:499–509
Thorp EO (1975) Portfolio choice and the Kelly criterion. In: Stochastic optimization models in finance, Elsevier, pp 599–619
Tumminello M, Di Matteo T, Aste T, Mantegna RN (2006) Correlation based networks of equity returns sampled at different time horizons. Eur Phys J B 55(2):209–217. https://doi.org/10.1140/epjb/e2006004144
Umino K, Kikuchi T, Kunigami M, Yamada T, Terano T (2022) Empirical analyses of Olmar method for financial portfolio selection in stock markets. J Adv Comput Intell Inform 26(4):451–460
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York. https://www.stats.ox.ac.uk/pub/MASS4/. ISBN 0387954570
Wang M, Li C, Xue H, Xu F (2014) A new portfolio rebalancing model with transaction costs. J Appl Math
Watorek M, Drozdz S, Kwapien J, Minati L, Oswiecimka P, Stanuszek M (2020) Multiscale characteristics of the emerging global cryptocurrency market. Phys Rep. https://doi.org/10.1016/j.physrep.2020.10.005
Xu M, Chen X, Kou G (2019) A systematic review of blockchain. Financ Innov. https://doi.org/10.1186/s408540190147z
Zhao JL, Fan S, Yan J (2016) Overview of business innovations and research opportunities in blockchain and introduction to the special issue. Financ Innov 2(1):1–7. https://doi.org/10.1186/s4085401600492
Zhou L, Zhang L, Zhao Y, Zheng R, Song K (2021) A scientometric review of blockchain research. IseB 19(3):757–787
Acknowledgements
We are very grateful for the selfless efforts of the huge community of R developers (https://www.r-project.org/), on whose work we relied to develop our research.
Funding
This work was supported by the European Union’s H2020 Coordination and Support Actions CA19130 under Grant Agreement Period 2.
Contributions
The initial idea was conceived by LL. The experiments were designed by both authors. The database searches, statistical analysis, and software design were performed by LL. The work was drafted by LL and critically revised by JA. All authors read and approved the final manuscript.
Ethics declarations
Competing interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lorenzo, L., Arroyo, J. Online risk-based portfolio allocation on subsets of crypto assets applying a prototype-based clustering algorithm. Financ Innov 9, 25 (2023). https://doi.org/10.1186/s40854-022-00438-2
DOI: https://doi.org/10.1186/s40854-022-00438-2
Keywords
 Fintech
 Mean-variance
 Cryptocurrency
 Electronic market
 Portfolio allocation model
 Clustering