Forecasting cryptocurrency returns and volume using search engines

In the context of the debate on the role of cryptocurrencies in the economy as well as their dynamics and forecasting, this brief study analyzes the predictability of Bitcoin volume and returns using Google search values. We employed a rich set of established empirical approaches, including a VAR framework, a copulas approach, and non-parametric drawings, to capture a dependence structure. Using a weekly dataset from 2013 to 2017, our key results suggest that the frequency of Google searches leads to positive returns and a surge in Bitcoin trading volume. Shocks to search values have a positive effect, which persisted for at least a week. Our findings contribute to the debate on cryptocurrencies/Bitcoins and have profound implications in terms of understanding their dynamics, which are of special interest to investors and economic policymakers.


Introduction
It is difficult to make a prediction, particularly about the future! Yet this difficulty has not deterred the practice of forecasting. Predictions of future technological changes and their implications for the socio-economic and financial outlook are areas of research that have never lost their glitter. In the same vein, forecasting the dynamics of technology and its implications for financial asset prices and their returns has always been one of the most interesting aspects of research. In the twenty-first century, the perpetual evolution of financial and technological innovation has brought us to the age of cryptocurrencies, one of which is Bitcoin. A crypto or digital currency is an asset that exists only electronically. The most popular cryptocurrencies, such as Bitcoin, were designed for transactional purposes; however, they are often held for speculation in anticipation of a rise in their values (see Bank of England (2018) for detailed insight into digital currencies). Based on blockchain technology, Bitcoin is the most popular and widely used cryptocurrency and, in some cases, has been treated in tandem with conventional currencies (see Kristoufek and Vosvrda, 2016). Bitcoin arrived amid controversy, and there are doubts about its future, yet the popularity of cryptocurrencies has been increasing since their inception (Li and Wang, 2017).
One aspect of this controversy is the debate on whether Bitcoin should be considered a safe financial asset. A few recent studies have examined the Bitcoin market and its dynamics; for example, Li and Wang (2017) argued that, despite the intense discussion, our understanding of the values of cryptocurrencies is very limited. Some of the participants in this debate have appreciated the role of cryptocurrencies; for instance, Kim (2017) argued that the simpler infrastructure and lower transaction cost of Bitcoin are advantages compared to retail foreign exchange markets. Similarly, Bouri et al. (2017) found that Bitcoin acts as a hedge against uncertainty, while Dyhrberg (2016, 2016b) declared it a good hedge against stocks, the US dollar, and gold, and argued that it can be included in the variety of tools available to market analysts to hedge market-specific risk. Financial Innovation has been an important platform for the debate on, and the implications of, blockchain technology and cryptocurrencies (for instance, see the special issue on blockchain).
The emergence of cryptocurrencies has important implications for the global economy in general and emerging economies in particular. For instance, Carrick (2016) argued that Bitcoin and cryptocurrencies have idiosyncratic features that make them suitable for and complementary to the currencies of emerging markets. Furthermore, the risks to Bitcoin technologies can be minimized and, concomitantly, cryptocurrencies have an important role to play in emerging economies. Similarly, Polasik et al. (2015) highlighted the importance of Bitcoin for eCommerce and argued that it has the potential to play a significant role. Pazaitis et al. (2017) argued that Bitcoin (blockchain) technology has the potential to enable a new system of value that will better support the dynamics of social sharing. Similarly, from the technological as well as the economic perspective, Goertzel et al. (2017) argued that blockchain technologies are useful in terms of transparency, humanizing global economic interaction, emotional resonance, and maximization of economic gain.
Contrarily, some contemporary studies, for instance Corbet et al. (2017), investigated the fundamental drivers of cryptocurrency (Bitcoin) price behavior and reported that there are clear periods of bubble behavior and that, as it stands, Bitcoin is in a bubble phase. Similarly, Jiang (2017) reported the existence of long-term memory and inefficiency in the Bitcoin market. Alvarez-Ramirez et al. (2018) analyzed the long-range correlation and informational efficiency of the Bitcoin market. They reported that the Bitcoin market exhibits periods of efficiency alternating with periods where the price dynamics are driven by anti-persistence. Bariviera et al. (2017) compared the dynamics of Bitcoin and standard currencies and focused on the analysis of returns at different time scales. They found that Hurst exponents changed significantly during the first years of Bitcoin's existence, tending to stabilize in recent times. A later study by Bouri et al. (2018) reported that the global financial stress index could be useful for predicting Bitcoin returns. Nonetheless, in the debate (or controversy) around cryptocurrencies, important factors that have been fairly underappreciated are their determinants and predictability. On this aspect, Feng et al. (2017) reported evidence of informed trading in the Bitcoin market prior to large events, which led them to argue that informed trading could be helpful in explaining Bitcoin behavior; however, this area requires further exploration, which is the objective of the current study.
In recent years, some studies have analyzed the ability of keyword analysis to forecast technological factors. For instance, Dotsika and Watkins (2017) used keyword network analysis to identify potentially disruptive trends in emerging technologies and reported significant influence. Similarly, Dubey et al. (2017) showed that big data and predictive analytics could influence social and environmental sustainability. Some studies have tested the effects of data availability on the internet and in print media on financial asset returns. For instance, in equity markets, Tetlock (2007) analyzed the role of traditional media, whereas Bollen et al. (2011) used Twitter to forecast equity markets. Similarly, Moat et al. (2013) used Wikipedia as a predictive tool, while Challet and Ayed (2013) showed the importance of keywords in Google for predicting financial market behavior. Preis et al. (2013) analyzed trading behavior using Google Trends.
Interestingly, search engines can influence portfolio diversification, as Google Trends are found to be connected with Bitcoin prices; there was also evidence of an asymmetric effect of increased interest in the currency depending on whether it is above or below its trend value (Kristoufek, 2013). Apparently, because of their trading behavior, investors' and market participants' psychologies play an important role in pricing any asset's return. Considering that Bitcoin is claimed to be independent of monetary authority influence (Nakamoto, 2012), transactions will be influenced to a greater extent by investors' sentiments and the market forces of supply and demand than by governmental intervention. Undoubtedly, this may result in asset bubbles or Minsky moments (see Tavasci and Toporowski, 2010); however, a large amount of information is generated in the decision-making process that leads to cryptocurrency transactions. This information is very often captured by Google Trends, which records users' search histories and ranks them from 1 to 100. The more frequently internet users conduct a search on a topic, the higher its indicator. A number of studies from the social to the health sciences have employed these figures. Specific to the financial world, there is some limited evidence that suggests potential causal linkages; however, it requires further exploration. For instance, Preis et al. (2010) reported that while there is no evidence to define the relationship between search data and stock market returns, Google Trends numbers can be used to predict trading volumes (S&P 500). A later study by Preis et al. (2013) also demonstrated that data generated from a search engine can be used to explain stock market movements. Furthermore, portfolios constructed based on a high number of searches were found to outperform the market. Studies by Joseph et al. (2011) and Da et al. (2011) concluded that Google search values are a good tool for predicting future returns with a lag of two or three weeks.
However, specific to Bitcoin, to the best of our knowledge, no study has explored this nexus. Keeping this concise evidence in context, there is a gap in existing knowledge on the role of search engines, and the data generated during their routine functioning, in predicting the dynamics of Bitcoin. Accordingly, this study is an endeavor to analyze the significance of search engines for predicting Bitcoin returns and volume. We employ a rich set of established empirical approaches (including the VAR framework, a copulas approach, and nonparametric drawings for time series to calculate the dependence structure). Using a weekly dataset from 2013 to 2017, our key results suggest that Google search values carry a remarkable amount of information for predicting Bitcoin returns. There was also a positive effect of Google search values on Bitcoin trading volume, although the estimates fell short of statistical significance. Our findings contribute to the recent literature and debate on cryptocurrencies, their role in developed and emerging economies, and understanding their dynamics as well as their predictability.

Data
The data employed are obtained from Google Trends (for search level values) and Coinmarketcap (for Bitcoin's price and trading volume), starting from the first week of 2014 to the last week of 2017. We eliminated Google search values extracted before 2008 because these figures are unreliable (see Challet and Ayed, 2013, for details). Following Miller's (2013) approach, the logarithmic values of Bitcoin prices are used to calculate Bitcoin returns as shown in Eq. 1:
$$R_t = \ln(P_t) - \ln(P_{t-1}) \qquad (1)$$
where $P_t$ is the Bitcoin price in week $t$. Furthermore, we computed the logarithmic change in Google search values and divided it by its standard deviation to make this index compatible with changes in Bitcoin prices, which were already converted to returns (Eq. 1). Because trading in the cryptocurrency market is continuous, it includes transactions carried out on weekend days. Therefore, we chose to collect the Bitcoin price data on Sunday, as it is the last day of the week. Consequently, no correction for insufficient data is required, unlike for stock markets, which are open only until Friday. Furthermore, the Google Trends data are extracted entirely from the open source provided by Google. In addition, we adjusted some of the insufficient data collected from Google Trends to obtain a continuous time series. Weeks with no data were skipped, and returns and volume were adjusted to balance the dataset. The standardized Google search value (SGSV) is estimated as follows:
$$SGSV_t = \frac{\ln(GSV_t) - \ln(GSV_{t-1})}{\sigma_{GSV}}$$
where $GSV_t$ is the Google search value in week $t$ and $\sigma_{GSV}$ is the standard deviation of its logarithmic changes. For volume, we use log volume de-trended by the rolling average of the past 12 weeks of log volume. This approach was popularized by Campbell and Yogo (2006) and is used to construct the volume series, which is also tested for stationarity.
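The three transformations described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the SGSV is assumed to be the log change in the search index divided by the sample standard deviation of those changes, and the 12-week rolling average is assumed to cover the 12 weeks preceding the current one.

```python
import math
import statistics

def log_returns(prices):
    """Weekly log returns: ln(P_t) - ln(P_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def standardized_search_value(gsv):
    """Log change in the search index divided by its sample standard
    deviation (assumed form of the SGSV)."""
    changes = log_returns(gsv)
    sd = statistics.stdev(changes)
    return [c / sd for c in changes]

def detrended_log_volume(volume, window=12):
    """Log volume minus the mean log volume of the previous `window` weeks
    (assumption: the current week is excluded from the rolling average)."""
    logv = [math.log(v) for v in volume]
    return [logv[t] - statistics.fmean(logv[t - window:t])
            for t in range(window, len(logv))]

# Hypothetical weekly observations, for illustration only.
prices = [430.0, 445.5, 438.2, 460.1, 455.0]
searches = [12, 15, 14, 22, 18]
print(log_returns(prices))
print(standardized_search_value(searches))
```

Because the first `window` weeks are consumed by the rolling average, the de-trended volume series is shorter than the raw series and has to be aligned with returns and the SGSV before estimation.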

Methodology and findings
To begin, we performed a descriptive statistical analysis to gain insight into the features of the data. The results are presented in Table 1.
After the brief description of the data, we employed unit root tests to check whether the data series are stationary, using the augmented Dickey-Fuller (ADF) and Phillips-Perron tests. The results presented in Table 2 suggest that the dataset is stationary at levels, i.e., I(0).
The alternative specifications of the unit root tests (inclusion/exclusion of trends and intercepts) unanimously suggested that all variables are stationary, and the null of a unit root was rejected at the 1% significance level (p-value < 0.01). Next, we tested for co-integration using the Johansen cointegration test for these pairs of variables.
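To make the logic of a unit root test concrete, the following sketch computes the t-statistic of the simplest Dickey-Fuller regression with an intercept and no lagged differences. It is a simplified stand-in for the ADF and Phillips-Perron tests used in the study, the function name is hypothetical, and the data are simulated. The approximate asymptotic 5% critical value for this specification is -2.86; a more negative statistic rejects the unit root.

```python
import numpy as np

def dickey_fuller_t(series):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    design = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(design, dy, rcond=None)
    resid = dy - design @ beta
    sigma2 = resid @ resid / (len(dy) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return float(beta[1] / np.sqrt(cov[1, 1]))

# Simulated series: white noise is stationary, a random walk is not.
rng = np.random.default_rng(0)
stationary = rng.normal(size=300)
random_walk = np.cumsum(rng.normal(size=300))
print(dickey_fuller_t(stationary), dickey_fuller_t(random_walk))
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual Student-t table, because its distribution under the null is non-standard.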
The results of the co-integration test presented in Table 3 suggest that there is no co-integrating relationship in either pair (i.e., SGSV and returns, and SGSV and volume). This suggests that the relationship between Google search values and Bitcoin returns and trading volume does not persist in the long run. This is intuitive, considering the volatility and dynamics of the market. Hence, this leads us to a VAR estimation. Before proceeding, we selected the lag order based on the Akaike information criterion and chose three as the optimal number of lags. To determine the direction of causality, we performed a Granger causality test; the results are presented in Table 4.
The results of the Granger causality test showed strong evidence of causality only from the SGSV to Bitcoin returns. The causality was statistically unidirectional, running from the SGSV to returns and not in the reverse direction. This means that Bitcoin returns can be predicted by the Google search value. This is an intuitive finding, as investors looking for Bitcoin information on the Internet may lead to an increase in the price of Bitcoin, producing a cause-and-effect relationship with Bitcoin returns. The causal relationship between the SGSV and volume fell just short of the benchmark level of significance (11%). Next, to take a broader perspective on the association among the variables being analyzed, we performed an impulse response function (IRF) analysis; the results are presented in Figs. 1 and 2.
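The idea behind the Granger causality test can be illustrated with a plain F-test that compares a regression of one series on its own lags with a regression that also includes lags of the other series. The sketch below is an assumption-laden illustration on simulated data, not the estimation reported in Table 4: the function names are hypothetical, and the study's test was run inside the VAR framework with three lags.

```python
import numpy as np

def lagged_matrix(series, lags):
    """Columns hold the series lagged by 1..lags, aligned with series[lags:]."""
    n = len(series)
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])

def granger_f_stat(y, x, lags=3):
    """F-statistic for the null that lags of x do not help predict y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged_matrix(y, lags)])
    unrestricted = np.hstack([restricted, lagged_matrix(x, lags)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Simulated example: "search" leads "ret" by one week, not the reverse.
rng = np.random.default_rng(1)
search = rng.normal(size=300)
ret = np.zeros(300)
ret[1:] = 0.8 * search[:-1] + 0.5 * rng.normal(size=299)
print(granger_f_stat(ret, search), granger_f_stat(search, ret))
```

A large statistic in one direction and a small one in the other corresponds to the unidirectional pattern described above; the p-value would come from an F distribution with `lags` and `dof` degrees of freedom.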
The IRF analysis showed that Bitcoin returns responded positively to a shock to the SGSV. The response was also statistically significant, and the surge in returns persisted for a period before starting to decline. This implies that a shock to the search value leads to an immediate increase in returns over the following week. Afterwards, the response decreases sharply and dies out in the second week. On the other hand, Bitcoin returns did not lead to a surge in searches.
The IRF for volume and the SGSV, presented in Fig. 2, showed that a shock to the SGSV positively influenced Bitcoin trading volume. Moreover, this shock triggered a gradual increase in trading volume over two weeks, and thereafter the effects started to diminish. The remaining pairs of analysis did not show any significant responses, indicating a lack of association. Accordingly, we can only infer that one can confidently predict a surge in trading volume in response to a surge in the SGSV. However, the contribution of the SGSV to volume is comparatively trivial. Investors find more information about Bitcoin by searching, but their trading behavior is not explained by the action of searching. This also implies that those who search do not necessarily enter into transactions.
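For readers unfamiliar with impulse response functions, the following sketch shows how responses are generated from the coefficient matrices of a VAR(p) using the standard recursion Phi_0 = I and Phi_h = sum over j of A_j Phi_{h-j}. The coefficient values are hypothetical and chosen only for illustration; they are not the estimates behind Figs. 1 and 2, and the responses here are not orthogonalized.

```python
import numpy as np

def impulse_responses(coefs, horizon):
    """Non-orthogonalized impulse responses of a VAR(p).

    coefs is a list [A_1, ..., A_p] of (k x k) matrices; element [h][i, j]
    of the result is the response of variable i, h periods after a unit
    shock to variable j.
    """
    k = coefs[0].shape[0]
    phi = [np.eye(k)]
    for h in range(1, horizon + 1):
        phi.append(sum(coefs[j - 1] @ phi[h - j]
                       for j in range(1, min(h, len(coefs)) + 1)))
    return np.array(phi)

# Hypothetical VAR(1), variable order [SGSV, returns]: searches feed into
# next week's returns, while returns do not feed back into searches.
A1 = np.array([[0.3, 0.0],
               [0.5, 0.1]])
phi = impulse_responses([A1], horizon=4)
print(phi[:, 1, 0])  # response of returns to an SGSV shock at h = 0..4
print(phi[:, 0, 1])  # response of the SGSV to a returns shock
```

In this toy system the response of returns peaks one week after the shock and then decays, while the reverse response is zero at every horizon, which mirrors the qualitative pattern described in the text.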

Dependence structure by copulas and nonparametric estimation
We also employed a copulas approach with an estimated parameter to define how the dependency holds between the variables of interest. The rationale for enriching our estimation with this approach is (a) that we aim to perform an inclusive empirical analysis, and (b) that the assumptions for the previous tests are quite strict, whereas copulas accommodate a wider range of dependence structures, including left-tailed, right-tailed, or normal distributions. The nonparametric approach is a good method for estimating the dependence structure for a pair of random variables, whereas the parametric approach (copulas) is the best indicator for identifying the position of tail dependence rather than structure (Nguyen et al., 2017). Instead of employing correlation or causality, which have the disadvantage of being scalar measures of dependence or linear estimations, we employ Kendall plots and copulas to determine the dependence relationship by linking the marginal distributions with the joint distribution of the variables being analyzed. Bitcoin returns, the Google search volume index, and Bitcoin's trading volume are the random variables. Hence, this approach is an appropriate candidate for use as the framework of analysis.
Furthermore, the fluctuation of Bitcoin prices is quite high, depicting substantial nonlinearities; using a traditional approach such as correlation or Granger causality would be prone to producing spurious estimation results. For all these reasons, we employed copulas and a nonparametric approach. The results are presented in Table 5. Because it yielded the highest log likelihood, we chose the Gumbel copula family for estimation. The results suggested that the Google search value has a strong relationship with returns but a comparatively weaker one with volume. Nonetheless, the results for volume were still significant at the 10% level. In addition, the Gumbel copula family (right tail) indicates joint probabilities for increasing values for both pairs.
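The right-tail property of the Gumbel family can be made concrete with a small sketch. Note the swap in technique: the study selected and fitted the copula by maximum likelihood, whereas the code below uses the simpler moment-style inversion of Kendall's tau, theta = 1 / (1 - tau), which holds for the Gumbel copula. The upper tail dependence coefficient is then 2 - 2^(1/theta). Function names and the sample data are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def gumbel_theta_from_tau(tau):
    """Invert tau = 1 - 1/theta; valid for 0 <= tau < 1."""
    if not 0 <= tau < 1:
        raise ValueError("the Gumbel copula requires 0 <= tau < 1")
    return 1.0 / (1.0 - tau)

def gumbel_upper_tail_dependence(theta):
    """lambda_U = 2 - 2**(1/theta); zero when theta = 1 (independence)."""
    return 2.0 - 2.0 ** (1.0 / theta)

def gumbel_cdf(u, v, theta):
    """C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

# Hypothetical paired observations, for illustration only.
sgsv = [1, 2, 3, 4, 5]
rets = [1, 3, 2, 5, 4]
theta = gumbel_theta_from_tau(kendall_tau(sgsv, rets))
print(theta, gumbel_upper_tail_dependence(theta))
```

A positive upper tail dependence coefficient means that unusually high values of the two variables tend to occur together, which is how the Gumbel result in the text should be read.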
Last, Kendall plots were adopted, which are a graphical approach based on rank statistics. The novelty of this approach is that it allows detection of nonlinear dependence between two variables. Kendall plots are an effective methodology for capturing a dependence structure. In their seminal work, Genest and Boies (2003) introduced the Kendall plot (K-plot) to investigate dependence between random variables. "K-plots are easier to interpret than chi-plots because the curvature they display in cases of association is related in a definite way to the copula characterizing the underlying dependence structure." (see Genest and Boies, 2003, page 275). Considering this aspect, we chose Kendall plots to determine the dependence structure of Bitcoin returns and search engines, as well as trading volume. The results are presented in Fig. 3.
The Kendall plots showed that the points are not distributed along the 45-degree line of the graph, confirming that the series are dependent. Concomitantly, the findings in this section complement those obtained by the traditional tests.
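The coordinates of a K-plot can be computed without any plotting library. The sketch below follows the construction of Genest and Boies (2003) as we understand it: the vertical axis holds the sorted pseudo-observations H_(i), the share of other points lying weakly below and to the left of each point, and the horizontal axis holds W_{i:n}, the expected order statistics under independence, obtained here by midpoint quadrature with K_0(w) = w - w ln(w). The function names and the grid size are assumptions, and the study itself produced its plots in R.

```python
import math

def kendall_pseudo_obs(x, y):
    """Sorted H_i = #{j != i : x_j <= x_i and y_j <= y_i} / (n - 1)."""
    n = len(x)
    h = [sum(1 for j in range(n)
             if j != i and x[j] <= x[i] and y[j] <= y[i]) / (n - 1)
         for i in range(n)]
    return sorted(h)

def independence_order_stats(n, grid=20000):
    """W_{i:n} = n * C(n-1, i-1) * integral of
    w * K0(w)^(i-1) * (1 - K0(w))^(n-i) * (-ln w) dw over (0, 1)."""
    out = []
    for i in range(1, n + 1):
        total = 0.0
        for g in range(grid):
            w = (g + 0.5) / grid
            k0 = w - w * math.log(w)
            total += w * k0 ** (i - 1) * (1.0 - k0) ** (n - i) * (-math.log(w))
        out.append(n * math.comb(n - 1, i - 1) * total / grid)
    return out

# Hypothetical paired sample; points (W_{i:n}, H_(i)) lying above the
# diagonal suggest positive dependence.
x = [0.1, 0.4, 0.2, 0.9, 0.7, 0.5]
y = [0.2, 0.5, 0.1, 0.8, 0.9, 0.4]
for w, h in zip(independence_order_stats(len(x)), kendall_pseudo_obs(x, y)):
    print(round(w, 3), round(h, 3))
```

Under independence the points scatter around the 45-degree line, so systematic departures from it are read as evidence of dependence.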

Conclusion and implications
Cryptocurrencies, which are based on blockchain technology and are often identified with Bitcoin, have recently attracted a lot of debate in socio-economic and financial circles. The behavior of cryptocurrencies and their dynamics, as well as their predictability, are of prime interest to investors and financial institutions, as well as policymakers. Keeping this interest in context, this brief study has analyzed the predictability of Bitcoin volume and returns using data extracted from the Google search engine. We employed a rich set of established empirical approaches, including the VAR framework, a copulas approach, and non-parametric drawings of time series, which are treated as continuous random variables, for capturing the dependence structure. Our key findings lead us to conclude that Google search values exert significant influence on Bitcoin returns, particularly in the short run. We also found that Google search values have some influence on the trading volumes of cryptocurrencies, although our results fell just short of statistical significance benchmarks. This study contributes to existing evidence on blockchain technology by providing new empirical evidence that search values (especially Google Trends, which measure the level of interest in finding information about something) can be good predictors of an asset's return, particularly for a typical cryptocurrency, Bitcoin. The results indicate that there was no long-run relationship; however, there was clear short-term dependency. The more frequently investors look for information, the higher the returns and trading volume that follow. The influence of this shock lasts at least one week before returning to equilibrium. By using copulas and a nonparametric approach, we confirm the relationship between search values and Bitcoin returns and volume. Search tools can generate information, which is swiftly incorporated into the market, and can support investment in and predictability of Bitcoin returns and volume. However, in the future, depending on government and monetary authorities' policies around the world in both developed and developing economies, the relationship between Google search volumes and cryptocurrency returns may change, which will require further exploration in this area. The approach and framework we employed in this study for Bitcoin can be extended to other cryptocurrencies and asset classes, including both financial and non-financial assets.
There are also some limitations of this study, which provide a rationale for further research in this area. For instance, in future work, the interactions between Google Trends and cryptocurrencies can be seen through the lens of a time-varying framework such as time-varying copulas. Fellow scholars might also be interested in expanding the analysis to other cryptocurrencies such as Ethereum (ETH) and Litecoin (LTC). Lastly, our results cannot directly account for the role of other behavioural factors, such as sentiment and risk appetite, in the relationship between search values and cryptocurrency returns or volume. Hence, future research may consider incorporating these factors.

Fig. 3 Kendall plots for Bitcoin's return and volume with the standardized Google search value (SGSV), extracted from R estimation

Table 1 Descriptive statistics

Table 3 Johansen co-integration test

Table 5 Copulas estimation results for two pairs