Bitcoin price change and trend prediction through Twitter sentiment and data volume

Twitter sentiment has been shown to be useful in predicting whether Bitcoin’s price will increase or decrease. Yet the state-of-the-art is limited to predicting the price direction and not the magnitude of increase/decrease. In this paper, we seek to build on the state-of-the-art to predict not only the direction but also the magnitude of increase/decrease. We utilise not only sentiment extracted from tweets, but also the volume of tweets. We present results from experiments exploring the relation between sentiment and future price at different temporal granularities, with the goal of discovering the optimal time interval at which the sentiment expressed becomes a reliable indicator of price change. Two different neural network models are explored and evaluated, one based on recurrent nets and one based on convolutional networks. An additional model is presented to predict the magnitude of change, which is framed as a multi-class classification problem. It is shown that this model yields more reliable predictions when used alongside a price trend prediction model. The main research contribution of this paper is the demonstration that not only can the price direction be predicted, but the magnitude of the price change can also be predicted with relative accuracy (63%).


Introduction
Bitcoin (Nakamoto 2009) has, since its introduction in 2008/9, attracted extensive attention for various reasons from different stakeholders with varying opinions regarding its utility and potential. Despite the fact that it and its underlying blockchain technology have been declared 'dead' on a number of occasions, 1 recent all-time high prices 2 and research indicate otherwise (Ellul 2021). Given that such opinions may and do sway potential investors' interest, it is desirable to investigate whether better methods could be developed to support investors. In fact, not only could public opinion sway potential investors' interest, the same public sentiment could be used to empower investors to make better informed decisions regarding future price predictions.
There is no doubt that sentiment affects an asset's price and, as put by Baker and Wurgler, "the question is no longer, as it was a few decades ago, whether investor sentiment affects stock prices, but rather how to measure investor sentiment and quantify its effects" (Baker and Wurgler 2007). Whilst this statement refers to a well-founded body of literature on applying sentiment analysis to traditional markets (Gunter et al. 2014; Rao and Srivastava 2012; Li et al. 2014; Mittal and Goel 2012), sentiment analysis can similarly be used for cryptocurrency price prediction, as demonstrated extensively in recent work (Valencia et al. 2019; Kraaijeveld and De Smedt 2020; Abraham et al. 2018; Stenqvist and Lönnö 2017; Pant 2018; Galeshchuk et al. 2018; Kilimci 2020; Naeem et al. 2020; Serafini et al. 2020; Wołk 2019; Balfagih and Keselj 2019; Mohapatra et al. 2020).
Twitter 3 is widely used as a source of sentiment data, and has become a popular social media platform amongst crypto-communities (Kraaijeveld and De Smedt 2020; Abraham et al. 2018; Stenqvist and Lönnö 2017; Pant 2018; Galeshchuk et al. 2018; Kilimci 2020; Naeem et al. 2020; Balfagih and Keselj 2019; Mohapatra et al. 2020). Whilst the current state-of-the-art has achieved encouraging results, further research effort is required to overcome a number of issues. The following issues are frequently encountered in price prediction models based on sentiment analysis of Twitter data: (i) evaluation is typically based on minimal historical data (Pant 2018; Valencia et al. 2019; Stenqvist and Lönnö 2017; Kilimci 2020); (ii) predictions tend to lag behind when the predicted prices are actually seen on the market (Serafini et al. 2020); and (iii) models are frequently limited to the prediction of the direction (up or down) (Kilimci 2020; Valencia et al. 2019; Galeshchuk et al. 2018), though some studies have proposed to make predictions of exact prices with limited success (Pant 2018; Li and Dai 2020; Serafini et al. 2020). Some of the issues described above may be due to the following particular challenges. The direction and magnitude of price change are often non-linear, and therefore not straightforward to predict (Kimoto et al. 1990). Furthermore, tweets are often duplicated for marketing purposes and may also be automated by tweet bots (Valencia et al. 2019). Tweets also typically contain features that result in noise (when it comes to sentiment analysis), including hashtags, profile mentions and URLs (Kraaijeveld and De Smedt 2020). At the same time, the use of sarcasm in tweets could skew sentiment predictions (Rosenthal et al. 2014). Therefore, prior to tackling the general problem of extracting sentiment, pre-processing of tweets should be undertaken to reduce such noise.
In this paper, we investigate predicting the magnitude of price change (beyond just the direction) and, to the best of our knowledge, this is the first paper to do so. We present exhaustive evaluation and conclusive results for a number of models. Furthermore, the models proposed overcome the late prediction problem seen in the state-of-the-art.
Furthermore, we investigate the predictive relationship between Twitter sentiment and associated price changes as a function of different time lags. Through this we thereby address the question of which temporal interval between expressed sentiment and price change provides the best results.
An in-depth study was undertaken to determine how different types of neural networks and features used may affect accuracy, in which each model investigated was evaluated against different combinations of features used as well as against different time lags introduced between sentiment and price change.
The rest of this paper is organised as follows. The following section gives an overview of sentiment analysis, followed by recent work on Bitcoin and other cryptocurrency price prediction. Then the classification problems addressed, the methods used for data preprocessing and feature extraction, and the neural models we propose are presented. After that, results are presented and discussed. Finally, we conclude with some directions for future work.

Background on sentiment analysis
In its raw form, natural language text is meaningless to a computer: it is nothing more than encoded bytes. Over the past decade strides have been made within the field of Natural Language Processing (NLP) with the aim of enabling computational systems to reason better about natural language. Sentiment analysis, as its name implies, analyses and extracts sentiment, opinion, subjectivity and polarity from text. Use-cases for sentiment analysis are plenty, including but not limited to product market analysis and automated flagging of positive/negative/potentially-harmful comments on websites and social media platforms.
Since the introduction of Sentiment Analysis to the NLP community (Pang and Lee 2008), several techniques have been proposed to associate polarity with a piece of text. It can be framed as a classification problem, whereby a text segment is classified as positive, negative or neutral. Some approaches also assign a value reflecting the degree of confidence associated with the respective polarity. Lexicon-based approaches make use of a lexicon (a collection of words) and associated sentiment scores to compare with the text being classified to determine a final polarity score. One widely-used lexicon-based implementation, VADER (Valence Aware Dictionary and Sentiment Reasoner) (Hutto and Gilbert 2015), further makes use of rule-matching, which attempts to identify polarity based on the input text using linguistic patterns. VADER combines selected features from three validated lexicons 4 along with tweet intensity rules that were extracted from analysis of terms of syntax, grammar and valence values of 800 tweets (Stenqvist and Lönnö 2017). Sentiment analysis has over the past decade been extensively applied to Twitter data which indeed 'poses newer and different challenges' (Agarwal et al. 2011) beyond those associated with more traditional sentiment analysis applications (Hussein 2018) and the wide range of approaches (Medhat et al. 2014). Furthermore, the SemEval international workshop 5 helped facilitate further research by making available a set of shared challenges for the community. Particularly related to the context of this work, since the 2013 workshop (Nakov et al. 2013) a shared task focused on Twitter sentiment analysis has been published every year.

Related work on cryptocurrency price prediction
We now provide an overview of approaches used specifically in the domain of cryptocurrency price prediction. Further detail and a comparison with the state-of-the-art will be woven into the methodology and evaluation sections, in order to provide a direct comparison of relevant aspects.
Attempts to predict Bitcoin price using Long Short Term Memory Cells (LSTM), Convolutional Neural Network (CNN) and hybrid CNN-LSTM models without undertaking sentiment analysis were originally proposed (Li and Dai 2020; Livieris et al. 2021; Kwon et al. 2019). Initial work in investigating the use of sentiment analysis of Twitter data for price prediction was proposed by Pant (2018), Galeshchuk et al. (2018) and Serafini et al. (2020). To eliminate the vanishing gradient problem seen in Recurrent Neural Networks (RNNs), Pant (2018) proposed to make use of an RNN predictor with LSTM and Gated Recurrent Unit (GRU) variations. This work resulted in a moderate correlation between rising negative sentiment and consequent falling of prices. A comparison of LSTM and ARIMA model-based approaches was conducted in Serafini et al. (2020), in which it is stated that the ARIMA model performs better. Further comparisons with other models and on different cryptocurrencies were presented in Valencia et al. (2019) and Wołk (2019). According to Valencia et al. (2019), Twitter data is not sufficient to predict Bitcoin price on its own, but can help when combined with other market data. An overview of the related studies is presented in Table 1.
Figure 1 provides an overview of the process followed to determine the best model for predicting: (i) the next day's close price direction (i.e. whether it will increase/decrease); and (ii) the magnitude of difference in closing prices. Two main datasets are used in this study: (i) Bitcoin price data; and (ii) Twitter tweets. Historical Bitcoin price data providing a per-minute record of timestamps, opening and closing prices, high and low prices and volume of Bitcoin traded for the period between 1st January 2012 and 31st December 2020 was retrieved from Kaggle. 6 A Twitter dataset 7 (also from Kaggle) was filtered to retrieve tweets that contained either 'bitcoin' or 'btc'. The period of tweets provided in the dataset is between 1st January 2016 and 29th March 2019, including a total of over 20 million tweets. In addition to the text of each tweet, the dataset provides timestamps, tweet IDs and URLs, associated authors' usernames and full names, and the number of replies, likes and retweets that tweets received.

Data cleaning and pre-processing
The following cleaning and pre-processing tasks depicted in Fig. 2 were undertaken on the Twitter dataset:
• Removal of non-English tweets 8 and duplicate tweets made by the same user in a similar manner to Pant (
• Processing of hashtags: if the word following the hash sign was not found in a precompiled English wordlist 10 , it was removed; otherwise, the '#' sign was dropped and the word retained, as was done by Kraaijeveld and De Smedt (2020);
• Removal of tweets containing fewer than 4 words, similarly to Kraaijeveld and De Smedt (2020).
The normalised datasets are available to download from the repository at https://github.com/jacquesvcritien/fyp.
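The duplicate, hashtag and minimum-length steps above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the tiny `WORDLIST` stand-in (the study uses the nltk.corpus.words wordlist) and the order in which the steps are applied are assumptions.

```python
import re

# Hypothetical stand-in for an English wordlist; the study uses nltk.corpus.words.
WORDLIST = {"bitcoin", "price", "moon", "buy", "sell"}

HASHTAG = re.compile(r"#(\w+)")


def process_hashtags(text, wordlist=WORDLIST):
    """Drop '#' from hashtags that are known words; remove other hashtags entirely."""
    def replace(match):
        word = match.group(1)
        return word if word.lower() in wordlist else ""

    # Re-join on single spaces to collapse gaps left by removed hashtags.
    return " ".join(HASHTAG.sub(replace, text).split())


def clean_tweets(tweets, min_words=4):
    """tweets: iterable of (user, text) pairs. Returns the cleaned pairs."""
    seen = set()
    cleaned = []
    for user, text in tweets:
        if (user, text) in seen:  # duplicate tweet by the same user
            continue
        seen.add((user, text))
        text = process_hashtags(text)
        if len(text.split()) < min_words:  # too short to carry usable sentiment
            continue
        cleaned.append((user, text))
    return cleaned
```

Language filtering is omitted here because it depends on an external language detector.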
In relation to the Bitcoin pricing dataset, the high and low prices were removed from the feature list so as to only keep the average price per minute 11 . After the cleaning and pre-processing steps, the resulting dataset contained tweets and prices ranging between 30th August 2018 and 23rd November 2019. A desirable property of the resulting dataset is that Bitcoin prices within the specified date range evinced both downward ($6500 to $3300 = -49%) and upward trends ($3300 to $11500 = +248%).

Determining polarity scores
Following preprocessing, VADER (Hutto and Gilbert 2015) is used to assign sentiment scores to tweets. Similar approaches are used by Valencia et al. (2019), Abraham et al. (2018) and Kraaijeveld and De Smedt (2020) 12 . VADER scores each tweet with a negative, positive, neutral and compound polarity score. The compound score is a sum of the individual sentiment scores, adjusted according to a set of rules and normalised to fall within the [−1, +1] range. However, for the purposes of this study, only positive and negative polarity scores are included in the training and evaluation data sets. VADER was widely used in related work (Valencia et al. 2019; Abraham et al. 2018; Kraaijeveld and De Smedt 2020; Mohapatra et al. 2020; Serafini et al. 2020) and provides advantages including the following: it is open source and free; it is human validated and tuned for Twitter content (Valencia et al. 2019); and it has also been shown to perform competitively with human annotators and has outperformed several benchmarks, especially on social media content (Hutto and Gilbert 2015).
10 We use the wordlist provided in the nltk.corpus.words library in Python.
11 The provided UNIX timestamps were also changed to UTC format (so as to match the Twitter dataset).
12 https://github.com/cjhutto/vaderSentiment.

Merging datasets and introducing lag
One of the research questions which this work aims to address is the optimal lag to consider that would enable the discovery of a relationship between Bitcoin-related tweets (and in particular the sentiment they express) and actual price change. Indeed, it is not certain that such tweets are the cause of the change in price. However, in this work we investigate whether a potential correlation can be seen, and if so what the optimal time lag is (between tweets and the price being affected). This approach was similarly followed by Stenqvist and Lönnö (2017) and Balfagih and Keselj (2019), who explored lags ranging from minutes to hours. In contrast to these approaches, in this paper we investigate lag intervals of a number of days: to be exact, 1, 3 or 7 days. To illustrate, Fig. 3 depicts the effect of introducing a lag of 3 days on a dataset, where the original dataset is on the left and the dataset with a lag introduced is on the right. In the lagged dataset (on the right), note how, as an example, Day 1's score is associated with Day 4's price, i.e. tweets from day 1 are being assumed to affect prices 3 days later (in this example). The reason for choosing to investigate lags of 1, 3 and 7 days is that, since this study focuses on making a daily prediction, the minimum lag to be observed should be at least 1 day. Thereafter it was decided to observe a granularity of a week. The choice of a 3-day lag represents an interval between these two extremes.
The three different lagged datasets (for 1-, 3- and 7-day lags) were created by shifting the price data (of the cleaned and merged dataset) back by the respective number of days in the lag being tested.
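The shifting step can be illustrated with a minimal sketch. It assumes one sentiment score and one price per day (the daily view used in Fig. 3), whereas the study applies the shift to the cleaned and merged dataset; the function name `introduce_lag` is hypothetical.

```python
def introduce_lag(scores, prices, lag):
    """Pair the sentiment score of day i with the price of day i + lag.

    scores and prices are equally long lists ordered by day. The last `lag`
    days of scores have no future price available and are dropped.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # zip stops at the shorter sequence, which discards the unmatched tail.
    return list(zip(scores, prices[lag:]))
```

With a lag of 3, the first pair returned couples Day 1's score with Day 4's price, matching the example described for Fig. 3.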

Grouping lagged datasets
Lagged datasets consist of preprocessed tweets coupled with their Bitcoin price at the minute the tweet was posted. Subsequently, these are grouped by day in order to allow a model to make daily predictions. Grouping is done in the following manner:
• Timestamps of tweets are floored to the hour or day when the tweet was posted;
• Tweets are grouped by their floored timestamp;
• For a given group, the polarity scores are averaged;
• The tweet volume is added as an additional feature, where the volume is the number of tweets in a given day;
• The closing Bitcoin price for the day is then identified as the price for the last record for the given day.
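The grouping steps above can be sketched in plain Python as follows. The record layout `(timestamp, pos, neg, price)` and the function name `group_by_day` are assumptions made for illustration; only daily flooring is shown.

```python
from collections import defaultdict
from datetime import datetime


def group_by_day(records):
    """records: iterable of (timestamp, pos, neg, price) tuples, one per tweet."""
    groups = defaultdict(list)
    for ts, pos, neg, price in sorted(records, key=lambda r: r[0]):
        groups[ts.date()].append((pos, neg, price))  # floor the timestamp to the day

    daily = []
    for day in sorted(groups):
        rows = groups[day]
        n = len(rows)
        daily.append({
            "day": day,
            "pos": sum(r[0] for r in rows) / n,  # mean positive polarity
            "neg": sum(r[1] for r in rows) / n,  # mean negative polarity
            "volume": n,                         # number of tweets that day
            "close": rows[-1][2],                # price of the day's last record
        })
    return daily
```

Sorting by timestamp before grouping ensures that the last record of each group is the latest one of that day, so its price can serve as the closing price.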

Features and labels
The classifiers described below are trained to predict a fluctuation in price based on the following features:
1. Change: Bitcoin price change direction of that day (binary, indicating whether the price rises or falls);
2. Close: Bitcoin's closing price for that day;
3. Positive polarity: The positive polarity score obtained from VADER;
4. Negative polarity: The negative polarity score obtained from VADER;
5. Tweet Volume: The volume of tweets in the relevant interval. This was also investigated in Abraham et al. (2018), which demonstrated that price changes were highly correlated with tweet volume.
Note that lagged datasets also include the above features for the previous days. For example, as shown in Fig. 4, if the lag is 2, a training instance would include data from the last 2 days. Finally, the label of that instance would be the price change direction of the day following the last lagged feature.
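As a minimal sketch of how such instances might be assembled, the snippet below builds sliding windows over daily rows. The tuple layout (change direction first, followed by the remaining features) and the name `make_instances` are assumptions for illustration, not the study's implementation.

```python
def make_instances(daily_rows, n_lags):
    """Build (window, label) training instances from daily feature rows.

    daily_rows: list of per-day tuples ordered by day, with the price change
    direction (1 = rise, 0 = fall) as the first element.
    Each window holds the rows of `n_lags` consecutive days; the label is the
    change direction of the day immediately after the window.
    """
    instances = []
    for end in range(n_lags, len(daily_rows)):
        window = daily_rows[end - n_lags:end]
        label = daily_rows[end][0]
        instances.append((window, label))
    return instances
```

A dataset of N days therefore yields N minus `n_lags` instances, which is one reason the daily datasets end up small.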

Data split and resampling
In order to train and test the models, the dataset was split using a train-test ratio of 85:15. The reason for this is the small number of records available for training and testing after grouping and averaging the original datasets per day. Such a split allows the model to have a good percentage of the available data to train on while also having a fair number of records to test.
When testing different models and parameters, each set of parameters is tested on 3 different shuffled datasets. However, when shuffling the dataset, the same seed is set to ensure that each model and set of parameters is tested on the same three shuffled datasets, allowing for a fair comparison. The training and test sets are prepared by first shuffling the original datasets and then using the first 85% as the training set and the last 15% as the test set.
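The seeded shuffle-and-split procedure can be sketched as below. The seed values in `SEEDS` and the function name are hypothetical; the study does not report which seeds were used.

```python
import random

SEEDS = (0, 1, 2)  # hypothetical seeds, one per shuffled dataset


def shuffled_split(instances, seed, train_fraction=0.85):
    """Shuffle with a fixed seed, then take the first 85% for training."""
    rng = random.Random(seed)   # local generator, so global state is untouched
    shuffled = list(instances)  # copy to leave the caller's data in order
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Calling `shuffled_split(data, seed)` for each seed in `SEEDS` yields the same three splits on every run, so every model and parameter set can be compared on identical data.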

Predicting next day's close price direction
The direction of the closing price can be framed as a binary classification problem where, given the input corresponding to features extracted from tweets, the task is to predict whether the price will go up or down. Three different models, based on (i) an LSTM, (ii) a CNN and (iii) Bidirectional Long Short Term Memory Cells (BiLSTM), were implemented for predicting whether the following day's closing price will increase or decrease. These are hereafter referred to as Direction-LSTM, Direction-CNN and Direction-BiLSTM. Table 2 outlines the hyperparameters used for each model. The table also gives the accuracy statistics for each model. These are further discussed in the evaluation section below. It is however evident that the best performing model, in terms of mean accuracy, is Direction-BiLSTM. The architecture of this model is depicted in Fig. 5.

Daily price change magnitude prediction
Another prediction model tries to predict the magnitude of the change of closing day prices as a multi-class classification problem. This is done by predicting which interval the closing day price changes would fall into.
Closing day price changes were categorised into ten different bins/classes. An average of the maximum positive ($1563) and maximum negative ($1746) price changes was calculated (and rounded to $1650) to define the lower and upper bins/classes (which were extended to include any greater price change), and then equal steps (of $330) were calculated for each bin in between, as can be seen in Table 3.
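The binning can be illustrated with a short sketch. The bound of $1650 and the step of $330 come from the text; the handling of values that fall exactly on a boundary (for instance, assigning a zero change to class 6) is an assumption, since it depends on the exact intervals in Table 3. The name `magnitude_class` is hypothetical.

```python
import math

BOUND = 1650  # rounded mean of the largest rise ($1563) and largest fall ($1746)
STEP = 330    # 2 * BOUND / 10 classes


def magnitude_class(change):
    """Map a daily closing-price change in USD to a class from 1 to 10.

    Classes 1-5 cover negative changes and classes 6-10 cover non-negative
    ones. Changes beyond +/- BOUND are absorbed by the outermost classes.
    """
    index = math.floor((change + BOUND) / STEP) + 1
    return max(1, min(10, index))
```

For example, a change of -$1 falls into class 5, a change of $330 into class 7, and any change of $1320 or more into class 10.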
As before, to predict the magnitude of change in price on the following day, three models were implemented using an LSTM, CNN and BiLSTM, which we will refer to as Magnitude-LSTM, Magnitude-CNN and Magnitude-BiLSTM for the remainder of this paper. Table 4 summarises the hyperparameters and training settings used for these models, together with the evaluation results (see the evaluation section for this discussion). The Magnitude-CNN model outperforms the other two for this task, as is evident from the mean accuracy and F1 scores. Figure 6 depicts the architecture of this model.

Voting classifier
The best performing models from each of the aforementioned predictive tasks, more specifically, the Direction-BiLSTM and Magnitude-CNN models, were merged together to create a voting classifier model which takes into consideration the outputs from the two models. As Fig. 7 shows, the voting classifier works by first predicting the next day's closing price direction and then, the magnitude of the next day's closing price using the second model. Then, it checks whether the next day's closing price direction matches the direction of the predicted change magnitude. In other words, a match happens: (i) if the first model outputs a 0, which means a decrease in price, and the second model outputs a class from 1 to 5 (negative magnitude of price change); or (ii) if the first model outputs a 1, which means an increase in price, and the second model outputs a class from 6 to 10 (positive magnitude of price change). The prediction of the next day's closing price direction is kept if there is a match in the output of the two classifiers. Moreover, the voting classifier is evaluated on 50 different runs with 50 differently shuffled datasets.
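The matching rule can be sketched as follows. Two points are assumptions, as the text does not spell them out: that the classifier abstains when the two models disagree, and that accuracy is measured over the kept predictions only. The function names are hypothetical.

```python
def vote(direction, magnitude_class):
    """Return the predicted direction if both models agree, otherwise None.

    direction: 0 (decrease) or 1 (increase) from the direction model.
    magnitude_class: 1-10 from the magnitude model; classes 1-5 imply a
    decrease and classes 6-10 imply an increase.
    """
    implied = 0 if magnitude_class <= 5 else 1
    return direction if direction == implied else None


def voting_accuracy(directions, classes, actual):
    """Accuracy over the days on which the two models agree (assumption)."""
    kept = [
        (vote(d, c), a)
        for d, c, a in zip(directions, classes, actual)
        if vote(d, c) is not None
    ]
    if not kept:
        return None  # the models never agreed, so no prediction was kept
    return sum(p == a for p, a in kept) / len(kept)
```

Under these assumptions the classifier trades coverage for reliability: it makes fewer predictions, but each one is backed by both models.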

Tweet analysis
The tweets in the dataset are analysed to see how many of the tweets are positive, negative or neutral according to the sentiment scores assigned by VADER. As can be seen, there are more positive and neutral tweets than negative ones. Moreover, the same query was made on the daily grouped data. This resulted in all days being either positive or neutral on average with no days being negative at all. When analysing this metric and the graphs in Fig. 11, one can come to the conclusion that the dataset is imbalanced. However, this dataset contains all tweets with '#btc' and '#bitcoin' and therefore, the dataset reflects the reality of tweets. In addition, it could very well be the case that people tend to tweet more positively than negatively as seen in Pantano et al. (2018), Zhou et al. (2013), Abraham et al. (2018) and Kraaijeveld and De Smedt (2020).

Daily price trend prediction
For the three price direction prediction models (Direction-LSTM, Direction-CNN and Direction-BiLSTM) at different time lags of 1, 3 and 7 days, Figs. 8, 9 and 10 display, respectively, the mean accuracy, the maximum accuracy and the range, that is, the Difference between Maximum and Lowest Accuracies (DBMLA). See Table 2 for descriptive statistics corresponding to these plots (Fig. 11). The mean and maximum are computed over 5 runs for each model, with data randomly shuffled for each run.
These results give rise to the following observations. First, mean accuracy is highest for a single-day lag, with the 7-day lag in second place. Despite the highest mean accuracy being obtained with the shortest time lag, we cannot definitively conclude that the price fluctuation is strongest over shorter time periods, since a 7-day lag achieves a higher mean accuracy than a 3-day lag; the relationship is not linear. Whilst a 3-day lag results in the lowest accuracies on average, higher maximum scores are observed for a 3-day lag (with the exception of the CNN model). This suggests that mean accuracies are subject to considerable variation: short time lags of 1 day may benefit from an 'immediacy effect' (fluctuations which are closer in time can be better predicted on average), while longer time lags may result in lower variance in the data overall (so that averages over 7-day lags are better than those over 3 days).
Figure 12 shows a direct comparison of the best CNN, BiLSTM and LSTM models. The BiLSTM architecture (Fig. 5) achieves overall higher mean accuracies coupled with lower variance, as evidenced by its lower DBMLA. Hence, it is considered the best model overall. It is worth noting that this model achieves a maximum accuracy that outperforms those reported in Galeshchuk et al. (2018), Li and Dai (2020) and Livieris et al. (2021). The main reason why LSTMs seem well-suited for this specific problem is that they are inherently sequential models and are thus expected to do reasonably well at predicting a trend over time. Furthermore, introducing bidirectionality, which allows the model to look both forwards and backwards around a given time, yields better results.
The daily price trend prediction algorithm in this study results in a 64.2% maximum accuracy, which is less than that of the models proposed by Pant (2018) and Valencia et al. (2019), which resulted in 77.6% and 72% respectively. This is likely due to the data periods used. Our study spans around 450 days, whilst their studies were based on around 180 and 60 days respectively. Looking more closely at the data for the given periods 13 , it is clear that the periods used in their studies were times of rather low volatility (as measured by the standard deviation of daily returns 14 ), whereas in our study (which makes use of a substantially larger window) volatility is seen to fluctuate much more over the whole period.
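The volatility measure mentioned above, the standard deviation of daily returns, can be computed as in the following minimal sketch. The use of simple (rather than logarithmic) returns and of the sample standard deviation are assumptions, as the text does not specify either; the function names are hypothetical.

```python
import statistics


def daily_returns(closes):
    """Simple day-over-day returns from a list of daily closing prices."""
    return [(today - prev) / prev for prev, today in zip(closes, closes[1:])]


def volatility(closes):
    """Sample standard deviation of daily returns (needs at least 3 prices)."""
    return statistics.stdev(daily_returns(closes))
```

Applying `volatility` to the closing prices of each study period would allow the periods to be compared on a like-for-like basis.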

Daily price change magnitude prediction
Figure 13 displays F1 scores for the three types of models (Magnitude-LSTM, Magnitude-CNN and Magnitude-BiLSTM) for price change magnitude prediction. The corresponding descriptive statistics can be found in Table 4.
Once again, performance is generally worse with a 7-day lag in nearly all cases, whereas the shorter time lag of 1 day results in the best F1 scores. However, the CNN model outperforms the other models on the 3-day lag dataset. This is confirmed by the per-class F1 scores in Fig. 14, where the CNN model outperforms both the LSTM and BiLSTM models in nearly all classes apart from class 4, in which the difference is in any case small.
While sequence information was found to be useful when predicting the direction of the change, here we can see that sequence information is less important when predicting the actual size of the change. Based on these results, the CNN model can be identified as the best model to predict the magnitude of the price change. However, it is worth noting that F1 scores could not be reliably computed for all classes. This is likely due to data sparseness, with few instances of a given class in the test set.
Table 5 shows that, when evaluating the voting classifier, the mean accuracy increased by around 8% to 68.4%, while the maximum accuracy increased by 13% to 77.2%. This significant increase in accuracy might suggest that a modular approach, which first identifies the direction and then predicts the actual bin for the price change, achieves higher accuracy levels.

Conclusions
This paper compared the performance of a number of different neural models for predicting fluctuations in cryptocurrency prices from Twitter tweet data. The underlying hypothesis of this work is that opinions expressed in social media can function as useful predictors of such fluctuations, especially insofar as they incorporate features such as sentiment and opinion. One important question is whether the predictive value of features gleaned from social media depends on the time lag between their publication and the time of prediction. The experiments presented in this paper show that competitive results can be achieved with a 2-layer BiLSTM model trained on a dataset with a 1-day time lag and using seven different lagged features, meaning that each instance consists of features from tweets from the seven previous days. This model achieves a maximum accuracy of 64.18%. It must be highlighted that, whilst this configuration yielded the best results, this does not necessarily imply that a 1-day lag always results in better predictions. In fact, from the results presented herein, other temporal lags perform better under other configurations. Therefore, future work should be undertaken to investigate the impact of temporal lags in more detail. Furthermore, whilst the BiLSTM overall outperformed the CNN and LSTM implementations for price direction prediction, the CNN outperformed the others for the change in magnitude prediction. Future work should analyse varying datasets to determine whether this is due to particular features in the data, and why the different algorithms perform better or worse for varying parameters.
As can be seen in Table 6, the Direction-BiLSTM model presented here, which predicts the direction of the next day's closing price, outperforms some previously proposed models, including those reported by Li and Dai (2020), McNally et al. (2018) and Livieris et al. (2021). In addition, the voting classifier also managed to outperform an additional two studies, Galeshchuk et al. (2018) and Valencia et al. (2019). On the other hand, better results have been reported on a similar task in Pant (2018), Stenqvist and Lönnö (2017) and Kilimci (2020). However, while these studies were based on data made up of 180, 30 and 90 days respectively, in this work we sought to train and generalise over a dataset of around 1 year and 3 months' worth of data. Therefore, one would need to reevaluate the accuracy obtained for these studies on larger datasets. A somewhat different classification task was also proposed, namely, one where rather than predicting the direction of a price fluctuation, the goal is to determine the magnitude of the next day's closing price. The performance of this task cannot be compared to other studies' because this is the first study that tries to predict the size of the next day's closing price change. Here, it was shown that a model based on a 2-layer CNN with a 2-day time lag and 3 lagged features performs reasonably well, with a 57% accuracy over 10 classes. In addition, this model manages to achieve a direction accuracy of 63.3%, which is calculated by checking the predicted class, where classes 1-5 indicate a downtrend prediction while classes 6-10 indicate an uptrend prediction.
With regard to how lag affects price, it was evident that in nearly all cases the dataset with a 7-day lag performed worst, suggesting that a 7-day lag is too long to capture a predictive relationship between social media content and price. In general, a 3-day lag results in higher maximum accuracies, though at the expense of lower overall means and a higher variation (as reflected in the DBMLA, the difference between minimum and maximum accuracy). Thus, a 1-day lag might yield more reliable predictions, since tweets are closer to the affected price. On the other hand, a slightly longer time period of 3 days might allow for possible 'ripple effects', whereby sentiment and opinion accumulate over time in response to ongoing trends, yielding good predictions for future trends in some cases. The precise nature of the relationship between time and predictive power of sentiment in economic matters is an important avenue for future work.
The main obstacle to achieving better accuracy is the data used to train and test the implemented models. Since the data is grouped daily, the dataset shrinks to only one record per day, making it small and hence more difficult for the models to generalise over. Therefore, collecting more tweets and building a bigger dataset could prove vital in following up on this research.