From: How are texts analyzed in blockchain research? A systematic literature review
| Analysis type | Sub-category | Specific technique | References | Example papers |
|---|---|---|---|---|
| Feature extraction | Count-based | BoW | Zhang et al. (2010) | Yen et al. (2021) |
| | | N-Gram | Cavnar et al. (1994) | El-Masri and Hussain (2021) |
| | | TF-IDF | Ramos (2003) | Pan et al. (2020) |
| | | DDPWI | Proposed in the paper | Burnie and Yilmaz (2019) |
| | Word/Sentence embedding | Word2vec | Mikolov et al. (2013) | |
| | | Doc2vec | Le and Mikolov (2014) | |
| | | GloVe | Pennington et al. (2014) | |
| | | FastText | Bojanowski et al. (2017) | |
| | | Affective Tweet | Balfagih and Keselj (2019) | |
| | | A-BiRNN | Proposed in the paper | Xu et al. (2021) |
| Sentiment analysis | Lexicon/rule-based | VADER | Hutto and Gilbert (2014) | |
| | | TextBlob | | |
| | | Sentistrength | Caviggioli et al. (2020) | |
| | | SentiWordNet | Baccianella et al. (2010) | Cheuque Cerda and L. Reutter (2019) |
| | | Alex Davies word list | Christie and Huang (1995) | Stratopoulos et al. (2022) |
| | | Bing | | |
| | | AFINN | Nielsen (2011) | |
| | | LM lexicon | Loughran and McDonald (2011) | |
| | | Harvard-IV General Purpose Psychological Dictionary | Stone et al. (1966) | Karalevicius et al. (2018) |
| | | Quantitative Discourse Analysis Package | Sapkota and Grobys (2021) | |
| | | Henry’s finance-specific dictionary | Henry (2008) | |
| | | Pattern library | Galeshchuk et al. (2018) | |
| | | SentimentR | | |
| | | Ethical and unethical words dictionary | Constructed in the paper | Barth et al. (2020) |
| | | 63 cryptocurrency words and abbreviations | Constructed in the paper | Kraaijeveld and de Smedt (2020) |
| | | Crypto-specific sentiment dictionary (in Chinese) | Constructed in the paper | Huang et al. (2021) |
| | | Crypto-specific lexicon (words, emojis, informal language) | Constructed in the paper | Chen et al. (2019a) |
| | Machine learning-based (algorithms) | Long short-term memory (LSTM) | Hochreiter and Schmidhuber (1997) | |
| | | Recurrent neural network | Goldberg (2017) | |
| | | Random forest | Ho (1995) | |
| | | Naïve Bayes | Jurafsky and Martin (2017) | |
| | | Support vector machine | Boser et al. (1992) | |
| | | Gradient boosting | Friedman (2001) | |
| | | BERT | Devlin et al. (2018) | |
| | | Bidirectional LSTM | Mousa and Schuller (2017) | Han et al. (2020) |
| | | Voting-included Algorithm | Constructed in the paper | Pant et al. (2018) |
| | | Sentiment Graph | Constructed in the paper | Yao et al. (2019) |
| | Analytics tool | Crimson Hexagon social sentiment | Stanley (2019) | |
| | | Semantria | Caviggioli et al. (2020) | |
| | | Meaningcloud | | |
| | | StanfordCoreNLP | Moustafa et al. (2022) | |
| | | OPView | Lu et al. (2017) | |
| | | RavenPack | Rognone et al. (2020) | |
| | Emotion metrics | NRC-VAD Emotion Lexicon | Toma and Cerchiello (2020) | |
| | | NRC Word-Emotion Association Lexicon | Chursook et al. (2022) | |
| | | Text2Emotion | Aslam et al. (2022) | |
| Topic modeling | Topic modeling algorithm | LDA | Blei et al. (2003) | |
| | | DTM | Blei and Lafferty (2006) | |
| | | SentLDA | Bao and Datta (2014) | Thewissen et al. (2022) |
| | | Joint/sentiment topic model | Lin and He (2009) | Loginova et al. (2021) |
| | | Topic sentiment latent Dirichlet allocation | Nguyen and Shirai (2015) | |
| | | Nonnegative Matrix Factorization | Kang et al. (2020) | |
| | | Anchored Correlation Explanation | Gallagher et al. (2017) | Nizzoli et al. (2020) |
| | | Word2vec-based Latent Semantic Analysis (W2V-LSA) | Proposed in the paper | Kim et al. (2020) |
| | Analytics tool | Leximancer | Daluwathumullagamage and Sims (2020); Perdana et al. (2021) | |
| Text similarity | | Cosine Similarity | Kwon and Lee (2003) | Yen et al. (2021) |
| | | Jaccard Similarity Coefficient | Jaccard (1912) | Sapkota and Grobys (2021) |
| | | SBERT | Reimers and Gurevych (2020) | Bashchenko (2022) |
| Clustering | | K-means clustering | MacQueen (1967) | Choi et al. (2022) |
| | | DBSCAN clustering | Ester et al. (1996) | |
| Classifier | Machine learning algorithm | Catboost | Prokhorenkova et al. (2018) | |
| | | Random Forest | Ho (1995) | |
| | | XGBoost | Chen and Guestrin (2016) | |
| | | Neural network | Hashimoto et al. (2016) | |
| | | Naïve Bayes | Jurafsky and Martin (2017) | |
| | Readability | Flesch-Kincaid Readability | Flesch (1979) | |
| | | Dale-Chall Readability | Dale and Chall (1948) | |
| | | Gunning Fog Index | Gunning (1952) | |
| | | Automated Readability Index | Senter and Smith (1967) | |
| | | Simple Measure of Gobbledygook | McLaughlin (1969) | |
| | | Coleman-Liau Index | Coleman and Liau (1975) | |
| | | Linsear Write | Klare (1974) | |
| | | AWS blockchain template | Stanley (2019) | |
| | Network analysis | Google knowledge graph | Pan et al. (2020) | |
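
The count-based features (BoW, TF-IDF) and two of the text similarity measures (cosine, Jaccard) listed in the table are simple enough to sketch with the Python standard library. The block below is a minimal illustration, not the implementation used in any of the cited papers. The whitespace tokenizer, the unsmoothed weighting `tf = count / document length` and `idf = log(N / df)`, the function names, and the three example sentences are all assumptions made for this sketch; libraries and individual studies often use smoothed or normalized variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokens):
    """BoW: map each term to its raw count in the document."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df),
    with no smoothing, so a term found in every document gets weight 0.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard coefficient: shared vocabulary over combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty documents
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical example sentences, not taken from any reviewed study.
docs = [
    tokenize("bitcoin price rises"),
    tokenize("bitcoin price falls"),
    tokenize("ethereum upgrade ships"),
]

bow_cosine = cosine_similarity(bag_of_words(docs[0]), bag_of_words(docs[1]))
jaccard = jaccard_similarity(docs[0], docs[1])
tfidf_vectors = tf_idf(docs)
print(bow_cosine, jaccard, tfidf_vectors[0])
```

With these definitions, the first two sentences share two of four distinct terms, and in the TF-IDF vector of the first sentence the term that appears in only one document ("rises") receives a larger weight than the terms shared with the second sentence.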