Table 3 The detailed information of text analysis techniques used in the literature

From: How are texts analyzed in blockchain research? A systematic literature review

Analysis type

Sub-category

Specific technique

References

Example papers

Feature extraction

Count-based

BoW

Zhang et al. (2010)

Yen et al. (2021)

N-Gram

Cavnar et al. (1994)

El-Masri and Hussain (2021)

TF-IDF

Ramos (2003)

Pan et al. (2020)

DDPWI

Proposed in the paper

Burnie and Yilmaz (2019)

Word/Sentence embedding

Word2vec

Mikolov et al. (2013)

Kilimci (2020); Kim et al. (2020); Liu et al. (2021)

Doc2vec

Le and Mikolov (2014)

GloVe

Pennington et al. (2014)

FastText

Bojanowski et al. (2017)

AffectiveTweets

https://affectivetweets.cms.waikato.ac.nz

Balfagih and Keselj (2019)

A-BiRNN

Proposed in the paper

Xu et al. (2021)

Sentiment analysis

Lexicon/rule-based

VADER

Hutto and Gilbert (2014)

Kim et al. (2016); Abraham et al. (2018)

TextBlob

https://textblob.readthedocs.io

Jain et al. (2018); Li et al. (2019)

SentiStrength

http://sentistrength.wlv.ac.uk

Caviggioli et al. (2020)

SentiWordNet

Baccianella et al. (2010)

Cheuque Cerda and Reutter (2019)

Alex Davies word list

Christie and Huang (1995)

Stratopoulos et al. (2022)

Bing

https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

Grover et al. (2019); Hassan et al. (2021)

AFINN

Nielsen (2011)

Ayvaz and Shiha (2018); Toma and Cerchiello (2020)

LM lexicon

Loughran and McDonald (2011)

Mai et al. (2018); Dittmar and Wu (2019)

Harvard-IV General Purpose Psychological Dictionary

Stone et al. (1966)

Karalevicius et al. (2018)

Quantitative Discourse Analysis Package

https://www.rdocumentation.org/packages/qdap/versions/2.4.3

Sapkota and Grobys (2021)

Henry’s finance-specific dictionary

Henry (2008)

Mnif et al. (2021); Anamika and Subramaniam (2022)

Pattern library

https://github.com/clips/pattern

Galeshchuk et al. (2018)

SentimentR

https://github.com/trinker/sentimentr

Rahman et al. (2018); Chiarello et al. (2021)

Ethical and unethical words dictionary

Constructed in the paper

Barth et al. (2020)

63 cryptocurrency words and abbreviations

Constructed in the paper

Kraaijeveld and de Smedt (2020)

Crypto-specific sentiment dictionary (in Chinese)

Constructed in the paper

Huang et al. (2021)

Crypto-specific lexicon (words, emojis, informal language)

Constructed in the paper

Chen et al. (2019a)

Machine learning-based (algorithms)

Long short-term memory (LSTM)

Hochreiter and Schmidhuber (1997)

Inamdar et al. (2019); Şaşmaz and Tek (2021)

Recurrent neural network

Goldberg (2017)

Random forest

Ho (1995)

Naïve Bayes

Jurafsky and Martin (2017)

Support vector machine

Boser et al. (1992)

Gradient boosting

Friedman (2001)

BERT

Devlin et al. (2018)

Bashchenko (2022); Ortu et al. (2022)

Bidirectional LSTM

Mousa and Schuller (2017)

Han et al. (2020)

Voting-included Algorithm

Constructed in the paper

Pant et al. (2018)

Sentiment Graph

Constructed in the paper

Yao et al. (2019)

Analytics Tool

Crimson Hexagon social sentiment

https://www.carahsoft.com/crimson-hexagon

Stanley (2019)

Semantria

https://www.lexalytics.com

Caviggioli et al. (2020)

MeaningCloud

https://www.meaningcloud.com

Stanford CoreNLP

https://stanfordnlp.github.io/CoreNLP

Moustafa et al. (2022)

OPView

https://www.opview.com.tw

Lu et al. (2017)

RavenPack

https://www.ravenpack.com/products/edge/data/news-analytics

Rognone et al. (2020)

Emotion metrics

NRC-VAD Emotion Lexicon

https://saifmohammad.com/WebPages/nrc-vad.html

Toma and Cerchiello (2020)

NRC Word-Emotion Association Lexicon

https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

Chursook et al. (2022)

Text2Emotion

https://shivamsharma26.github.io/text2emotion

Aslam et al. (2022)

Topic modeling

Topic modeling algorithm

LDA

Blei et al. (2003)

Fu et al. (2019); Hirata et al. (2021); Laturnus (2022)

DTM

Blei and Lafferty (2006)

Linton et al. (2017); Lee et al. (2022)

SentLDA

Bao and Datta (2014)

Thewissen et al. (2022)

Joint sentiment/topic model

Lin and He (2009)

Loginova et al. (2021)

Topic sentiment latent Dirichlet allocation

Nguyen and Shirai (2015)

Nonnegative Matrix Factorization

Lee and Seung (1999, 2000)

Kang et al. (2020)

Anchored Correlation Explanation

Gallagher et al. (2017)

Nizzoli et al. (2020)

Word2vec-based Latent Semantic Analysis (W2V-LSA)

Proposed in the paper

Kim et al. (2020)

Analytics tool

Leximancer

https://www.leximancer.com

Daluwathumullagamage and Sims (2020); Perdana et al. (2021)

Text similarity

 

Cosine Similarity

Kwon and Lee (2003)

Yen et al. (2021)

 

Jaccard Similarity Coefficient

Jaccard (1912)

Sapkota and Grobys (2021)

 

SBERT

Reimers and Gurevych (2020)

Bashchenko (2022)

Clustering

 

K-means clustering

MacQueen (1967)

Choi et al. (2022)

DBSCAN clustering

Ester et al. (1996)

Classifier

Machine learning algorithm

CatBoost

Prokhorenkova et al. (2018)

Chousein et al. (2020); Schwenkler and Zheng (2021)

Random Forest

Ho (1995)

XGBoost

Chen and Guestrin (2016)

Neural network

Hashimoto et al. (2016)

Naïve Bayes

Jurafsky and Martin (2017)

Readability

Flesch-Kincaid Readability

Flesch (1979)

Narman et al. (2018); Sapkota and Grobys (2021)

Dale-Chall Readability

Dale and Chall (1948)

Gunning Fog Index

Gunning (1952)

Automated Readability Index

Senter and Smith (1967)

Simple Measure of Gobbledygook

McLaughlin (1969)

Coleman-Liau Index

Coleman and Liau (1975)

Linsear Write

Klare (1974)

AWS blockchain template

https://docs.aws.amazon.com/blockchain-templates

Stanley (2019)

Network analysis

Google knowledge graph

https://developers.google.com/knowledge-graph

Pan et al. (2020)
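
Several of the count-based and text-similarity techniques listed in the table (BoW, TF-IDF, cosine similarity, and the Jaccard similarity coefficient) can be illustrated with a minimal sketch in plain Python. The function names, the whitespace tokenizer, and the weighting variant (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency) are assumptions chosen for illustration; the reviewed papers may use different tokenization, smoothing, or library implementations.

```python
import math
from collections import Counter


def bag_of_words(text):
    """BoW: map each lowercase whitespace token to its raw count."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """TF-IDF vectors, one dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Terms that occur in every document therefore get weight 0.
    """
    bows = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for bow in bows for term in bow)
    vectors = []
    for bow in bows:
        total = sum(bow.values())
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bow.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard coefficient: shared terms divided by all distinct terms."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

As a usage example, for the hypothetical texts "bitcoin price rises" and "bitcoin price falls", the two share two of four distinct terms, so `jaccard_similarity` on their bag-of-words counts gives 0.5, while documents with no terms in common have a TF-IDF cosine similarity of 0.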