How are texts analyzed in blockchain research? A systematic literature review

Zhuo, Xian; Irresberger, Felix; Bostandzic, Denefa

doi:10.1186/s40854-023-00501-6

Financial Innovation

Table 3 Detailed information on the text analysis techniques used in the literature

Analysis type	Sub-category	Specific technique	References	Example papers
Feature extraction	Count-based	BoW	Zhang et al. (2010)	Yen et al. (2021)
		N-Gram	Cavnar et al. (1994)	El-Masri and Hussain (2021)
		TF-IDF	Ramos (2003)	Pan et al. (2020)
		DDPWI	Proposed in the paper	Burnie and Yilmaz (2019)
	Word/Sentence embedding	Word2vec	Mikolov et al. (2013)	Kilimci (2020); Kim et al. (2020); Liu et al. (2021)
		Doc2vec	Le and Mikolov (2014)
		GloVe	Pennington et al. (2014)
		FastText	Bojanowski et al. (2017)
		AffectiveTweets	https://affectivetweets.cms.waikato.ac.nz	Balfagih and Keselj (2019)
		A-BiRNN	Proposed in the paper	Xu et al. (2021)
Sentiment analysis	Lexicon/rule-based	VADER	Hutto and Gilbert (2014)	Kim et al. (2016); Abraham et al. (2018)
		TextBlob	https://textblob.readthedocs.io	Jain et al. (2018); Li et al. (2019)
		SentiStrength	http://sentistrength.wlv.ac.uk	Caviggioli et al. (2020)
		SentiWordNet	Baccianella et al. (2010)	Cheuque Cerda and Reutter (2019)
		Alex Davies word list	Christie and Huang (1995)	Stratopoulos et al. (2022)
		Bing	https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html	Grover et al. (2019); Hassan et al. (2021)
		AFINN	Nielsen (2011)	Ayvaz and Shiha (2018); Toma and Cerchiello (2020)
		LM lexicon	Loughran and McDonald (2011)	Mai et al. (2018); Dittmar and Wu (2019)
		Harvard-IV General Purpose Psychological Dictionary	Stone et al. (1966)	Karalevicius et al. (2018)
		Quantitative Discourse Analysis Package	https://www.rdocumentation.org/packages/qdap/versions/2.4.3	Sapkota and Grobys (2021)
		Henry’s finance-specific dictionary	Henry (2008)	Mnif et al. (2021); Anamika and Subramaniam (2022)
		Pattern library	https://github.com/clips/pattern	Galeshchuk et al. (2018)
		SentimentR	https://github.com/trinker/sentimentr	Rahman et al. (2018); Chiarello et al. (2021)
		Ethical and unethical words dictionary	Constructed in the paper	Barth et al. (2020)
		63 cryptocurrency words and abbreviations	Constructed in the paper	Kraaijeveld and de Smedt (2020)
		Crypto-specific sentiment dictionary (in Chinese)	Constructed in the paper	Huang et al. (2021)
		Crypto-specific lexicon (words, emojis, informal language)	Constructed in the paper	Chen et al. (2019a)
	Machine learning-based (algorithms)	Long short-term memory (LSTM)	Hochreiter and Schmidhuber (1997)	Inamdar et al. (2019); Şaşmaz and Tek (2021)
		Recurrent neural network	Goldberg (2017)
		Random forest	Ho (1995)
		Naïve Bayes	Jurafsky and Martin (2017)
		Support vector machine	Boser et al. (1992)
		Gradient boosting	Friedman (2001)
		BERT	Devlin et al. (2018)	Bashchenko (2022); Ortu et al. (2022)
		Bidirectional LSTM	Mousa and Schuller (2017)	Han et al. (2020)
		Voting-included Algorithm	Constructed in the paper	Pant et al. (2018)
		Sentiment Graph	Constructed in the paper	Yao et al. (2019)
	Analytics tool	Crimson Hexagon social sentiment	https://www.carahsoft.com/crimson-hexagon	Stanley (2019)
		Semantria	https://www.lexalytics.com	Caviggioli et al. (2020)
		MeaningCloud	https://www.meaningcloud.com	Caviggioli et al. (2020)
		Stanford CoreNLP	https://stanfordnlp.github.io/CoreNLP	Moustafa et al. (2022)
		OPView	https://www.opview.com.tw	Lu et al. (2017)
		RavenPack	https://www.ravenpack.com/products/edge/data/news-analytics	Rognone et al. (2020)
Emotion metrics		NRC-VAD Emotion Lexicon	https://saifmohammad.com/WebPages/nrc-vad.html	Toma and Cerchiello (2020)
		NRC Word-Emotion Association Lexicon	https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm	Chursook et al. (2022)
		Text2Emotion	https://shivamsharma26.github.io/text2emotion	Aslam et al. (2022)
Topic modeling	Topic modeling algorithm	LDA	Blei et al. (2003)	Fu et al. (2019); Hirata et al. (2021); Laturnus (2022)
		DTM	Blei and Lafferty (2006)	Linton et al. (2017); Lee et al. (2022)
		SentLDA	Bao and Datta (2014)	Thewissen et al. (2022)
		Joint sentiment/topic model	Lin and He (2009)	Loginova et al. (2021)
		Topic sentiment latent Dirichlet allocation	Nguyen and Shirai (2015)	Loginova et al. (2021)
		Nonnegative Matrix Factorization	Lee and Seung (1999, 2000)	Kang et al. (2020)
		Anchored Correlation Explanation	Gallagher et al. (2017)	Nizzoli et al. (2020)
		Word2vec-based Latent Semantic Analysis (W2V-LSA)	Proposed in the paper	Kim et al. (2020)
	Analytics tool	Leximancer	https://www.leximancer.com	Daluwathumullagamage and Sims (2020); Perdana et al. (2021)
Text similarity		Cosine Similarity	Kwon and Lee (2003)	Yen et al. (2021)
		Jaccard Similarity Coefficient	Jaccard (1912)	Sapkota and Grobys (2021)
		SBERT	Reimers and Gurevych (2020)	Bashchenko (2022)
Clustering		K-means clustering	MacQueen (1967)	Choi et al. (2022)
		DBSCAN clustering	Ester et al. (1996)	Choi et al. (2022)
Classifier	Machine learning algorithm	CatBoost	Prokhorenkova et al. (2018)	Chousein et al. (2020); Schwenkler and Zheng (2021)
		Random Forest	Ho (1995)
		XGBoost	Chen and Guestrin (2016)
		Neural network	Hashimoto et al. (2016)
		Naïve Bayes	Jurafsky and Martin (2017)
Readability		Flesch-Kincaid Readability	Flesch (1979)	Narman et al. (2018); Sapkota and Grobys (2021)
		Dale-Chall Readability	Dale and Chall (1948)
		Gunning Fog Index	Gunning (1952)
		Automated Readability Index	Senter and Smith (1967)
		Simple Measure of Gobbledygook	McLaughlin (1969)
		Coleman-Liau Index	Coleman and Liau (1975)
		Linsear Write	Klare (1974)
		AWS blockchain template	https://docs.aws.amazon.com/blockchain-templates	Stanley (2019)
Network analysis		Google knowledge graph	https://developers.google.com/knowledge-graph	Pan et al. (2020)
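
To illustrate how two of the count-based and similarity techniques in the table work together (TF-IDF weighting followed by cosine similarity, alongside the Jaccard coefficient), here is a minimal from-scratch Python sketch. It assumes a simple lowercase alphanumeric tokenizer and the unsmoothed weighting tf × log(N/df); both are assumptions for illustration, and library implementations (for example, scikit-learn's TfidfVectorizer, which smooths idf by default) will produce different weights. The function names and the example sentences are hypothetical and are not taken from any of the cited papers.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Uses raw term counts and idf = log(N / df), so a term that occurs in
    every document gets weight 0. No smoothing or length normalisation.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    # Dot product over shared terms divided by the product of vector norms.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical example documents.
docs = [
    "Bitcoin price rises on ETF news",
    "Bitcoin price falls on regulation news",
    "New smart contract audit released",
]
vecs = tf_idf_vectors(docs)
```

Because the third document shares no tokens with the first, their cosine similarity is 0, while the first two documents share four of their eight distinct tokens and therefore have a Jaccard coefficient of 0.5.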
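
The readability measures listed in the table are closed-form formulas over word, sentence, and syllable counts. As one example, the two Flesch measures are Reading Ease = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) and Flesch-Kincaid Grade Level = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The Python sketch below computes both. The sentence splitter and the vowel-group syllable counter are rough heuristics assumed for illustration, so scores will deviate somewhat from dedicated readability tools; the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, with a silent-'e' tweak.

    This heuristic is an assumption for illustration; published readability
    tools typically use dictionaries or more elaborate rules.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent 'e' ("trade"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)  # average words per sentence
    spw = syllables / len(words)       # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level
```

For "The cat sat." (3 words, 1 sentence, 3 syllables) the sketch returns a Reading Ease of about 119.19 and a Grade Level of about −2.62, which shows that very short texts can fall outside the usual score ranges.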
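
At their core, the lexicon/rule-based sentiment approaches in the table look up each token in a word list and aggregate the matched scores. The Python sketch below shows only that basic mechanism, using a tiny, hypothetical AFINN-style lexicon (integer scores) and a crude rule that flips a score when the preceding token is a negator. The lexicon entries, the NEGATORS set, and the function name are invented for illustration; they do not reproduce AFINN, VADER, the LM lexicon, or any of the crypto-specific dictionaries constructed in the cited papers.

```python
import re

# Hypothetical toy lexicon in the style of AFINN (integer scores).
# Entries and values are made up for illustration only.
TOY_LEXICON = {
    "gain": 2,
    "bullish": 3,
    "moon": 2,
    "scam": -4,
    "crash": -3,
    "bearish": -3,
    "hack": -3,
}

# Hypothetical negation words; real rule-based tools use richer rules.
NEGATORS = {"not", "no", "never"}


def lexicon_sentiment(text: str, lexicon: dict[str, int] = TOY_LEXICON) -> int:
    """Sum the lexicon scores of the tokens in text.

    A token's score is flipped when the immediately preceding token is a
    negator, a crude stand-in for negation handling.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, tok in enumerate(tokens):
        score = lexicon.get(tok, 0)
        if score and i > 0 and tokens[i - 1] in NEGATORS:
            score = -score
        total += score
    return total
```

With this toy lexicon, "Exchange hack triggers crash" scores −6 and "Not bullish" scores −3; words absent from the lexicon contribute nothing, which is why domain-specific dictionaries such as those listed above matter for cryptocurrency text.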
