An interval constraint-based trading strategy with social sentiment for the stock market

Li, Mingchen; Yang, Kun; Lin, Wencan; Wei, Yunjie; Wang, Shouyang

doi:10.1186/s40854-023-00567-2

Research
Open access
Published: 10 February 2024

Mingchen Li^1,2,
Kun Yang^1,2,3,
Wencan Lin^1,2,
Yunjie Wei ORCID: orcid.org/0000-0001-8737-7975^1,3 &
…
Shouyang Wang^1,3

Financial Innovation volume 10, Article number: 56 (2024)

Abstract

Developing effective strategies to earn excess returns in the stock market is a cutting-edge topic in the field of economics. At the same time, stock price forecasting that supports trading strategies is considered one of the most challenging tasks. Therefore, this study analyzes and extracts news media data, expert comments, social opinion data, and pandemic text data using natural language processing, and then combines the data with a deep learning model to forecast future stock price patterns based on historical stock prices. An interval constraint-based trading strategy is constructed. Using data from several typical stocks in the Chinese stock market during the COVID-19 period, the empirical studies and trading simulations show, first, that the sentiment composite index and the deep learning model can improve the accuracy of stock price forecasting. Second, the interval constraint-based trading strategy based on the proposed approach can effectively enhance returns and thus, can assist investors in decision-making.

Introduction

The stock market is a crucial conduit for businesses to raise funds from investors and is a financial ecosystem connecting corporations and investors (Zhong and Enke 2019). The enormous trading volume and profitability of the stock market continue to attract investors and traders keen to use it to maximize their profits (Adam et al. 2016). However, the stock market is extremely volatile and non-stationary, making it prone to numerous complicated shocks and games. Consequently, there are obstacles to devising solid trading methods and making profitable investment decisions (Gu and Peng 2019). Since the turn of the 20th century, a constant stream of financial institutions and researchers have been developing stock price forecasting models. With the expansion of computer technology, an increasing number of superior models, including deep learning models, seek to decrease stochasticity and identify consistent trends by collecting and evaluating historical data and useful technical indicators (Salisu and Vo 2020; Liu et al. 2020).

At the beginning of the century, Hinton et al. introduced the notion of deep learning, thereby resolving the enduring impasse surrounding the arduous training of deep neural networks (Hinton et al. 2006). Meanwhile, Bengio et al. established a robust framework for the application of deep learning in addressing language modeling challenges (Bengio et al. 2000; Khurana et al. 2023). Subsequently, deep learning and natural language processing (NLP) have gained extensive utilization across diverse domains (Lecun et al. 2015). Within the realm of finance, these techniques have been employed to facilitate financial forecasting, credit default prediction, mortgage risk estimation, and risk–return management, among other applications (Zhong and Enke 2019; Alonso Robisco and Carbó Martínez 2022; Calomiris and Mamaysky 2019; Xing et al. 2018). These technological advances along with the growth of social media have collectively driven the widespread use of unstructured data (especially news data and social media data), which has improved the predictive power of models.

Specifically, the factors that make this phenomenon noteworthy are as follows. First, the efficient market hypothesis assumes that market investors are rational and seek the greatest possible profits (Fama 1970). However, as the Dutch tulip bubble and the American Internet bubble showed, investors are not always rational (Audrino et al. 2020). According to previous studies, investor sentiment and stock returns constrain and influence each other. In other words, the price of stocks in the market is not only defined by the intrinsic worth of companies, but is also heavily affected by the investors themselves; that is, psychological considerations and investor behavior have a significant impact on the price decisions and movements of stocks (Bustos and Pomares-Quimbaya 2020; Liang et al. 2020). Second, social platforms and news websites, as forms of the digital economy, are increasingly crucial avenues for consumers and investors to exchange perspectives, feelings, and knowledge. Compared to conventional data sources, these platforms’ data offer the benefits of a broad user base, strong social interaction, high engagement, and rapid reaction times (Audrino et al. 2020; Hong et al. 2017). Using this information efficiently and integrating it into stock market research is a highly rewarding and challenging task.

In the COVID-19 era, it is worth considering which texts should be used for stock price analysis. Building on previous studies, this study focuses on the following categories of data: 1. news, which is a significant way for the general public to get official or more formal information through the media (Narayan 2019); 2. comments, especially from investors and practitioners, which synthesize the sentiment of relatively professional participants; 3. social media data, which reflect the collective wisdom of the general public (Teti et al. 2019); and 4. pandemic data. It is well known that COVID-19 triggered significant stock market volatility. To prevent the spread of the disease, governments across the globe implemented stricter policy controls, including limitations on labor mobility and quarantines (Salisu and Vo 2020; He et al. 2020; Li et al. 2022). This resulted in a number of challenges in global supply chains, including reduced supply and decreased demand, which discouraged investment and reduced business and consumer confidence. In this scenario, global stock markets suffered setbacks (OECD 2020). Because COVID-19 has had a significant impact on equity markets, pandemic-related data should also be considered. No previous study on stock price volatility has considered all of these factors.

Therefore, this study proposes a novel forecasting and interval-constraint trading approach, based on deep learning and sentiment analysis combined with big data, for forecasting and simulating stock price fluctuations in the COVID-19 period. First, text data from four different perspectives are collected: news media, expert comments, social opinion, and the pandemic. Second, the relevant texts are analyzed with natural language processing techniques to produce sentiment indexes that represent a combination of official, popular, and social contexts. Third, with the support of deep learning models, we employ historical data, search engine data, and sentiment indexes to forecast stock prices, including point value forecasts and interval forecasts. Fourth, we employ a number of assessment criteria and statistical tests to evaluate the forecasting capacity of the model. Lastly, traditional trading strategies take long or short positions based on forecasts alone, which carries a lot of risk, especially during periods of high volatility. This study therefore combines the trading strategy with an interval estimation algorithm, which acts as an insurance policy and can support significant improvement in trading returns under high volatility conditions.
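To make the idea of an interval constraint concrete, the following is a minimal sketch rather than the rule used in this study. It assumes a hypothetical constraint: a position is opened only when the entire forecast interval lies on the same side of the current price as the point forecast. The names `Forecast`, `point_signal`, and `interval_constrained_signal` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    point: float  # point forecast of the next price
    lower: float  # lower bound of the forecast interval
    upper: float  # upper bound of the forecast interval


def point_signal(price: float, fc: Forecast) -> int:
    """Plain point-forecast rule: +1 = long, -1 = short, 0 = stay flat."""
    if fc.point > price:
        return 1
    if fc.point < price:
        return -1
    return 0


def interval_constrained_signal(price: float, fc: Forecast) -> int:
    """Assumed constraint (not taken from the paper): trade only when the
    whole interval agrees with the direction; otherwise stay flat."""
    if fc.lower > price:   # even the pessimistic bound is above the price
        return 1
    if fc.upper < price:   # even the optimistic bound is below the price
        return -1
    return 0               # interval straddles the price: no trade


wide = Forecast(point=10.5, lower=9.8, upper=11.2)
narrow = Forecast(point=10.5, lower=10.2, upper=10.8)
print(point_signal(10.0, wide), interval_constrained_signal(10.0, wide))
print(point_signal(10.0, narrow), interval_constrained_signal(10.0, narrow))
```

Under this assumption, a wide interval, which is typical of volatile periods, suppresses trades that the point forecast alone would have triggered.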

The innovation of this study lies in three aspects. First, the data are considered comprehensively. The data form includes time-series data and textual data; the data sources include news media, stock market experts, and the general public; and the data content includes stock characteristics, practitioner sentiment, and pandemic information. Second, this study makes innovative use of models. We employ a temporal convolutional network (TCN), an effective deep learning model, combined with interval estimation algorithms to build a reliable forecasting framework (containing point forecasts and interval forecasts). Third, this is the first study to construct a trading strategy based on interval constraints. Interval restrictions are added to the general point forecast-based trading strategy so as to avoid irrational investments in high-volatility periods or to generate huge returns.

The rest of the paper is structured as follows. The literature review and discussion related to this study are presented in "Literature review". The proposed methodology and related models are presented in "Methodology". "Empirical analysis" shows the experimental procedure, including the data collection, data processing, forecasting results, and trading simulations. "Conclusion" presents the conclusions. Finally, the discussion and prospects for future research are provided in "Discussion and prospects".

Literature review

This section consists of two parts: the first introduces multiple types of models applied to stock price forecasting, including statistical models and artificial intelligence models; the second introduces the application of social media data in previous stock price forecasting studies, including search indexes and text data.

Forecasting models for stock price

An intriguing topic in finance and forecasting research has been how to make more accurate stock price forecasts (Kumbure et al. 2022). Numerous models have been developed to describe the volatility and trend of various stock prices on different exchange platforms. These models may be categorized into two groups: traditional statistics and artificial intelligence (mainly machine learning algorithms) (Deng et al. 2022; Liu et al. 2021).

Traditional statistical models include the vector autoregressive model, autoregressive conditional heteroskedasticity model, and autoregressive integrated moving average (ARIMA) model, among others (Ribeiro Ramos 2003; Wang et al. 2022). Traditional statistical models, which are effective instruments for illuminating the inner workings of financial markets, are frequently utilized in stock market forecasting and analysis. For instance, Jiang et al. (2021) used a structural threshold vector autoregression model and found that oil prices have a significant impact on stock returns in the short term. However, traditional statistical models require the data to satisfy linearity or stationarity assumptions, which is typically difficult with high-frequency stock market data (Kumbure et al. 2022; Tang et al. 2022). Consequently, conventional statistical models have limitations in forecasting stock prices (Lin et al. 2021).
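As a small illustration of the stationarity issue, raw prices are commonly transformed into log returns before an autoregressive model is fitted. The sketch below is not taken from this study; it assumes a plain AR(1) model without intercept, and the function names are illustrative.

```python
import math


def log_returns(prices):
    """Turn a price series into log returns: r_t = ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def ar1_coefficient(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t
    (no intercept; an assumption made to keep the sketch short)."""
    num = sum(curr * prev for prev, curr in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


prices = [100.0, 110.0, 121.0]      # grows by 10% each step
print(log_returns(prices))          # two returns, both close to ln(1.1)
print(ar1_coefficient([1.0, 0.5, 0.25, 0.125]))
```

The price series above trends upward, while its log returns are constant; this is the kind of transformation that linear models rely on and that noisy high-frequency data often resist.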

The abovementioned issues are not present in machine learning models based on artificial intelligence algorithms (Vuong et al. 2022). When complex structures, such as nonlinear high-frequency stock trading data, are present, machine learning algorithms can provide more accurate forecasts (Yun et al. 2022). Classical machine learning research offers many examples of stock price forecasting. Gupta et al. (2019) examined the predictability of stock returns using the quantile random forests method. They used indicators of inequality based on consumption and income to forecast stock returns, providing a new approach for predicting stock returns. Sadaei et al. (2016) and Kao et al. (2013) applied fuzzy sets and support vector regression (SVR), respectively, to forecast stock prices, and both achieved good forecasting results. For applications of other classical machine learning methods in stock price forecasting, refer to Na and Kim (2021), Zhang and Lou (2021), and Shahi et al. (2020), among others.

Stock price forecasting with machine learning techniques is now an active research topic. Many researchers have proposed novel machine learning methods that achieve more robust and more accurate forecasts. Deng et al. (2022) combined a deep learning algorithm with multivariate empirical mode decomposition, and further built a multi-input and multi-output network framework to achieve multi-step forecasting of stock prices. The empirical results show that this combined method yields better predictions. Although combining a machine learning algorithm with a decomposition integration method can raise forecasting accuracy, it also makes the computation more difficult. To resolve this problem, Guo et al. (2022) employed a system clustering method and particle swarm optimization to construct a decomposition and reconstruction model, which not only reduced the complexity of the algorithm, but also obtained more accurate forecasting results. Additionally, several studies have combined different machine learning techniques to overcome the drawbacks of a single technique. For example, Ghosh et al. (2022) combined random forest and long short-term memory network (LSTM) to achieve more accurate stock price forecasting results than either method alone.

TCN is a novel type of neural network improved from the one-dimensional convolutional neural network. TCN has been shown to outperform LSTM in numerous domains, including voice processing, machine translation, and time-series forecasting, while retaining the robust feature extraction capabilities of conventional convolutional neural networks (Zhu et al. 2020; Shomron and Weiser 2019).
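TCNs are commonly built from causal, dilated one-dimensional convolutions, so that each output depends only on current and past inputs while the receptive field grows quickly with depth. The following pure-Python sketch illustrates that building block only; it is not the architecture or the configuration used in this study, and the function names are illustrative.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation].
    Inputs before the start of the series are treated as zero, so y[t]
    never depends on values after time t."""
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        y.append(total)
    return y


def receptive_field(kernel_size, dilations):
    """Number of past steps seen by a stack with one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


series = [1, 2, 3, 4]
print(causal_dilated_conv(series, weights=[1, 1], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

Because every time step can be computed independently of the others, the convolution parallelizes in a way that a recurrent update such as an LSTM does not.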

In such a scenario, many benchmark models of the two kinds mentioned above are used in this study; they include seasonal autoregressive integrated moving average (SARIMA), exponential smoothing (ES), SVR, extreme learning machine (ELM), back propagation neural network (BPNN), LSTM, and TCN. TCN is the main model of interest owing to its benefits of higher parallelism, stable gradients, and minimal memory requirements.
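For orientation, the simplest of these benchmarks can be written in a few lines. The sketch below shows simple exponential smoothing with a fixed smoothing factor; the initialization with the first observation and the value of `alpha` are assumptions, not the settings used in this study.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing:
    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, with level_0 = x_0.
    The last level serves as the one-step-ahead forecast."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
        smoothed.append(level)
    return smoothed


levels = exponential_smoothing([10.0, 20.0, 20.0], alpha=0.5)
print(levels)       # smoothed levels
print(levels[-1])   # one-step-ahead forecast
```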

Social media and stock price forecasting

As Web 2.0 takes off, more and more investors are turning to the Internet to obtain and share real-time stock-related news (Sanford 2022). Owing to the rapid diffusion of influence through the Internet, experts’ and other influential persons’ written views on stocks may affect the decisions of others. The effects are dual (Maqsood et al. 2020; Gu and Kurov 2020). On the one hand, Internet user comments and event information can have a substantial effect on the price of a stock. On the other hand, sudden fluctuations in stock price may prompt the development and transmission of relevant information (e.g., government viewpoints), which may then impact public perceptions of prospective investment strategies (Shomron and Weiser 2019; Jin et al. 2020). Textual material (e.g., blogs, reviews, and status updates), online search queries (e.g., Google Trends), tags, and personal information are common forms of social media data. Social media data include individual views, ideas, and actions that affect stock market predictability and result in significant profits or losses (Bijl et al. 2016).

Textual data, particularly news, is a superior source of hidden information to quantitative data, because the former enables the forecasting of financial patterns with supporting evidence (Liang et al. 2020; Gu and Peng 2019). For instance, a news story about a corporation containing the terms “resignation” or “risk of default” leads the investor to anticipate a decrease in the stock price. In addition, stock market trends may be influenced by news pertaining to a variety of unforeseen events, such as terrorism, war, civil unrest, economic and political shocks, and natural disasters (Nassirtoussi et al. 2015). Similarly, Chen et al. (2014) demonstrated how information from user-generated research papers on SeekingAlpha may be utilized to anticipate earnings and stock returns. However, it is difficult for retail investors to comprehend the context of research papers completely.
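The keyword example above can be turned into a toy lexicon-based score. This is only a sketch of the general idea: the negative terms are the two mentioned in the text, the positive terms are hypothetical additions, and the sentiment indexes in this study are not necessarily built this way.

```python
NEGATIVE_TERMS = ("resignation", "risk of default")  # terms from the example above
POSITIVE_TERMS = ("record profit", "upgrade")         # hypothetical additions


def lexicon_score(text):
    """Count positive minus negative term occurrences in one text."""
    lowered = text.lower()
    pos = sum(lowered.count(term) for term in POSITIVE_TERMS)
    neg = sum(lowered.count(term) for term in NEGATIVE_TERMS)
    return pos - neg


def daily_index(texts):
    """Average score over all texts published on one day (0.0 if none)."""
    if not texts:
        return 0.0
    return sum(lexicon_score(t) for t in texts) / len(texts)


headlines = [
    "CEO resignation raises risk of default",
    "Analysts upgrade the stock after record profit",
]
print([lexicon_score(h) for h in headlines])
print(daily_index(headlines))
```

A fixed word list ignores negation and context, which is one reason why learned language models are usually preferred for financial text.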

Numerous social and economic effects may be forecast by Web search queries, which has attracted great interest. For example, Bijl et al. (2016) investigated the prospect of forecasting stock returns using Google Trends data and discovered that high Google search volumes are associated with negative returns. Kim et al. (2019) demonstrated that an increase in Google searches is predictive of a rise in the volatility and trading volume of the top companies listed on the Oslo Stock Exchange. Considering the national conditions of China, the Baidu index is an invaluable source for monitoring and forecasting Chinese socioeconomic activities.

According to behavioral finance theory, social network information may affect people’s financial decisions to some extent. To help investors understand the connection between social networks and stock prices, Liu et al. (2021) constructed daily social networks utilizing information obtained from EastyMoney, the largest social media site in China, about individuals and the stocks they followed. The empirical data indicate that the social network variable can greatly improve forecasting accuracy. Zhang et al. (2018) investigated characteristics relating to collective mood and perception of stock relatedness based on messages from Xueqiu, a well-known Chinese social network similar to Twitter that caters to investors, and used nonlinear models to anticipate stock price changes. However, both EastyMoney and Xueqiu are specialized sites that cater to niche audiences, and thus ignore the hotspots and opinions from public media.

Table 1 briefly summarizes previous literature, highlighting the limitations of previous studies. First, sentiment analysis of textual data has rarely been performed jointly across multiple platforms, perspectives, and participants. Second, although TCN has been applied to stock forecasting, existing work has not combined interval estimation with point forecasting. Third, the double insurance trading strategy of joint point and interval forecasting has not been studied. Therefore, this study proposes a comprehensive and integrated forecasting framework and trading strategy to fill these research gaps.

Table 1 A brief list of selected studies
