Authors: Rohith Urs G, Nithin D, Akul G Devali, Rakshit Vastrad
Accounting for price fluctuations and understanding people's emotions can help improve stock price forecasting. Only a few models can decipher financial jargon, and few labelled datasets of stock price changes are available. In this project, we used text mining techniques to extract high-quality data from news and tweets published by legitimate businesses on the internet, allowing us to analyse it, make decisions, and update our database for future use. In this paper, we propose an information gathering and processing framework that combines a natural language processing tool with our algorithms, and we use natural language processing and machine learning techniques to make predictions. The results demonstrate the algorithm's ability to foresee favorable outcomes.
The stock market has grown to be one of the most important components of the economy, not only in developed countries but also in developing and third-world countries. Almost all large-scale and mid-level companies have their stocks listed on the stock market. In addition to traditional factors such as infrastructure, number of employees, and expansion over the years, the health of a company's stock and the profit it has returned to investors have become key factors in assessing the overall growth and development of any country. These trends are followed closely and have an increasingly large impact on the lives of ordinary people. The economic depression that began in 1929 and the real estate market crash of 2008, which cost millions of people their jobs and livelihoods, are stark proof of this. Hence there is a pressing need to predict stock movements, which will enable us to maximise our benefits, understand individual risk appetite, and make safe decisions while investing in the stock market.
The numerous factors that influence each decision make it difficult to make a stock market decision. As a result, making the best stock market move requires extensive analysis, which may include the price trend, the nature of the market, company stability, various stock news, and so on. While earlier studies and methods were adequate for analysing financial charts and patterns, a new trend has emerged in this decade. The movement of stock prices is increasingly determined not only by traditional factors but also by the talk and rumours about these stocks. This can be attributed to the decentralisation of the media and its division into orthodox sources such as newspapers and television channels on one hand and various other organisations on the other, many of which exist only online. Many people get their news about the world around them, which invariably includes stocks, from social media posts and trends on popular apps such as Twitter and Instagram.
The goal of this research is to extract fundamental data from relevant news sources, analyse it and, in some cases, forecast the stock market from the standpoint of the average investor. To give the user a more complete perspective of the market, we have developed a system that covers not only the financial charts but also the chatter and sentiments of people regarding a particular stock. Based on a review of existing business text mining research, we developed a framework that combines our text parser and analyser algorithm with an open-source natural language processing tool. The framework converts texts to numeric data with the help of labelling, so that stock market investment decisions can be analysed, retrieved, and forecast from any text data source using machine learning and text mining.
A. Naïve Bayes
It is a supervised machine learning algorithm based on Bayes' theorem that assumes every pair of features is independent given the class. This assumption, that none of the variables in the dataset are correlated with one another, is what makes the method "naïve". It is simple to use, and it works well for large datasets and datasets with categorical data. Furthermore, compared to other traditional models, the training time for this model is extremely short.
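The independence assumption can be illustrated with a minimal sketch of a multinomial Naïve Bayes text classifier. This is not the implementation used in this work: the function names, the add-one smoothing, the whitespace tokenisation, and the toy headlines are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior of the class
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen during training
            # Independence assumption: per-word log likelihoods simply add up.
            count = word_counts[label][token]
            # Add-one (Laplace) smoothing avoids log(0) for unseen class/word pairs.
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy headlines invented for illustration only, not data from this study.
docs = [
    "profits surge after strong earnings",
    "shares rally on record revenue",
    "stock plunges after weak earnings",
    "shares drop on heavy losses",
]
labels = ["up", "up", "down", "down"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "record profits and strong revenue"))
```

Training only requires counting words per class, which is consistent with the short training time noted above.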
B. Random Forest Classifier
It is an ensemble learning method based on multiple individual decision trees that are created at training time. It takes the prediction of every individual tree into account when determining the outcome, which generally gives better performance than the prediction of any single tree. For classification tasks, the Random Forest Classifier returns the class selected by most of the trees. For regression tasks, the mean of the individual trees' predictions is returned as the result.
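The two ways of combining tree outputs can be sketched as follows. Only the aggregation step is shown, and the trees themselves are not built; the function names and the per-tree predictions are hypothetical.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_outputs):
    """Return the average of the numeric predictions of all trees."""
    return mean(tree_outputs)

# Hypothetical predictions from individual trees for a single sample.
print(forest_classify(["up", "down", "up", "up", "down"]))
print(forest_regress([101.0, 99.0, 103.0]))
```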
C. Gradient Boosting Regressor
Boosting is another ensemble technique, in which predictors are created sequentially rather than independently. Each predictor learns from the mistakes of the previous predictor. As a result, accurate predictions can often be obtained in less time and with fewer iterations.
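The sequential idea can be sketched with one-split decision stumps fitted to the residuals of the current ensemble. This is a simplified illustration for squared error on a single feature, not the model used in this work; `fit_stump`, `gradient_boost`, the learning rate, and the toy data are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on x that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Return a prediction function built from sequentially fitted stumps."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Each new stump is trained on the mistakes of the ensemble so far.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    """Mean squared error of a prediction function on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data invented for illustration: a step from 10 to 20.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 20, 20, 20]
one_round = gradient_boost(xs, ys, n_rounds=1)
many_rounds = gradient_boost(xs, ys, n_rounds=20)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

On this toy data the training error shrinks as rounds are added, because every stump corrects part of what the previous ones left unexplained.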
D. Logistic Regression
Logistic regression is a statistical model used when the dependent (target) variable is categorical. First, a linear combination of the input features is fitted to the data, as in linear regression. A logistic (sigmoid) function then maps this value to the probability of a class, and a threshold converts the probability into a binary prediction. On low-dimensional datasets, the model is less prone to overfitting.
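A minimal sketch of these steps for a single feature is given below, assuming gradient descent on the log loss and a decision threshold of 0.5. The function names, hyperparameters, and toy sentiment scores are illustrative assumptions, not values from this study. Note that the logistic function and the sigmoid function are the same function.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight and bias for one feature by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return (probability of class 1, binary prediction)."""
    p = sigmoid(w * x + b)  # linear score passed through the sigmoid
    return p, int(p >= threshold)

# Hypothetical sentiment scores (x) and whether the price rose (y).
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(predict(w, b, 1.8))
print(predict(w, b, -1.8))
```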
E. XGBoost
eXtreme Gradient Boosting (XGBoost) is a supervised machine learning algorithm. It is built on decision trees and employs a gradient boosting framework. It is particularly effective with tabular or structured data. Because of its system optimizations and algorithmic enhancements, XGBoost provides good performance.
III. LITERATURE REVIEW
Finally, rather than sentiments, the change in stock prices was used as a labelling technique to categorize the messages. Only a few experiments on this labelling technique have been carried out so far. Among the labelling techniques tested, the percentage change technique with two labels produced the best results in all models. The FinALBERT model is sensitive to hyperparameter settings and dataset size. Despite being pre-trained on a much smaller dataset, it performed well compared to the other models. For FinALBERT models to perform well, they must be pre-trained on a large dataset for many training steps, which was not possible due to hardware constraints. The training time for the transformer-based model was also far longer than for the traditional models.
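As a rough illustration of labelling by price change, the sketch below assigns one of two labels to a message based on the percentage change between two closing prices. The exact rule used in the experiments is not stated here, so the function names, the label names, the choice of closing prices, and the zero threshold are all assumptions.

```python
def percent_change(previous_close, current_close):
    """Percentage change from the previous close to the current close."""
    return (current_close - previous_close) / previous_close * 100.0

def label_by_price_change(previous_close, current_close, threshold=0.0):
    """Two-label scheme (assumed): 'up' if the change exceeds the threshold."""
    change = percent_change(previous_close, current_close)
    return "up" if change > threshold else "down"

# Hypothetical messages paired with invented closing prices.
messages = [
    ("Strong quarter, buying more", 100.0, 102.0),
    ("Guidance cut, selling", 100.0, 97.0),
]
for text, prev_close, curr_close in messages:
    print(text, "->", label_by_price_change(prev_close, curr_close))
```

Labelling this way needs no manual sentiment annotation, since the label comes directly from the price series.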
Copyright © 2022 Rohith Urs G, Nithin D, Akul G Devali, Rakshit Vastrad. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.