A Hierarchical conv-LSTM and LLM Integrated Model for Holistic Stock Forecasting

Authors: Arya Chakraborty, Auhona Basu

DOI Link: https://doi.org/10.22214/ijraset.2025.68240

Abstract

The financial domain presents a complex environment for stock market prediction, characterized by volatile patterns and the influence of multifaceted data sources. Traditional models have leveraged either Convolutional Neural Networks (CNN) for spatial feature extraction or Long Short-Term Memory (LSTM) networks for capturing temporal dependencies, with limited integration of external textual data. This paper proposes a novel Two-Level Conv-LSTM Neural Network integrated with a Large Language Model (LLM) for comprehensive stock advising. The model harnesses the strengths of Conv-LSTM for analyzing time-series data and LLM for processing and understanding textual information from financial news, social media, and reports. In the first level, convolutional layers are employed to identify local patterns in historical stock prices and technical indicators, followed by LSTM layers to capture the temporal dynamics. The second level integrates the output with an LLM that analyzes sentiment and contextual information from textual data, providing a holistic view of market conditions. The combined approach aims to improve prediction accuracy and provide contextually rich stock advising.

Introduction

This paper discusses how integrating spatial and temporal data enhances stock market analysis and forecasting. Spatial data relates to geographical factors that affect markets, such as regional economic conditions, political risks, and regulations. Temporal data tracks changes in stock prices and volumes over time, enabling trend and volatility analysis through time-series methods.

Combining spatial and temporal data—spatiotemporal analysis—provides deeper insights into how regional events and time trends influence stock behavior. Advanced neural networks like Conv-LSTM are effective for this, as they combine convolutional layers (for spatial features) with LSTM layers (for temporal dependencies), improving stock price prediction.
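As a rough illustration of what "convolutional layers for local patterns" means on a price series, the sketch below slides a small kernel over a window of closing prices. It is a minimal pure-Python example, not the authors' implementation: the function name `conv1d`, the sample prices, and the hand-picked kernel are all assumptions made for illustration, and in a trained network the kernel weights would be learned rather than fixed.

```python
def conv1d(series, kernel):
    """Valid-mode 1D cross-correlation: slide the kernel over the series."""
    k = len(kernel)
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]

# Hypothetical closing prices (illustrative values, not from the paper's dataset).
closes = [100.0, 102.0, 101.0, 105.0, 107.0]

# Hand-picked kernel that responds to the two-day price change.
momentum_kernel = [-1.0, 0.0, 1.0]

features = conv1d(closes, momentum_kernel)
print(features)  # [1.0, 3.0, 6.0]
```

Each output value summarizes a short local window of the input, and a sequence of such features is what the recurrent layers would then consume step by step.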

Additionally, incorporating Large Language Models (LLMs), such as BERT, allows analysis of unstructured textual data from news and social media to extract sentiment and contextual information. This spatial-textual input complements numerical data to refine forecasts, especially during unexpected market shifts.

The proposed hierarchical model first uses Conv-LSTM to predict stock trends from historical data. Simultaneously, LLMs perform sentiment analysis on related news articles, generating weighted sentiment scores based on source credibility. These are combined with Conv-LSTM predictions to fine-tune the forecast via a second LLM layer, producing more accurate, real-world-applicable stock price predictions.
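The credibility-weighted sentiment score described above can be sketched as a weighted mean. This is one plausible reading, under assumptions: the paper summary does not give the exact weighting formula, so the function name `weighted_sentiment`, the sentiment range of -1 to 1, and the credibility weights below are hypothetical. The second LLM layer that combines this score with the Conv-LSTM prediction is not modelled here.

```python
def weighted_sentiment(articles):
    """Credibility-weighted mean of per-article sentiment scores.

    Each article is a (sentiment, credibility) pair, where sentiment is
    assumed to lie in [-1, 1] and credibility is a non-negative weight.
    Returns 0.0 (neutral) when no article carries any weight.
    """
    total_weight = sum(cred for _, cred in articles)
    if total_weight == 0:
        return 0.0
    return sum(sent * cred for sent, cred in articles) / total_weight

# Hypothetical articles: (sentiment, source credibility).
articles = [
    (0.5, 0.5),    # mildly positive piece from a highly credible outlet
    (-1.0, 0.25),  # strongly negative post from a less credible source
    (1.0, 0.25),   # strongly positive post from a less credible source
]

score = weighted_sentiment(articles)
print(score)  # 0.25
```

Weighting by credibility means a single low-credibility outlier moves the aggregate score less than an equally extreme article from a trusted source would.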

The paper also explains the internal workings of LSTM networks, detailing the gating mechanisms (forget, input, and output gates) that manage long-term dependencies in sequential data. It outlines the LLM training workflow, including data collection, tokenization, pre-training, fine-tuning, and inference.
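To make the gating mechanism concrete, here is a minimal sketch of one time step of a standard LSTM cell, reduced to scalar inputs and states so it runs with the standard library alone. It follows the textbook LSTM formulation rather than any code from the paper; the function names and the parameter layout (a dictionary mapping each gate to an input weight, a recurrent weight, and a bias) are assumptions chosen for readability.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each gate name to a (w_x, w_h, b) tuple: input weight,
    recurrent weight, and bias. Returns the new hidden and cell states.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new candidate to admit
    g = math.tanh(pre_activation("candidate"))  # candidate cell content
    o = sigmoid(pre_activation("output"))       # how much cell state to expose

    c = f * c_prev + i * g   # updated cell state (long-term memory)
    h = o * math.tanh(c)     # updated hidden state (step output)
    return h, c

# Hypothetical parameters; a trained network would learn these values.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.4, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.6, 0.1, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:  # e.g. normalized daily returns
    h, c = lstm_step(x, h, c, params)
```

The forget gate scales the previous cell state while the input gate scales the new candidate, which is how the cell can carry information across many steps without overwriting it at each one.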

The paper highlights the limitations of traditional LSTM-only forecasting, which ignores spatial factors, and emphasizes the benefits of integrating spatial (news, geography) and temporal data using Conv-LSTM and LLMs for robust stock market prediction.

Conclusion

The dataset utilized in this research is a custom dataset that consists of historical stock data over the past four years, combined with related news articles from the same period. The stock data includes daily metrics such as closing prices, trading volumes, opening prices, and adjusted closing prices, capturing the stock's performance across various market conditions. In parallel, news articles were gathered using the NEWS API, which aggregates content from over 150,000 sources, including major media outlets and niche financial publications. These articles focus on events and developments relevant to the stock, such as financial earnings, product launches, and broader economic trends. This comprehensive dataset allowed us to analyse both quantitative financial data and qualitative news sentiment to assess their combined impact on stock behaviour.

The performance of the machine learning models was evaluated using several key error metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results demonstrate that the Hybrid Model (combining Convolutional LSTM and LLM) significantly outperformed the standalone Convolutional LSTM model across all metrics.

The improvement in performance suggests a direct relationship between the stock's performance and the news data provided to the model. This implies that historical data alone can already yield accurate predictions, since the LSTM layers capture the temporal trends, but that accuracy can be enhanced further by incorporating spatial data analysis related to the stock, since doing so helps the model establish relationships among the spatial features obtained during training.
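For reference, the four error metrics named above can be computed as in the following minimal sketch. The formulas are the standard definitions; the function names and the sample values are illustrative assumptions and do not reproduce the paper's results.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error, in the same units as the prices."""
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean Absolute Percentage Error; undefined if any actual value is 0."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Hypothetical closing prices and forecasts (not from the paper).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]

print(mae(actual, predicted))   # 15.0
print(mse(actual, predicted))   # 250.0
print(rmse(actual, predicted))
print(mape(actual, predicted))
```

MAE and RMSE are expressed in price units, with RMSE penalizing large misses more heavily, while MAPE is scale-free and so allows comparison across stocks trading at different price levels.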
In addition to stock price prediction, the hybrid approach of combining quantitative time-series data with qualitative contextual data, such as news sentiment, has broad potential applications in other fields. For instance, in the healthcare industry, predictive models could integrate historical patient data with medical literature or news articles on emerging treatments to forecast patient outcomes or disease trends more accurately. Similarly, in supply chain management, models could use historical inventory data alongside news reports on global logistics, economic policies, or environmental conditions to predict potential disruptions or optimize stock levels. The fusion of temporal and contextual information, as demonstrated in this research, opens new possibilities for making more informed and accurate predictions across a wide range of domains, where external factors play a critical role in determining outcomes. This approach not only enhances prediction accuracy but also provides more comprehensive insights for decision-makers in various industries.

Copyright

Copyright © 2025 Arya Chakraborty, Auhona Basu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET68240

Publish Date : 2025-04-03

ISSN : 2321-9653

Publisher Name : IJRASET
