Financial time series forecasting remains a formidable challenge due to inherent non-stationarity, volatility, and complex multi-scale temporal dependencies. This paper presents a deep hybrid neural network architecture that integrates stacked Long Short-Term Memory (LSTM) networks with high-capacity multi-head self-attention for multi-scale financial prediction. Unlike conventional approaches that rely on hand-engineered technical indicators or explicit sinusoidal positional encoding, our proposed model processes raw Open-High-Low-Close-Volume (OHLCV) data augmented with Unix timestamps across four temporal resolutions (15-minute, 30-minute, 60-minute, and daily intervals). The architecture employs 256 attention heads with extensive cross-scale attention mechanisms to model interactions between temporal granularities, followed by deep dense projection layers for final price prediction. Experimental evaluation on HDFC Bank stock data from 2008 to 2025 demonstrates strong performance with validation Mean Absolute Error (MAE) of approximately 6.71 and Mean Absolute Percentage Error (MAPE) of 0.69%. The proposed 116-million-parameter model achieves robust generalization without requiring explicit positional encoding or traditional technical indicators, offering a streamlined end-to-end approach to financial forecasting. Our implementation utilizes PySpark for scalable data preprocessing and TensorFlow for model training, with Huber loss optimization for enhanced robustness against market outliers.
Introduction
This research presents a deep multi-scale LSTM–Transformer model for stock price forecasting using only raw market data (Open, High, Low, Close, Volume, and Unix timestamps), eliminating the need for traditional technical indicators and explicit positional encoding. Financial forecasting is challenging due to market volatility and non-linear behavior, but deep learning models such as LSTMs and Transformers have shown strong potential in capturing temporal dependencies and long-range relationships.
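As a minimal sketch of what the raw inputs look like, the snippet below represents each bar as Open, High, Low, Close, Volume plus a Unix timestamp, and aggregates fine-grained bars into coarser ones. The `Bar` type, the `resample` and `to_features` helpers, and the sample values are hypothetical illustrations; the text does not say whether the coarser resolutions are derived this way or loaded from separate feeds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bar:
    ts: int        # Unix timestamp (seconds) of the bar's start
    open: float
    high: float
    low: float
    close: float
    volume: float


def resample(bars: List[Bar], period_s: int) -> List[Bar]:
    """Aggregate time-ordered bars into buckets of `period_s` seconds.

    Buckets are aligned to the Unix epoch (UTC); alignment to exchange
    session boundaries is ignored in this sketch.
    """
    out: List[Bar] = []
    for b in bars:
        bucket = b.ts - b.ts % period_s  # start of the coarser bucket
        if out and out[-1].ts == bucket:
            prev = out[-1]
            out[-1] = Bar(
                ts=bucket,
                open=prev.open,                # first open in the bucket
                high=max(prev.high, b.high),
                low=min(prev.low, b.low),
                close=b.close,                 # last close in the bucket
                volume=prev.volume + b.volume,
            )
        else:
            out.append(Bar(bucket, b.open, b.high, b.low, b.close, b.volume))
    return out


def to_features(bar: Bar) -> List[float]:
    """One model input row: raw OHLCV plus the Unix timestamp, no indicators."""
    return [bar.open, bar.high, bar.low, bar.close, bar.volume, float(bar.ts)]


# Four made-up 15-minute bars starting on an hour boundary.
t0 = 1_700_000_000 - 1_700_000_000 % 3600
bars_15m = [
    Bar(t0 + 0 * 900, 100.0, 101.0, 99.5, 100.5, 1000.0),
    Bar(t0 + 1 * 900, 100.5, 102.0, 100.0, 101.5, 1200.0),
    Bar(t0 + 2 * 900, 101.5, 101.8, 100.2, 100.8, 900.0),
    Bar(t0 + 3 * 900, 100.8, 101.2, 100.4, 101.0, 1100.0),
]

bars_30m = resample(bars_15m, 30 * 60)
bars_60m = resample(bars_15m, 60 * 60)
row = to_features(bars_60m[0])
```

Each resolution would then be fed to the model as a sequence of such rows, with the timestamp column carrying the temporal information that a positional encoding would otherwise supply.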
The proposed architecture combines stacked LSTM layers for implicit temporal encoding with 256-head self-attention Transformers for modeling complex dependencies across multiple time scales (15-minute, 30-minute, 60-minute, and daily data). Cross-scale attention mechanisms allow information sharing between different temporal resolutions, improving forecasting accuracy. The model contains approximately 115.9 million parameters and is trained using the Adam optimizer with AMSGrad, Huber loss, dropout regularization, and gradient clipping.
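To make the cross-scale idea concrete, here is a minimal single-head sketch of scaled dot-product attention in which the queries come from one resolution and the keys and values from another. It is plain Python with no learned projections, and the function names and toy vectors are illustrative assumptions; the 256-head TensorFlow layers described above are not reproduced here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    `queries` come from one time scale (e.g. 15-minute hidden states);
    `keys` and `values` come from another (e.g. daily hidden states).
    """
    d = len(keys[0])
    out: Matrix = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors from the other scale.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Toy hidden states: 3 fine-scale steps attend over 2 coarse-scale steps.
fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coarse = [[1.0, 0.0], [0.0, 1.0]]
coarse_values = [[10.0, 0.0], [0.0, 10.0]]

mixed = cross_attention(fine, coarse, coarse_values)
```

Each output row is a mixture of coarse-scale values, weighted by how strongly the fine-scale step matches each coarse-scale step. In the standard multi-head formulation, every head applies this operation to its own learned projection of the inputs and the head outputs are concatenated.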
Experiments were conducted on 17 years of HDFC Bank stock data (2008–2025). The proposed model achieved a validation MAPE of 0.69%, outperforming traditional approaches such as ARIMA and standard LSTM-based models. Ablation studies confirmed that cross-scale attention and multi-timeframe inputs significantly improve performance, while technical indicators and sinusoidal positional encodings provide only marginal benefits despite increasing complexity.
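For reference, the validation metrics quoted above and the Huber loss used in training follow their standard definitions, sketched below in plain Python. The threshold `delta=1.0` and the sample prices are assumptions made for illustration; they are not data or settings from the experiments.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error, in the same units as the price."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true: Sequence[float], y_pred: Sequence[float], delta: float = 1.0) -> float:
    """Huber loss: quadratic for small errors, linear beyond `delta`."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error, shown only for comparison with Huber."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices; the last prediction is a deliberate outlier.
actual = [1000.0, 1010.0, 990.0, 1005.0]
predicted = [1004.0, 1008.0, 990.5, 1025.0]

mae_value = mae(actual, predicted)
mape_value = mape(actual, predicted)
huber_value = huber(actual, predicted)
mse_value = mse(actual, predicted)
```

On this toy data the single 20-point miss dominates the squared error, while the Huber loss grows only linearly with it, which is the robustness property that motivates its use on market data with outliers.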
The study demonstrates that large-capacity deep learning architectures can automatically learn useful market representations directly from raw data, reducing reliance on hand-engineered features. Results suggest that implicit temporal encoding through LSTMs combined with Unix timestamps is sufficient for capturing temporal information, while cross-scale attention effectively models interactions across different market time horizons. Although the model delivers high prediction accuracy, its large size may require optimization techniques such as model compression or quantization for real-world deployment.
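A back-of-the-envelope sketch shows why deployment size is a concern and what quantization could save. The parameter count is taken from the text; the assumption of 4-byte float32 weights, the symmetric per-tensor int8 scheme, and the helper names are illustrative and not taken from the paper.

```python
from typing import List, Tuple

N_PARAMS = 115_900_000  # approximate parameter count stated in the text


def size_mb(n_params: int, bytes_per_param: int) -> float:
    """Weight storage in megabytes (10**6 bytes), ignoring file overhead."""
    return n_params * bytes_per_param / 1e6


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


fp32_mb = size_mb(N_PARAMS, 4)  # assumed float32 storage
int8_mb = size_mb(N_PARAMS, 1)  # one byte per weight after quantization

# Toy weight tensor to show the rounding error introduced by int8.
weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Under these assumptions the weights alone occupy roughly 464 MB in float32 and about a quarter of that in int8, with a per-weight rounding error bounded by half the quantization step. Whether prediction accuracy survives such compression would need to be measured on the actual model.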
Conclusion
This paper presented a deep multi-scale LSTM–Transformer architecture for financial time series forecasting that eliminates both explicit positional encoding and traditional technical indicators. Our key findings include:
1) A combination of LSTM recurrent processing and raw Unix timestamps provides sufficient temporal encoding, eliminating the need for explicit sinusoidal positional encoding schemes.
2) Deep architectures with 116M parameters can learn effective representations directly from raw OHLCV data without hand-engineered technical indicators.
3) Extensive cross-scale attention mechanisms with 256 heads substantially enhance prediction accuracy by modeling interactions between different temporal resolutions.
4) The proposed architecture achieves validation MAE of 6.71 and MAPE of 0.69% on HDFC Bank stock prediction, representing significant improvement over indicator-based and positionally-encoded baselines.
Future work will explore: (1) knowledge distillation for model compression, (2) integration of alternative data sources (sentiment, order book), (3) adversarial training for robustness against market regime changes, and (4) interpretability analysis of attention patterns across temporal scales.