Personal finance management has long remained reactive and manual, with little forward-looking capability. Existing commercial platforms such as Mint, YNAB, and Money Manager function as categorised ledgers, recording past transactions without predictive or diagnostic intelligence. This paper presents FinFusion, a full-stack AI-powered personal finance tracker that addresses three dimensions of the problem: prediction, diagnosis, and interaction. The prediction dimension is served by a two-layer stacked Long Short-Term Memory (LSTM) neural network trained on 11 engineered temporal features to forecast daily expenditure over a 30-day horizon. The diagnostic dimension is served by a global z-score anomaly detection pipeline that flags statistically extreme transactions and generates template-based natural language suggestions. The interaction dimension is served by two low-friction data capture modalities: an OCR-based receipt scanner built on pytesseract and OpenCV, and a browser-native voice command interface using the W3C Web Speech API. Additionally, FinFusion incorporates a group expense management module that supports shared expense splitting, net balance ledger maintenance, and simplified settlement computation. The system is trained on a real dataset of 12,030 cleaned expense transactions spanning 2021–2024 and deployed as a FastAPI backend paired with a React single-page application. Empirical evaluation demonstrates robust forecasting across varied spending histories, 422 anomaly candidates detected at a conservative 2.5σ threshold, and OCR parsing achieving 81% confidence on real receipts.
Introduction
This study presents FinFusion, an AI-powered personal finance management platform designed to overcome the limitations of traditional expense-tracking applications. While digital payments generate large amounts of financial data, most existing finance tools only record and visualize past transactions without offering intelligent insights. FinFusion addresses this gap by integrating expenditure forecasting, anomaly detection, OCR-based receipt scanning, voice interaction, and group expense management into a single platform.
The system is motivated by three major challenges: predicting future spending patterns, identifying unusual transactions, and reducing the burden of manual data entry. To solve these problems, FinFusion employs a stacked Long Short-Term Memory (LSTM) neural network for 30-day expenditure forecasting, a z-score-based anomaly detection model for identifying abnormal spending behavior, an OCR pipeline using OpenCV and Tesseract for extracting transaction details from receipts, and a voice interface based on the W3C Web Speech API for hands-free interaction.
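To make the z-score screening concrete, the following minimal sketch flags amounts that lie more than 2.5 standard deviations from the global mean. The function name `flag_anomalies`, the choice of the population standard deviation, and the sample amounts are assumptions made for illustration; they are not taken from the FinFusion codebase, whose pipeline may differ.

```python
from statistics import mean, pstdev


def flag_anomalies(amounts, threshold=2.5):
    """Return (index, amount, z) for every amount with |z| above the threshold.

    Hypothetical helper: a single global mean and population standard
    deviation are computed over all amounts.
    """
    if len(amounts) < 2:
        return []
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, so nothing can be extreme
    flagged = []
    for i, amount in enumerate(amounts):
        z = (amount - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, amount, z))
    return flagged


# Illustrative data (not from the FinFusion dataset):
# nine routine purchases followed by one unusually large one.
sample = [120, 95, 110, 130, 105, 90, 115, 100, 125, 2500]
anomalies = flag_anomalies(sample)
```

Because an extreme value inflates the standard deviation it is measured against, a global z-score of this kind tends to be conservative on short histories.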
A review of existing literature highlights the effectiveness of LSTM networks in financial forecasting, the usefulness of statistical methods for anomaly detection, and the growing adoption of OCR and speech technologies in user-centric applications. Compared with conventional finance tools such as Mint, YNAB, Walnut, and Money Manager, FinFusion uniquely combines forecasting, anomaly detection, OCR input, voice commands, and collaborative expense management.
The platform follows a modular full-stack architecture consisting of a React-based frontend, a FastAPI backend, machine learning modules, and a SQLite database accessed through SQLAlchemy. Transactions can be entered manually, through receipt scanning, or via voice commands. The analytics layer transforms daily spending records into an 11-feature dataset containing rolling averages, volatility measures, cyclical calendar features, and spending indicators. A stacked LSTM model uses these engineered features to generate future expenditure forecasts and confidence scores. In parallel, anomaly detection algorithms identify unusual spending patterns and generate explainable insights.
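The feature families named above can be sketched as follows. This example does not reproduce the actual 11-feature set, which is not enumerated here; the function `build_features`, the 7-day window, the day-of-week encoding, and the zero-spend indicator are illustrative assumptions showing one feature of each kind.

```python
import math
from datetime import date, timedelta
from statistics import mean, pstdev


def build_features(daily_totals, start, window=7):
    """Build one feature row per day from a list of daily spending totals.

    Hypothetical feature set: trailing mean, trailing volatility,
    sine/cosine day-of-week encoding, and a zero-spend indicator.
    """
    rows = []
    for i, amount in enumerate(daily_totals):
        day = start + timedelta(days=i)
        # Trailing window ending at the current day (shorter near the start).
        recent = daily_totals[max(0, i - window + 1): i + 1]
        # Map the weekday (0 = Monday) onto a circle.
        angle = 2 * math.pi * day.weekday() / 7
        rows.append({
            "date": day,
            "amount": amount,
            "rolling_mean": mean(recent),
            "rolling_std": pstdev(recent),
            "dow_sin": math.sin(angle),
            "dow_cos": math.cos(angle),
            "no_spend": 1 if amount == 0 else 0,
        })
    return rows


# Illustrative totals for eight consecutive days (not from the FinFusion dataset).
totals = [200.0, 0.0, 150.0, 300.0, 50.0, 0.0, 400.0, 100.0]
features = build_features(totals, start=date(2024, 1, 1))
```

The sine/cosine pair keeps Sunday and Monday adjacent in feature space, which a plain integer weekday would not.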
Additional modules include adaptive budgeting, OCR-based receipt processing, JWT-secured authentication, interactive dashboards, and a Splitwise-style group expense management system. Group expenses are maintained separately to prevent distortion of personal financial analytics. The platform also integrates cloud storage and notification services for receipt management and alerts.
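The net-balance and settlement steps of the group expense module might look like the sketch below. It assumes equal splits, amounts in integer minor units, and a greedy rule that matches the largest debtor with the largest creditor; the function names and the sample group are hypothetical, and the settlement algorithm actually used by FinFusion is not specified here.

```python
def net_balances(expenses):
    """Compute each member's net balance in integer minor units (e.g. cents).

    Each expense is (payer, amount, participants) and is split equally;
    any remainder from integer division goes to the first participants.
    Positive balance: the member is owed money. Negative: the member owes.
    """
    balances = {}
    for payer, amount, participants in expenses:
        share, remainder = divmod(amount, len(participants))
        balances[payer] = balances.get(payer, 0) + amount
        for k, person in enumerate(participants):
            owed = share + (1 if k < remainder else 0)
            balances[person] = balances.get(person, 0) - owed
    return balances


def simplify_settlements(balances):
    """Greedily pair the largest debtor with the largest creditor."""
    debtors = {p: -b for p, b in balances.items() if b < 0}
    creditors = {p: b for p, b in balances.items() if b > 0}
    transfers = []
    while debtors and creditors:
        debtor = max(debtors, key=debtors.get)
        creditor = max(creditors, key=creditors.get)
        paid = min(debtors[debtor], creditors[creditor])
        transfers.append((debtor, creditor, paid))
        debtors[debtor] -= paid
        creditors[creditor] -= paid
        # Drop anyone who is fully settled.
        if debtors[debtor] == 0:
            del debtors[debtor]
        if creditors[creditor] == 0:
            del creditors[creditor]
    return transfers


# Illustrative group (hypothetical members and amounts, in cents).
group = ["asha", "ben", "chen"]
expenses = [("asha", 9000, group), ("ben", 3000, group)]
balances = net_balances(expenses)
transfers = simplify_settlements(balances)
```

This greedy rule settles a group of n members in at most n - 1 transfers, although it does not guarantee the minimum possible number in every case.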
Conclusion
This paper presented FinFusion, a full-stack AI-powered personal finance management platform that extends beyond the reactive ledger-based approach of traditional finance applications. By integrating LSTM-based expenditure forecasting, adaptive budgeting, anomaly detection, OCR-based receipt parsing, browser-native voice interaction, explainable AI insights, and Splitwise-style collaborative expense management, the system addresses the prediction, diagnostic, interaction, and collaborative dimensions of modern personal finance management within a unified architecture.
The platform was evaluated on a real multi-year dataset containing 12,030 cleaned expense transactions spanning four calendar years. Experimental results demonstrated stable 30-day expenditure forecasting with confidence-based prediction analysis, effective anomaly detection using z-score and Isolation Forest techniques, and reliable OCR-based transaction extraction with an observed confidence score of 81% on real retail receipts. The adaptive budgeting engine successfully generated proactive spending recommendations and category-level budget pressure analysis, while the voice interaction layer enabled lightweight browser-native speech control without external NLP dependencies.
The Splitwise-style collaborative finance module further extended the system by supporting equal, percentage-based, and custom expense splitting with dynamic settlement tracking and repayment optimisation. Importantly, collaborative transactions remained isolated from the personal forecasting and anomaly detection pipelines, preserving the accuracy of individual financial modelling.
Overall, FinFusion demonstrates that intelligent personal finance management can be achieved through the integration of machine learning, explainable analytics, multimodal interaction, and collaborative financial tracking within a scalable full-stack architecture. The proposed system transforms traditional expense tracking into a proactive, interpretable, and user-centric financial assistance platform suitable for real-world deployment.