The Automated Expense Classifier is a smart finance management application that leverages Natural Language Processing (NLP) and Machine Learning algorithms to automatically categorize expenses from bank statements in PDF/CSV formats. The project uses a TF-IDF vectorizer for text preprocessing and a Logistic Regression model (trained on labelled transaction data) for accurate expense classification into categories such as food, travel, shopping, and utilities. Data visualization techniques including matplotlib and Streamlit charts are used to generate pie charts and monthly trend analysis for better financial insights. To ensure financial discipline, the system integrates budget alert logic with real-time email notifications when spending exceeds predefined limits. The project further incorporates AI-powered forecasting using time-series trend analysis to predict next month’s top spending categories and a recommendation engine that provides personalized suggestions to optimize savings. An AI Chatbot Assistant, built using LangChain and OpenAI, is integrated for interactive financial guidance, while the frontend features a modern animated gradient background theme for enhanced user experience. Overall, this project combines data preprocessing, machine learning, NLP, visualization, forecasting, and chatbot technologies to deliver a robust, intelligent, and user-friendly personal expense management solution.
Introduction
The Automated Expense Classifier is an AI-powered system designed to simplify personal financial management in today’s digital era, where individuals rely heavily on online banking, credit cards, and mobile payments. Traditional expense-tracking methods like spreadsheets are slow, error-prone, and offer limited insights. To address these issues, the system uses Machine Learning (ML), Natural Language Processing (NLP), and predictive analytics to automatically extract, categorize, analyze, and forecast financial transactions.
Users upload bank statements in PDF or CSV format; the system extracts transaction details through OCR and parsing tools, cleans the data, and classifies each expense into a category such as Food, Travel, Utilities, Shopping, or Entertainment using a Logistic Regression model with TF-IDF vectorization. It also generates visual insights through pie charts, monthly trend graphs, and statistical summaries, helping users understand their spending patterns.
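The classification step described above can be illustrated with a minimal sketch, assuming scikit-learn is available. The training rows, the function name train_classifier, and the hyperparameters are hypothetical and chosen only for illustration; they are not the dataset or settings used in this work, and the OCR and parsing stages are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled transaction descriptions; a real model needs far more data.
TRAIN = [
    ("SWIGGY FOOD DELIVERY", "Food"),
    ("ZOMATO RESTAURANT ORDER", "Food"),
    ("DOMINOS PIZZA", "Food"),
    ("UBER TRIP", "Travel"),
    ("OLA CAB RIDE", "Travel"),
    ("IRCTC TRAIN TICKET", "Travel"),
    ("AMAZON MARKETPLACE", "Shopping"),
    ("FLIPKART ONLINE PURCHASE", "Shopping"),
    ("MYNTRA FASHION STORE", "Shopping"),
    ("ELECTRICITY BILL PAYMENT", "Utilities"),
    ("WATER BILL PAYMENT", "Utilities"),
    ("BROADBAND BILL", "Utilities"),
]


def train_classifier(rows):
    """Fit a TF-IDF + Logistic Regression pipeline on (description, label) pairs."""
    texts = [text for text, _ in rows]
    labels = [label for _, label in rows]
    # TfidfVectorizer lowercases and tokenizes; LogisticRegression learns one
    # weight vector per category over the resulting sparse features.
    model = make_pipeline(
        TfidfVectorizer(lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


model = train_classifier(TRAIN)
predictions = model.predict(["UBER TRIP AIRPORT", "ELECTRICITY BILL MARCH"])
print(list(predictions))
```

Wrapping both stages in a single pipeline keeps the vocabulary learned during training attached to the classifier, so new descriptions are vectorized the same way at prediction time.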
Additional features include budget alerts (real-time email notifications when spending exceeds limits), AI-based forecasting of future expenses using time-series analysis, and a recommendation engine that provides personalized savings and budgeting suggestions. An interactive chatbot built with LangChain and OpenAI enhances accessibility by answering queries and offering financial guidance. The system interface uses a modern animated theme for better user engagement.
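The budget alert and forecasting logic can be sketched as follows, using only the Python standard library. This is an illustrative sketch under stated assumptions: the function names, the sample amounts, the limits, and the recipient address are hypothetical, and a least-squares straight line stands in for the time-series trend analysis, whose exact method is not specified here. The message is only constructed; actually sending it (for example with smtplib) is left out.

```python
from collections import defaultdict
from email.message import EmailMessage


def totals_by_category(transactions):
    """Sum amounts per category from (category, amount) pairs."""
    totals = defaultdict(float)
    for category, amount in transactions:
        totals[category] += amount
    return dict(totals)


def over_budget(totals, limits):
    """Return {category: overspend} for categories whose total exceeds its limit."""
    return {
        category: totals[category] - limits[category]
        for category in totals
        if category in limits and totals[category] > limits[category]
    }


def build_alert(overspend, recipient):
    """Build (but do not send) an email listing the over-budget categories."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Budget alert"
    lines = [f"{cat}: over limit by {amt:.2f}" for cat, amt in sorted(overspend.items())]
    msg.set_content("\n".join(lines))
    return msg


def forecast_next(monthly_totals):
    """Extrapolate a least-squares straight line one month past the series."""
    n = len(monthly_totals)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_totals) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_totals))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)


# Hypothetical sample data.
transactions = [("Food", 3200.0), ("Travel", 1500.0), ("Food", 2100.0), ("Shopping", 900.0)]
limits = {"Food": 5000.0, "Travel": 2000.0}

overspend = over_budget(totals_by_category(transactions), limits)
alert = build_alert(overspend, "user@example.com")
print(alert.get_content())
print(forecast_next([100.0, 120.0, 140.0]))
```

Categories without a configured limit (Shopping in this sample) are skipped rather than treated as over budget, which is one possible design choice among several.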
The literature survey highlights significant advancements in AI-driven finance solutions, including automated classification, predictive modeling, NLP-based text analysis, OCR-enabled extraction, and recommendation systems. Studies demonstrate the effectiveness of ML algorithms, hybrid models, and AI-powered insights in improving personal finance management.
The proposed system includes five modules:
Data Input & Extraction – Uploading and parsing PDF/CSV statements.
Evaluation results show strong performance: 87.6% accuracy, low test loss (0.42), and high precision, recall, and F1-scores across all categories, indicating reliable classification. The confusion matrix further demonstrates the model’s ability to correctly identify and distinguish different spending categories.
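For reference, accuracy, precision, recall, and F1-score can all be derived from a confusion matrix, as in the sketch below. The matrix values and category labels are invented purely to demonstrate the arithmetic; they are not the results reported in this work, and the function names are hypothetical.

```python
def accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)}.

    matrix[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column sum
        actually_k = sum(matrix[k])                     # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Illustrative numbers only (not measured results).
labels = ["Food", "Travel", "Shopping"]
matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]

print(accuracy(matrix))
for label, (p, r, f1) in per_class_metrics(matrix, labels).items():
    print(label, round(p, 3), round(r, 3), round(f1, 3))
```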
Overall, the Automated Expense Classifier provides a comprehensive, intelligent, and user-friendly platform that automates expense tracking, improves budgeting accuracy, and supports informed financial decision-making. It demonstrates the real-world potential of integrating ML, NLP, visualization, and AI assistants in modern personal finance management.
Conclusion
The Automated Expense Classifier successfully classifies financial transactions into meaningful categories using a machine learning model trained on textual transaction descriptions. By leveraging Natural Language Processing (NLP) and supervised learning techniques, the system efficiently processes CSV or PDF bank statements and automatically categorizes expenses into predefined labels such as Food, Travel, Bills, Shopping, and Others.
The achieved accuracy of 87.6% and balanced precision, recall, and F1-scores indicate that the model generalizes well to unseen financial data. The integration of budget alerts, AI-powered recommendations, and expense visualization further enhances user experience, enabling better financial awareness and planning.
Future enhancements include expanding the dataset, fine-tuning the model using transformer-based architectures (such as BERT or DistilBERT), and implementing real-time expense tracking for improved accuracy and adaptability.