As digital finance expands, individuals, freelancers, and small business owners increasingly rely on electronic documents to track and manage their financial transactions. Among the most common of these documents are bank e-statements, typically issued monthly in the Portable Document Format (PDF). Although these statements record financial activity in a convenient form, extracting even simple figures from them, such as the total amount debited, often requires tedious manual inspection. For users without financial management tools or accounting software, reconciling transactions from such statements is time-consuming and error-prone.
This paper presents the design and development of a lightweight, web-based application that automates the extraction and summation of debit transactions from PDF bank statements. Built entirely in Python, the tool relies on three open-source libraries: Streamlit for the interactive user interface, pdfplumber for parsing and extracting tabular data from PDF files, and pandas for data manipulation and analysis. The resulting application requires no programming expertise from the end user.
The core functionality is the automatic identification of columns typically labeled “Debit” or “Withdrawals.” When a user uploads a bank e-statement that is not password-protected, the application parses each page, extracts its tables, and converts them into pandas DataFrames. It then scans these tables for column headers or data fields related to debit transactions and retains only the rows that contain valid debit entries, as determined by keyword recognition and numeric validation. Finally, it computes the sum of the debit values and displays both the filtered transactions and the total debited amount.
The motivation for this solution is to improve financial transparency and simplify everyday bookkeeping. Whether used to track personal expenses or to prepare monthly ledgers for a small enterprise, the tool offers an accessible, low-cost alternative to complex accounting software. Automating extraction and calculation also lowers the risk of manual arithmetic and transcription errors and saves the time users would otherwise spend combing through pages of transactions.
The application is also designed to be adaptable. Because bank statement formats vary significantly between institutions, the parsing logic searches for a range of keyword patterns associated with debit activity. This helps the application handle different formats and layouts, provided that the statements are text-based and contain recognizable table structures.
Introduction
Modern banking has shifted from paper statements to digital PDF e-statements, offering convenience but posing challenges for users who want to quickly analyze transactions, especially total debits. PDF statements vary widely in layout and are difficult to parse automatically, making manual review time-consuming and error-prone. Users—from individuals to small businesses—need simple, affordable tools to extract and sum debit transactions without complex software or manual preprocessing.
This research presents a lightweight, open-source Python web application that automates debit extraction from bank statement PDFs. It uses:
pdfplumber to parse tables from digital PDFs without OCR,
pandas to clean, filter, and aggregate transaction data,
Streamlit to create an easy drag-and-drop web interface displaying filtered debit entries and total sums.
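As a rough illustration of how these three libraries might fit together, the sketch below extracts every table from an uploaded PDF and displays it. It is not the application's actual code: the function names (table_to_frame, extract_frames, run_app) are hypothetical, and it assumes that the first row of each extracted table holds the column headers.

```python
import pandas as pd


def table_to_frame(table):
    """Convert one extracted table (a list of rows) into a DataFrame.

    Assumption: the first row holds the column headers. Real statements
    may need a smarter header search.
    """
    header, *rows = table
    # Empty cells are returned as None, so normalise headers to strings.
    columns = [(cell or "").strip() for cell in header]
    return pd.DataFrame(rows, columns=columns)


def extract_frames(pdf_file):
    """Return one DataFrame per table found in a text-based PDF."""
    # Third-party import kept local so table_to_frame works without it.
    import pdfplumber

    frames = []
    with pdfplumber.open(pdf_file) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                if len(table) > 1:  # skip tables that have no data rows
                    frames.append(table_to_frame(table))
    return frames


def run_app():
    """Hypothetical Streamlit front end: upload a PDF, show each table."""
    import streamlit as st

    st.title("Bank statement tables")
    uploaded = st.file_uploader("Upload a bank e-statement", type=["pdf"])
    if uploaded is not None:
        for frame in extract_frames(uploaded):
            st.dataframe(frame)
```

To try the sketch, one could save it as a script (for example app.py, an assumed filename), add a final run_app() call, and start it with streamlit run app.py.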
The app identifies debit-related columns by matching header keywords such as “Debit” or “Withdrawals,” which lets it cope with formatting differences between banks, and then validates and sums the monetary values in those columns. It targets personal users, freelancers, and small businesses, giving them a detailed view of their expenses to support informed cash-flow management.
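The column detection and summation described above might be sketched with pandas as follows. This is a minimal illustration under stated assumptions rather than the application's actual logic: the keyword tuple, the function names, and the rule that a valid debit is a positive number are all assumptions, and the sample data is invented for the example.

```python
import pandas as pd

# Assumed keyword list; the paper names only "Debit" and "Withdrawals".
DEBIT_KEYWORDS = ("debit", "withdrawal")


def find_debit_column(frame):
    """Return the first column whose header contains a debit keyword."""
    for column in frame.columns:
        if any(keyword in str(column).lower() for keyword in DEBIT_KEYWORDS):
            return column
    return None


def debit_rows(frame):
    """Return (rows with a valid debit amount, total of those amounts)."""
    column = find_debit_column(frame)
    if column is None:
        return frame.iloc[0:0], 0.0
    # Drop thousands separators, currency symbols, and other non-numeric text.
    cleaned = frame[column].astype(str).str.replace(r"[^\d.\-]", "", regex=True)
    # Anything that still fails to parse becomes NaN and is filtered out.
    amounts = pd.to_numeric(cleaned, errors="coerce")
    valid = amounts.notna() & (amounts > 0)
    result = frame.loc[valid].copy()
    result[column] = amounts[valid]
    return result, float(amounts[valid].sum())


# Invented sample data standing in for one extracted statement table.
statement = pd.DataFrame({
    "Date": ["01/02", "02/02", "03/02"],
    "Withdrawals": ["1,250.50", "", "300"],
    "Deposits": ["", "5,000.00", ""],
})
rows, total = debit_rows(statement)
print(rows)
print(total)  # 1550.5
```

Coercing unparseable cells to NaN instead of raising an error keeps blank cells and stray text in the debit column from interrupting the total.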
The solution is modular and extensible, with future plans to support OCR for scanned PDFs, multi-currency handling, and secure cloud deployment. Overall, it demonstrates that accessible, automated financial analysis tools can be built using open-source Python libraries to simplify recurring accounting tasks and empower users with actionable insights.
Conclusion
This study shows that an efficient, lightweight, and user-friendly web application for extracting and totaling debit transactions from PDF bank statements can be built with the Python ecosystem and open-source libraries. By combining pdfplumber for table parsing, pandas for data manipulation, and Streamlit for rapid deployment of an interactive interface, the application eliminates manual spreadsheet entry and speeds up financial review. Users gain immediate visibility into their spending, because the tool not only totals the debit and withdrawal entries but also displays each relevant transaction, which improves transparency and auditability.
The modular architecture also supports future extension. Incorporating an OCR engine such as Tesseract would enable processing of scanned or image-based PDFs, adding secure password prompts would accommodate encrypted statements, and exposing RESTful APIs could allow integration with existing accounting platforms. Further enhancements, such as interactive data visualizations, export-to-Excel functionality, and multi-currency support, would broaden the tool’s applicability to other financial contexts.
In sum, this web application is a foundational step toward a comprehensive, automated financial management solution, offering immediate utility to end users and a flexible framework for further development.