Manual processing of employee conveyance claims is inefficient and error-prone, especially with the increasing use of ride-hailing services such as Ola and Uber. To speed up reimbursement, this research proposes an automated system that extracts data from bills received by email, categorizes them by vehicle type, and updates a common Excel sheet. The system uses machine learning techniques for classification and verification, together with optical character recognition (OCR) through Python Tesseract for text extraction. Employee codes are matched through an admin verification procedure, and cases that fail verification are flagged for human inspection. The proposed approach reduces HR workload, increases accuracy, and significantly reduces processing time. Future work will incorporate deep learning models to improve OCR accuracy, as well as robotic process automation (RPA).
Introduction
Organizations increasingly use ride-hailing platforms like Ola and Uber to fulfill employee transportation needs, generating large volumes of invoices that HR departments must manually process. This manual handling is time-consuming and error-prone, leading to inefficiencies and cost issues.
The study proposes an automated invoice processing system using machine learning (ML), optical character recognition (OCR), and regular expressions (regex) to efficiently extract, validate, and classify invoice data. The system retrieves invoices from HR email accounts, preprocesses the text to remove noise, extracts key fields (invoice number, date, amount, service provider), and structures the data into Excel sheets for easy HR review.
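The extraction step described above can be sketched with Python's standard library alone. This is a minimal sketch, not the authors' implementation: the regular expressions, the sample receipt, and the helper names `email_body`, `clean`, and `extract_fields` are hypothetical, and real Ola or Uber receipts would need per-vendor patterns. It assumes the invoice body is already plain text; image-based invoices would first pass through OCR.

```python
import re
from email import message_from_string, policy

# Hypothetical patterns; real vendor receipts would need per-vendor tuning.
INVOICE_NO_RE = re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I)
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
AMOUNT_RE = re.compile(r"(?:total|amount)[^0-9]{0,20}(\d+(?:,\d{3})*(?:\.\d{1,2})?)", re.I)
PROVIDERS = ("Ola", "Uber")


def email_body(raw: str) -> str:
    """Return the text/plain body of a raw email message, or "" if there is none."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def clean(text: str) -> str:
    """Replace non-printable and non-ASCII characters with spaces, then collapse whitespace."""
    text = re.sub(r"[^\x20-\x7E]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_fields(text: str) -> dict:
    """Pull invoice number, date, amount, and provider out of receipt text."""
    text = clean(text)

    def first(pattern):
        m = pattern.search(text)
        return m.group(1) if m else None

    amount = first(AMOUNT_RE)
    provider = next(
        (p for p in PROVIDERS if re.search(rf"\b{p}\b", text, re.I)), None
    )
    return {
        "invoice_no": first(INVOICE_NO_RE),
        "date": first(DATE_RE),
        "amount": float(amount.replace(",", "")) if amount else None,
        "provider": provider,
    }


# Hypothetical sample receipt email (header, blank line, body).
SAMPLE = (
    "Subject: Your Uber trip receipt\n"
    "\n"
    "Thanks for riding with Uber.\n"
    "Invoice No: UBR-2024-0815\n"
    "Trip date: 12/03/2024\n"
    "Total Amount: Rs. 1,245.50\n"
)
print(extract_fields(email_body(SAMPLE)))
```

Fields that no pattern matches come back as `None`, which gives the later validation step a simple signal for routing a claim to manual review.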
Key contributions include:
A scalable framework capable of processing large volumes of invoices with minimal human intervention.
Robust error-handling to ensure data accuracy.
High performance confirmed by testing: 97% accuracy and a processing time of 4.2 seconds per invoice.
The research addresses challenges such as varying invoice formats, noisy text, and image-based invoices, using OCR (Tesseract) for text recognition, NLP techniques for field extraction, and ML classifiers (Logistic Regression, SVM, Random Forest) for categorization.
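The categorization step can be illustrated with one of the three classifiers named above. The sketch below is a hypothetical, dependency-free one-vs-rest logistic regression over bag-of-words features, trained with plain stochastic gradient descent. The vehicle categories (`auto`, `bike`, `cab`), the training phrases, and the learning-rate and epoch settings are assumptions for illustration, not the paper's data or configuration. For an actual comparison of the three models, library implementations such as scikit-learn's `LogisticRegression`, `LinearSVC`, and `RandomForestClassifier` would be the usual choice.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> Counter:
    """Bag-of-words features: lowercase alphabetic tokens and their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneVsRestLogReg:
    """Multiclass logistic regression built from one binary classifier per class."""

    def __init__(self, lr: float = 0.5, epochs: int = 300):
        self.lr, self.epochs = lr, epochs
        self.weights = {}  # class -> {token: weight}
        self.bias = {}     # class -> bias term

    def _score(self, cls, feats):
        w = self.weights[cls]
        return self.bias[cls] + sum(w.get(t, 0.0) * v for t, v in feats.items())

    def fit(self, texts, labels):
        data = [tokens(t) for t in texts]
        for cls in sorted(set(labels)):
            self.weights[cls], self.bias[cls] = defaultdict(float), 0.0
            for _ in range(self.epochs):
                for feats, label in zip(data, labels):
                    # Per-sample log-loss gradient is (p - y) * x.
                    y = 1.0 if label == cls else 0.0
                    err = sigmoid(self._score(cls, feats)) - y
                    for t, v in feats.items():
                        self.weights[cls][t] -= self.lr * err * v
                    self.bias[cls] -= self.lr * err
        return self

    def predict(self, text):
        feats = tokens(text)
        # Pick the class whose binary classifier gives the highest score.
        return max(self.weights, key=lambda c: self._score(c, feats))


# Hypothetical labelled invoice snippets.
TRAIN = [
    ("Ola Auto ride fare", "auto"),
    ("Uber Auto trip receipt", "auto"),
    ("Uber Moto bike ride", "bike"),
    ("Ola Bike trip receipt", "bike"),
    ("Uber Go sedan cab", "cab"),
    ("Ola Prime Sedan cab ride", "cab"),
]
model = OneVsRestLogReg().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict("Uber Auto trip receipt"))
```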
The system reduces HR workload by 95%, scales to large invoice volumes, and maintains data integrity through detailed error logging and handling. Planned future enhancements include support for more vendors, real-time dashboards, AI-driven OCR improvements, and mobile invoice uploads.
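The error handling described here, combined with the admin verification of employee codes, might look like the following. This is a hypothetical sketch: the field names, the roster set standing in for the admin's employee records, and the helper names `route_claims` and `to_csv` are assumptions. A CSV file, which Excel opens directly, stands in for the shared Excel sheet. Claims that fail a check are logged and set aside for human review instead of being written to the sheet.

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("conveyance")

# Hypothetical column layout for the shared sheet.
FIELDS = ("employee_code", "invoice_no", "date", "amount", "provider")


def route_claims(claims, roster):
    """Split extracted claims into accepted rows and (claim, reason) pairs for review."""
    accepted, flagged = [], []
    for claim in claims:
        missing = [f for f in FIELDS if claim.get(f) in (None, "")]
        if missing:
            reason = "missing fields: " + ", ".join(missing)
        elif claim["employee_code"] not in roster:
            reason = "unknown employee code"
        else:
            accepted.append(claim)
            continue
        # Record why the claim was rejected so HR can inspect it manually.
        log.warning("flagged invoice %s: %s", claim.get("invoice_no"), reason)
        flagged.append((claim, reason))
    return accepted, flagged


def to_csv(rows):
    """Serialize accepted claims as CSV text; extra keys in a claim are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


ROSTER = {"E101", "E102"}  # hypothetical admin-verified employee codes
CLAIMS = [
    {"employee_code": "E101", "invoice_no": "UBR-1", "date": "12/03/2024",
     "amount": 245.5, "provider": "Uber"},
    {"employee_code": "E999", "invoice_no": "OLA-7", "date": "13/03/2024",
     "amount": 180.0, "provider": "Ola"},
    {"employee_code": "E102", "invoice_no": None, "date": "14/03/2024",
     "amount": 99.0, "provider": "Ola"},
]
accepted, flagged = route_claims(CLAIMS, ROSTER)
print(to_csv(accepted))
```

Keeping the rejection reason next to each flagged claim lets the review queue double as the error log that HR inspects.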
Conclusion
The implementation of this automated conveyance processing system has resulted in significant improvements in efficiency, accuracy, and scalability, effectively addressing the challenges associated with manual invoice handling. By achieving high accuracy rates, reducing processing times, and demonstrating the ability to handle large volumes of invoices efficiently, the system has exceeded performance expectations. Furthermore, its robust error-handling mechanisms and adaptive architecture enable seamless integration into diverse operational workflows. The system not only reduces administrative workload but also enhances compliance, record-keeping, and financial transparency.
Looking ahead, planned enhancements, including AI-driven improvements and support for multiple invoice formats, will further extend the system's capabilities. As organizations continue to seek advanced automation solutions, this system provides a scalable and intelligent approach to optimizing expense management.