Advances in digital technology have increased the need for a versatile mobile and web application that integrates document conversion, multimedia conversion, and accessibility features. This research presents the design and development of a cross-platform application built with Flutter to meet the requirements of a varied user base. The application includes PDF-to-Word, Word-to-PDF, PPT-to-PDF, XLS-to-PDF, and image-to-PDF converters, PDF merging, image-based text extraction through Optical Character Recognition (OCR), speech-to-text, and video summarization through YouTube link parsing. It also offers access to several AI tools through a single search interface, so that users can discover and use AI-based features efficiently. The app has a scalable architecture that uses Flutter's responsive design capabilities to remain usable on browsers, tablets, and mobile devices. The Provider state management solution keeps navigation and interaction smooth. AI-driven capabilities such as summarization and OCR enrich the user experience by adding precision and automation to data processing. This paper outlines the technical realization, architectural design, and development issues.
Introduction
Overview
FileSmith is a cross-platform, Flutter-based mobile and web application designed to streamline digital document management. It integrates document conversion, speech-to-text, OCR, summarization, and AI tool access into a single user-friendly platform. Built with Firebase and Google ML Kit, it aims to replace multiple fragmented tools with a unified, efficient solution.
Key Features
Multi-Format Document Conversion
Converts between PDF, DOCX, PPT, XLS, and image formats.
Includes PDF merging, splitting, and image-to-PDF generation.
Optical Character Recognition (OCR)
Extracts text from images using Google ML Kit.
Supports editing, copying, and exporting extracted text as PDFs.
Speech-to-Text Transcription
Converts spoken words into editable text with 92% average accuracy.
Useful for note-taking, accessibility, and lecture transcription.
AI-Powered Summarization
Summarizes long documents and YouTube video content.
Enhances productivity by reducing reading time.
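Summarizing a YouTube video starts from the link the user pastes in. The following is a minimal sketch of that parsing step, written in Python for illustration only (the app itself is built with Flutter). The function name `extract_video_id` and the set of URL shapes it accepts are assumptions, not details taken from FileSmith.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube link, or None if none is found.

    Hypothetical helper: handles watch, youtu.be, embed, and shorts links.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Normalize the common "www." and mobile "m." prefixes.
    if host.startswith("www."):
        host = host[4:]
    elif host.startswith("m."):
        host = host[2:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            values = parse_qs(parsed.query).get("v", [])
            return values[0] if values else None
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1] or None

    return None
```

The returned ID would then be handed to whatever component fetches the transcript and produces the summary; that stage is outside the scope of this sketch.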
AI Tools Search Hub
Single search interface to discover and use AI tools like ChatGPT, Bard, and DALL·E.
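A single search interface over a catalog of AI tools can be sketched as a keyword filter. The example below is an illustration in Python, not the app's Flutter code. The tool names come from the text above, while the `AITool` type, the tags, and the matching rule are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AITool:
    name: str
    tags: Tuple[str, ...]


# Illustrative catalog: names are from the text, tags are assumed.
CATALOG = (
    AITool("ChatGPT", ("chat", "text", "writing")),
    AITool("Bard", ("chat", "text", "search")),
    AITool("DALL·E", ("image", "art", "generation")),
)


def search_tools(query: str, catalog: Sequence[AITool] = CATALOG) -> List[AITool]:
    """Return tools whose name or tags contain every word of the query."""
    words = query.lower().split()
    if not words:
        # An empty query lists the whole catalog.
        return list(catalog)
    results = []
    for tool in catalog:
        fields = [tool.name.lower(), *tool.tags]
        # Each query word must appear as a substring of some field.
        if all(any(word in field for field in fields) for word in words):
            results.append(tool)
    return results
```

For instance, `search_tools("image")` matches only the DALL·E entry in this catalog, and `search_tools("chat")` matches both chat assistants.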
Backend: Firebase (authentication, storage), custom online processing server for heavy tasks like PDF to DOC conversion.
AI/ML Integration: Pre-trained models for OCR, speech-to-text, and summarization.
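The split between on-device work and the processing server can be pictured as a small routing table. The sketch below is in Python for illustration and rests on stated assumptions: the text names only PDF-to-DOC conversion as a server-side task (written here as `docx`, following the formats listed under Key Features), so treating every other supported conversion as on-device is a guess, and `Backend`, `SERVER_SIDE`, and `choose_backend` are hypothetical names.

```python
from enum import Enum


class Backend(Enum):
    ON_DEVICE = "on_device"
    SERVER = "server"


# Conversions listed in the paper, as (source, target) pairs.
SUPPORTED = {
    ("pdf", "docx"),
    ("docx", "pdf"),
    ("ppt", "pdf"),
    ("xls", "pdf"),
    ("image", "pdf"),
}

# Assumption: only PDF-to-DOCX is named in the text as a heavy, server-side task.
SERVER_SIDE = {("pdf", "docx")}


def choose_backend(source: str, target: str) -> Backend:
    """Decide where a conversion runs; raise ValueError if it is unsupported."""
    pair = (source.lower().lstrip("."), target.lower().lstrip("."))
    if pair not in SUPPORTED:
        raise ValueError(f"unsupported conversion: {pair[0]} to {pair[1]}")
    return Backend.SERVER if pair in SERVER_SIDE else Backend.ON_DEVICE
```

Keeping the routing in one table means a conversion can be moved between the device and the server without touching the calling code.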
Design & Implementation
Responsive UI: Works on Android, iOS, and web browsers.
Modular Architecture: Enables easy feature expansion and maintenance.
File Handling: Allows users to choose storage directories, with automatic temp file cleanup for space efficiency.
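The file-handling behavior described above (a user-chosen output directory plus automatic cleanup of temporary files) can be sketched as follows. This is a Python illustration under assumptions, not FileSmith's implementation: `process_in_scratch` and the `convert` callback are hypothetical names, and the actual conversion step is left to the caller.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def process_in_scratch(
    source: Path,
    output_dir: Path,
    convert: Callable[[Path, Path], Path],
) -> Path:
    """Run `convert` in a scratch directory and keep only its result.

    `convert(source, scratch_dir)` must write its output inside
    `scratch_dir` and return the path of the produced file.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    # TemporaryDirectory deletes the scratch folder on exit, even on error.
    with tempfile.TemporaryDirectory(prefix="filesmith_") as scratch:
        produced = convert(source, Path(scratch))
        final_path = output_dir / produced.name
        # Move the finished file to the directory the user selected.
        shutil.move(str(produced), str(final_path))
    return final_path
```

Any intermediate files the conversion leaves in the scratch directory disappear with it, so only the final document occupies storage.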
Performance Evaluation
| Feature | Accuracy / Speed | Storage Impact |
| --- | --- | --- |
| Speech-to-Text | 92% accuracy, 1.2 s per sentence | Editable output saves space |
| OCR | 96% accuracy, 1.5 s per image | 60% smaller than image files |
| PDF to DOCX | 97% (text) and 90% (complex) accuracy, 6.2 s for 5 pages | 30% smaller than original PDF |
| PDF Merge | 3.8 s for 10 pages | 40% storage savings |
| Image to PDF | 5.6 s for 5 images | 35% size reduction |
| AI Tools Access | 30% faster discovery | Single-point access system |
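The evaluation does not state how the accuracy and storage figures were computed. One common way to obtain such numbers is sketched below in Python, under assumptions: transcription accuracy is taken as one minus the word error rate against a reference transcript, and storage impact as the percentage size reduction between the original and converted file. The function names are hypothetical.

```python
from typing import List


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, clamped at zero; case-insensitive."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    errors = word_edit_distance(ref_words, hyp_words)
    return max(0.0, 1.0 - errors / len(ref_words))


def size_reduction_percent(original_bytes: int, converted_bytes: int) -> float:
    """Percentage by which the converted file is smaller than the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_bytes - converted_bytes) / original_bytes
```

Under this definition, a four-word reference with one wrong word scores 0.75, and a 1000-byte image whose extracted text occupies 400 bytes gives a 60% reduction.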
Literature Review Insights
Existing tools (e.g., Adobe Acrobat, Smallpdf) are fragmented or expensive.
OCR, transcription, and summarization tools often exist in isolation.
FileSmith bridges these gaps with free, integrated, and cross-platform functionality.
Conclusion
This research presents a flexible document conversion app developed with Flutter and designed to meet modern productivity demands. With a user-friendly interface, responsive design, and AI-powered features, it provides a seamless and efficient document management experience.
Unlike traditional tools that require multiple applications for different tasks, this app brings everything together—document conversion, OCR, and text summarization—into one unified platform. Its cross-platform support and optimized performance make it accessible to a broad audience, from students and researchers to working professionals. By leveraging AI-driven automation, the app streamlines workflows and boosts efficiency. Future enhancements, such as integrating large language models, aim to further expand its capabilities and usability.