Machine learning is widely used in fields such as healthcare, finance, education, and automation, but developing a machine learning system usually takes considerable time and effort. Tasks such as understanding the problem, collecting data, cleaning data, and preparing datasets are performed manually and separately, and they require technical knowledge. This makes machine learning difficult for non-technical users to develop and use, and it slows down the development process. This paper introduces a Rapid Application Development (RAD) framework that simplifies and speeds up the early stages of machine learning development. The system lets users specify their requirements as a simple natural language prompt. A Small Language Model (SLM) extracts important keywords from the prompt. These keywords are then used to collect relevant data from websites and public APIs. The collected data is then checked, cleaned, and stored. By following Rapid Application Development principles, the proposed system reduces manual work, saves time, and provides a modular and scalable solution. The RAD-ML framework makes machine learning model development easier, faster, and more accessible to students and researchers, and more practical for real-world applications.
Introduction
This paper introduces RAD-ML, a system designed to make machine learning (ML) more accessible by allowing users to create and deploy ML models using natural language. Traditional ML requires significant expertise in data processing, feature engineering, and model selection, which limits its use by domain experts. Existing AutoML tools automate parts of this process, but they still require clean data and technical knowledge.
RAD-ML addresses these gaps by combining small language models (SLMs) and retrieval-augmented generation (RAG) to automate the entire ML pipeline, from problem definition to deployment, based on natural language input. It addresses three key challenges: translating vague user requests into technical tasks, automatically collecting relevant data, and building and deploying models with minimal human involvement.
The system operates in four main stages:
Understanding the user’s prompt and classifying the ML task
Automatically collecting and validating data from web sources
Preprocessing data and engineering features
Training models using AutoML and deploying them at scale
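The four stages above can be sketched as a chain of functions that pass a shared state object along. This is a minimal illustration under stated assumptions, not the RAD-ML implementation: the keyword table stands in for the SLM-based task classifier, the in-memory rows stand in for web and API retrieval, and the one-parameter least-squares fit stands in for AutoML training. All names (PipelineState, TASK_KEYWORDS, run_pipeline, and the stage functions) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical keyword table standing in for the SLM-based task classifier.
TASK_KEYWORDS = {
    "classification": ("classify", "detect", "spam", "category"),
    "regression": ("predict", "forecast", "estimate", "price"),
    "clustering": ("group", "segment", "cluster"),
}


@dataclass
class PipelineState:
    prompt: str
    task: str = "unknown"
    rows: list = field(default_factory=list)
    features: list = field(default_factory=list)
    model: Optional[Callable[[float], float]] = None


def understand_prompt(state: PipelineState) -> PipelineState:
    """Stage 1: map the prompt to an ML task (keyword match as a placeholder)."""
    text = state.prompt.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            state.task = task
            break
    return state


def collect_data(state: PipelineState) -> PipelineState:
    """Stage 2: collect and validate data (fixed sample instead of web sources)."""
    raw = [
        {"area": 50.0, "price": 100.0},
        {"area": 80.0, "price": 160.0},
        {"area": None, "price": 90.0},  # incomplete row, rejected by validation
    ]
    state.rows = [r for r in raw if all(v is not None for v in r.values())]
    return state


def preprocess(state: PipelineState) -> PipelineState:
    """Stage 3: scale the single input feature to the range (0, 1]."""
    max_area = max(r["area"] for r in state.rows)
    state.features = [(r["area"] / max_area, r["price"]) for r in state.rows]
    return state


def train(state: PipelineState) -> PipelineState:
    """Stage 4: fit y = slope * x by least squares (placeholder for AutoML)."""
    numerator = sum(x * y for x, y in state.features)
    denominator = sum(x * x for x, _ in state.features)
    slope = numerator / denominator
    state.model = lambda x: slope * x
    return state


STAGES = (understand_prompt, collect_data, preprocess, train)


def run_pipeline(prompt: str) -> PipelineState:
    state = PipelineState(prompt=prompt)
    for stage in STAGES:
        state = stage(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("Predict house prices from listings")
    print(result.task, len(result.rows), result.model(1.0))
```

Keeping each stage as a function with the same signature means a stage can be replaced (for example, swapping the keyword match for an actual SLM call) without touching the others, which is in line with the modular design the paper describes.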
RAD-ML builds on prior work in AutoML, natural language interfaces, and rapid application development but improves upon them by offering end-to-end automation. Its architecture includes intelligent data ingestion, adaptive preprocessing, automated feature engineering, and scalable model training.
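As a rough illustration of what adaptive preprocessing can mean in practice, the sketch below picks an imputation strategy per column from the values it sees: the median for numeric columns and the most frequent value otherwise. The paper does not specify RAD-ML's actual rules, so this choice of strategies and the function names impute_column and adaptive_preprocess are assumptions made for the example.

```python
from statistics import median, mode


def impute_column(values: list) -> list:
    """Fill missing entries (None) using a strategy chosen from the column's type."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn from; leave the column unchanged
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    # Assumed rule: median for numeric data, most frequent value otherwise.
    fill = median(present) if is_numeric else mode(present)
    return [fill if v is None else v for v in values]


def adaptive_preprocess(table: dict) -> dict:
    """Apply per-column imputation to a column-oriented table."""
    return {name: impute_column(column) for name, column in table.items()}


if __name__ == "__main__":
    table = {"age": [20, None, 40], "city": ["Pune", "Pune", None]}
    print(adaptive_preprocess(table))
```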
Experimental results show strong performance, including high accuracy in task classification (94.3%) and effective data retrieval (97.8%), demonstrating the system’s ability to produce high-quality ML solutions with minimal manual effort.
Conclusion
The contribution of this study is the development and presentation of a novel automated machine learning system, RAD-ML, which addresses major limitations of existing AutoML systems through intelligent data acquisition, adaptive preprocessing, automated feature engineering, dynamic model selection, and automated deployment. RAD-ML demonstrates significant performance advantages over existing AutoML pipelines in accuracy, robustness to noisy and incomplete data, and computational efficiency. In particular, the adaptive preprocessing and automated feature engineering modules contribute substantially to the accuracy and performance gains, while the hyperparameter optimization and model selection strategies adapt the learning configuration to varying data constraints.