Data Insights to Machine Learning Model

Authors: Raghhul O, Karthigai Selvam M, Roshan Bhaskar, Dr. GV. Shrichandran

DOI Link: https://doi.org/10.22214/ijraset.2025.70343

Abstract

This project introduces an intelligent framework that automates end-to-end machine learning workflows through collaborating AI agents. Each agent specializes in one critical stage (data loading, target selection, preprocessing, exploratory analysis, or model training) to ensure systematic and interpretable model development. The system is built on CrewAI and integrates Pydantic for validation, pandas for data processing, and scikit-learn for modeling, providing efficiency and transparency. Major innovations include heuristic target selection, adaptive preprocessing, and self-documenting code generation. The framework reduces manual effort, adapts flexibly to different datasets, and is well suited to rapid prototyping and reproducible analysis. By combining structured automation with collaborative decision-making, this approach closes the gap between accessibility and performance in applied machine learning.

Introduction

Overview

The rapid evolution of AI and machine learning (ML) demands efficient, scalable, and interpretable automation. Traditional ML workflows are manual and inconsistent. Automated Machine Learning (AutoML) frameworks like TPOT and Auto-sklearn address these challenges but often lack transparency and adaptability.

The proposed system uses CrewAI to build a modular, agent-based framework that automates end-to-end ML workflows while ensuring reproducibility, interpretability, and user-friendliness for both technical and non-technical users.

Core Contributions

1. Framework Design & Objectives

Goal: Build agent-based automated ML pipelines using CrewAI.
Key Objectives:
- Automate data loading, preprocessing, exploratory data analysis (EDA), model training, and reporting.
- Use heuristic logic for dynamic target selection and adaptive preprocessing.
- Integrate Pydantic for transparent data validation.
- Reduce workflow execution time by up to 40%.
- Produce interpretable outputs (models + reusable code).
- Enable scalability and integration with advanced tools like deep learning, hyperparameter tuning, and real-time monitoring.
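
The heuristic target selection mentioned in the objectives is not specified in detail, so the following is only a minimal sketch of what such a heuristic might look like. The hint list, the fallback to the last column, and the `max_classes` threshold are all assumptions made for illustration. Plain Python dictionaries stand in for a pandas DataFrame to keep the example self-contained.

```python
from typing import Any

# Hypothetical column names that often mark a prediction target (an assumption).
TARGET_HINTS = ("target", "label", "class", "outcome", "y")


def select_target(rows: list[dict[str, Any]]) -> str:
    """Pick a target column: a name hint wins, else fall back to the last column."""
    columns = list(rows[0].keys())
    for name in columns:
        if name.lower() in TARGET_HINTS:
            return name
    return columns[-1]


def infer_task(rows: list[dict[str, Any]], target: str, max_classes: int = 10) -> str:
    """Guess the task type from the values in the target column."""
    values = [row[target] for row in rows]
    # Non-numeric targets (strings, booleans) are treated as class labels.
    if any(isinstance(v, (str, bool)) for v in values):
        return "classification"
    # Numeric targets with only a few distinct whole-number values are
    # treated as class labels too; everything else is regression.
    distinct = set(values)
    if len(distinct) <= max_classes and all(float(v).is_integer() for v in distinct):
        return "classification"
    return "regression"


rows = [
    {"age": 34, "income": 52000.0, "label": "yes"},
    {"age": 51, "income": 61000.5, "label": "no"},
    {"age": 27, "income": 38000.0, "label": "yes"},
]
target = select_target(rows)
print(target, infer_task(rows, target))
```

A rule-based choice like this is easy to log and explain, which fits the stated goal of interpretable outputs, but it can misfire on unusual datasets, so a user override would be a sensible companion.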

2. Architecture & System Design

Agents specialize in distinct tasks:
- EDA Agent: Uses Pandas and Matplotlib/Seaborn for data analysis.
- Model Selection Agent: Chooses algorithms based on data properties (e.g., RandomForest for classification).
- Training Agent: Trains models and calculates performance metrics.
- Tuning Agent: Uses GridSearchCV for hyperparameter optimization.
- Reporting Agent: Compiles training code, metrics, and EDA insights into markdown reports.
Technologies Used:
- Python, Scikit-Learn, LangChain, Pydantic, Streamlit, OpenAI API for NLP support.
- The modular pipeline supports code generation via LangChain's PythonREPL and real-time report generation.
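
The agent hand-off described above can be pictured as a chain of steps that read from and write to a shared context. The sketch below is an illustration only: plain Python functions stand in for CrewAI agents, and the names `Context`, `run_pipeline`, and the individual step functions are hypothetical, not part of CrewAI or of the authors' code. Only three of the listed agents are mimicked to keep it short.

```python
from typing import Any, Callable

# Shared state passed from one step to the next (hypothetical structure).
Context = dict[str, Any]
AgentStep = Callable[[Context], Context]


def eda_agent(ctx: Context) -> Context:
    """Summarise the target column (stand-in for a pandas-based EDA step)."""
    labels = [row[ctx["target"]] for row in ctx["rows"]]
    ctx["n_rows"] = len(labels)
    ctx["n_classes"] = len(set(labels))
    return ctx


def model_selection_agent(ctx: Context) -> Context:
    """Choose an estimator name from the task type; the mapping is illustrative."""
    choices = {
        "classification": "RandomForestClassifier",
        "regression": "RandomForestRegressor",
    }
    ctx["model"] = choices[ctx["task"]]
    return ctx


def reporting_agent(ctx: Context) -> Context:
    """Compile what earlier steps recorded into a short markdown report."""
    lines = [
        "# Pipeline report",
        "",
        f"- Rows: {ctx['n_rows']}",
        f"- Distinct target values: {ctx['n_classes']}",
        f"- Selected model: {ctx['model']}",
    ]
    ctx["report"] = "\n".join(lines)
    return ctx


def run_pipeline(ctx: Context, steps: list[AgentStep]) -> Context:
    """Run each step in order and keep a log of which steps executed."""
    for step in steps:
        ctx = step(ctx)
        ctx.setdefault("log", []).append(step.__name__)
    return ctx


result = run_pipeline(
    {
        "rows": [{"x": 1, "label": "a"}, {"x": 2, "label": "b"}, {"x": 3, "label": "a"}],
        "target": "label",
        "task": "classification",
    },
    [eda_agent, model_selection_agent, reporting_agent],
)
print(result["report"])
```

Because each step only touches the shared context, a step can be swapped or inserted (for example, a tuning step between selection and reporting) without changing the others, which is the kind of modularity the architecture aims for.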

Evaluation & Performance

Classification Metrics: Accuracy, Precision, Recall, F1-Score.
Regression Metrics: MAE, MSE, R².
Results are auto-integrated into structured reports for clarity and traceability.
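
For reference, the metrics listed above can be computed from first principles as in the sketch below. It uses only the standard library and handles the binary classification case; in the described system these values would more plausibly come from scikit-learn's metrics module. The function names and the zero-division conventions are assumptions of this sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Ratios with an empty denominator are reported as 0.0 (a convention of this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error and R² (coefficient of determination)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined for a constant target; 0.0 is used here as a placeholder.
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"mae": mae, "mse": mse, "r2": r2}


clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
reg = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(clf)
print(reg)
```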
Performance Highlights:
- 80% reduction in manual workload.
- Increased transparency and adaptability across diverse datasets.
- Dynamic agent collaboration ensures robustness and flexibility.

User Experience & Interface

Streamlit UI:
- Enables data upload, parameter selection, and real-time visualization.
- Interactive widgets help adjust model settings and monitor pipeline progress.
- Plans include cloud integration and dashboard enhancements.

Literature Insights

Compared with prior AutoML systems:
- Auto-sklearn and TPOT optimize models but operate as black boxes.
- CrewAI emphasizes explainability and modular orchestration via agent collaboration.
Inspired by works on:
- Human-AI collaboration
- Interpretable ML
- Modular and multi-agent systems
- Model documentation (e.g., Model Cards)

Conclusion

The CrewAI-based ML pipeline offers a robust, scalable, and explainable AutoML solution. It fills the gap between automation and transparency through agent-based orchestration. The system democratizes ML by enabling both novice and expert users to build, understand, and iterate ML models efficiently.


Copyright

Copyright © 2025 Raghhul O, Karthigai Selvam M, Roshan Bhaskar, Dr. GV. Shrichandran. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET70343

Publish Date : 2025-05-04

ISSN : 2321-9653

Publisher Name : IJRASET
