Authors: Prof. Sachin Sambhaji Patil, Mahesh Manohar Sirsat, Ajitkumar Vishwakarma Sharma, Aashish Shahi, Omkar Maruti Halgi
DOI Link: https://doi.org/10.22214/ijraset.2023.50406
With the increasing volume, velocity, veracity, and variety of data, efficient techniques and tools for managing and analyzing data in machine learning have become critical. Abstraction is a powerful concept that allows users to interact with machine learning algorithms without understanding their technical implementation details. In this project, the user provides a dataset in .csv format, which is then passed through a series of machine learning steps: removing unwanted columns, handling missing values, label encoding, outlier detection and removal, normalization, model building, and model prediction. The results can be downloaded as a PDF, a traceable PDF, or a CSV file. Together, these processes report several models and their respective accuracies, so that the user can choose the best model for that particular dataset. The traceable PDF contains the timestamp of every process performed, together with its result. Apart from the client-server model, the user is also provided with an API, so that all processes can be used from other platforms such as C++, Java, and Ruby. Overall, this paper highlights the critical role of abstraction in managing the complexity of data and machine learning algorithms, enabling more efficient and effective analysis of large and complex datasets.
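The preprocessing steps listed above can be sketched as follows. This is a minimal sketch using only the Python standard library, not the tool's actual implementation: the function names, mean/mode imputation, the 1.5 * IQR outlier rule, sorted-order label codes, and min-max scaling are our assumptions, since the paper does not say which technique it uses for each step. Rows are represented as dictionaries, as they might look after parsing the uploaded .csv file.

```python
import statistics

def drop_columns(rows, unwanted):
    """Remove the columns the user marked as unwanted."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

def impute_missing(rows, numeric, categorical):
    """Fill missing numeric values with the column mean and categorical ones with the mode."""
    fills = {}
    for col in numeric:
        fills[col] = statistics.mean([r[col] for r in rows if r[col] is not None])
    for col in categorical:
        fills[col] = statistics.mode([r[col] for r in rows if r[col] is not None])
    return [{k: (fills[k] if v is None and k in fills else v) for k, v in r.items()} for r in rows]

def label_encode(rows, categorical):
    """Replace each category with an integer code assigned in sorted order."""
    mappings = {
        col: {cat: code for code, cat in enumerate(sorted({r[col] for r in rows}))}
        for col in categorical
    }
    encoded = [{k: (mappings[k][v] if k in mappings else v) for k, v in r.items()} for r in rows]
    return encoded, mappings

def remove_outliers_iqr(rows, numeric, k=1.5):
    """Drop rows with a value outside [Q1 - k*IQR, Q3 + k*IQR] in any numeric column."""
    bounds = {}
    for col in numeric:
        q1, _, q3 = statistics.quantiles([r[col] for r in rows], n=4, method="inclusive")
        iqr = q3 - q1
        bounds[col] = (q1 - k * iqr, q3 + k * iqr)
    return [r for r in rows if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in numeric)]

def min_max_normalize(rows, numeric):
    """Rescale each numeric column to [0, 1]; a constant column becomes all zeros."""
    out = [dict(r) for r in rows]
    for col in numeric:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        for r in out:
            r[col] = 0.0 if hi == lo else (r[col] - lo) / (hi - lo)
    return out

def preprocess(rows, unwanted, numeric, categorical):
    """Apply the steps in the order the paper lists them."""
    rows = drop_columns(rows, unwanted)
    rows = impute_missing(rows, numeric, categorical)
    rows, mappings = label_encode(rows, categorical)
    rows = remove_outliers_iqr(rows, numeric)
    return min_max_normalize(rows, numeric), mappings

# Hypothetical sample data, as it might look after parsing a .csv file.
sample = [
    {"id": 1, "age": 24, "city": "Pune", "income": 30},
    {"id": 2, "age": None, "city": "Mumbai", "income": 40},
    {"id": 3, "age": 32, "city": None, "income": 50},
    {"id": 4, "age": 30, "city": "Pune", "income": 45},
    {"id": 5, "age": 26, "city": "Pune", "income": 1000},
]
cleaned, mappings = preprocess(sample, {"id"}, ["age", "income"], ["city"])
print(mappings)      # {'city': {'Mumbai': 0, 'Pune': 1}}
print(len(cleaned))  # 4: the row with income 1000 is dropped as an outlier
```

Imputation runs before outlier detection so that every row can be compared, and normalization runs last so that the scaling is not stretched by the dropped outlier.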
I. INTRODUCTION
In today's world, information sharing needs to be fast and efficient. We need tools that can take datasets collected from various sources and present them visually in the form of charts, patterns, and similar views. Such tools process datasets and automate the task of finding patterns and decoding their semantic structure. The main purpose of integrating tools with datasets is to let users focus on how the functionality is used, rather than on how it is implemented, when performing further analysis.
According to IDC's AI predictions for 2020 and beyond, IT must invest heavily in data integration, management, and cleansing to use intelligent automation effectively. Data professionals continue to be burdened by the tedious task of data cleansing, and organizations cannot achieve their digital transformation goals without an efficient way to automate it. [1] The IDC FutureScape report also finds that resolving historical data problems in legacy systems can be a significant barrier to entry, especially for large organizations, highlighting the challenges of adopting digital initiatives.
According to Morningstar, businesses spent an estimated $1.3 trillion (USD) on digital transformation initiatives in the past year alone. McKinsey later reported that 70% of these programs fell short of their goals. By these estimates, such failures cost businesses over $900 billion.
Businesses cannot afford repeated failures in their digital transformation, whatever the size of the investment at stake. Clean, standardized data is needed to unlock the benefits of digital transformation projects, but collecting the required data in the required form can be tedious, expensive, and time-consuming.
II. MOTIVATION
The motivation behind this project is that beginners starting out with machine learning often do not understand the overall machine learning workflow. To overcome this problem, we built this tool to guide the user through the process step by step and to recommend what to do at each stage. Every stage offers multiple options so that the user can choose according to their needs, and by the end, the user comes to understand how the whole process works.
Here are some of the motivations incorporated in this project:
III. PROPOSED SYSTEM
The system then generates different models from the preprocessed dataset and calculates their respective accuracy. The user can choose the best model for their particular dataset based on the generated results. The system also provides the user with the option to download the results in different formats, including PDF and CSV.
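The model-comparison step can be illustrated with a small sketch. The paper does not name the models it trains, so the two candidates here (a majority-class baseline and a 3-nearest-neighbours classifier), the function names, and the toy data are illustrative assumptions. Accuracy is taken to be the fraction of held-out rows whose predicted label matches the true label, and the "best" model is simply the one with the highest accuracy.

```python
import math
from collections import Counter

def majority_class(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def k_nearest_neighbours(train_X, train_y, k=3):
    """Predict by majority vote among the k closest training rows (Euclidean distance)."""
    def predict(x):
        nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def accuracy(model, X, y):
    """Fraction of rows whose prediction matches the true label."""
    return sum(model(xi) == yi for xi, yi in zip(X, y)) / len(y)

def compare_models(builders, train_X, train_y, test_X, test_y):
    """Fit every candidate, score it on held-out rows, and return the scores and the best name."""
    scores = {
        name: accuracy(build(train_X, train_y), test_X, test_y)
        for name, build in builders.items()
    }
    best = max(scores, key=scores.get)
    return scores, best

# Hypothetical preprocessed data: two numeric features and a binary label.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
train_y = [0, 0, 0, 0, 1, 1, 1]
test_X = [(0.05, 0.1), (0.95, 0.9), (0.85, 1.0), (0.1, 0.0)]
test_y = [0, 1, 1, 0]

builders = {
    "majority-class baseline": majority_class,
    "3-nearest-neighbours": k_nearest_neighbours,
}
scores, best = compare_models(builders, train_X, train_y, test_X, test_y)
print(scores)  # {'majority-class baseline': 0.5, '3-nearest-neighbours': 1.0}
print(best)    # 3-nearest-neighbours
```

Including a trivial baseline is a common design choice: any model that cannot beat "always predict the most frequent label" is not learning anything useful from the dataset.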
In addition, the proposed system includes a client-server model that allows the user to interact with the system through a web interface.
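One way to picture the server side of the client-server model is a small HTTP endpoint that accepts an uploaded CSV file and answers with JSON. The `/analyze` path, the `summarize_csv` and `serve` helpers, and the choice of summary (column names, row count, and missing values per column) are hypothetical; the paper does not describe its server or its routes. The sketch uses only Python's standard `http.server`, `csv`, and `json` modules.

```python
import csv
import io
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize_csv(text):
    """Return the column names, the row count, and the number of empty cells per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    missing = {c: sum(1 for r in rows if not (r.get(c) or "").strip()) for c in columns}
    return {"columns": columns, "rows": len(rows), "missing": missing}

class AnalyzeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /analyze route is served.
        if self.path != "/analyze":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = json.dumps(summarize_csv(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging in this sketch

def serve(host="127.0.0.1", port=8000):
    """Block and handle requests until the process is interrupted."""
    with HTTPServer((host, port), AnalyzeHandler) as server:
        server.serve_forever()
```

Because requests and responses are plain HTTP and JSON, a client written in C++, Java, Ruby, or any other language with an HTTP library could call such an endpoint; for example, `curl --data-binary @data.csv http://127.0.0.1:8000/analyze` would post a local file to a server started with `serve()`.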
The system also provides an API that allows the user to run the preprocessing steps and the model-building process from different programming languages, including C++, Java, and Ruby. Finally, the system generates a traceable PDF report that records the timestamp of each process together with its result.
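The traceable report needs a record of when each step ran and what it produced. A minimal sketch of that record, under our own assumptions, is shown below: the `ProcessTrace` class and its method names are hypothetical, timestamps are UTC in ISO 8601 form, and the trace is written out as CSV text. Rendering the trace into a PDF would require a PDF library and is not shown.

```python
import csv
import io
from datetime import datetime, timezone

class ProcessTrace:
    """Records each processing step with a UTC timestamp and its result."""

    def __init__(self, clock=None):
        # The clock is injectable so that a trace can be reproduced exactly.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self.entries = []

    def record(self, step, result):
        """Append one timestamped entry for a finished step."""
        self.entries.append(
            {"timestamp": self._clock().isoformat(), "step": step, "result": str(result)}
        )

    def run(self, step, func, *args, **kwargs):
        """Run func, record its result under the given step name, and return the result."""
        result = func(*args, **kwargs)
        self.record(step, result)
        return result

    def to_csv(self):
        """Serialize the trace as CSV text with a header row."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["timestamp", "step", "result"], lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.entries)
        return buf.getvalue()
```

In normal use the trace falls back to the current UTC time; passing a fixed clock instead makes the output deterministic, which is convenient when the trace itself needs to be tested.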
IV. LITERATURE REVIEW
Automated machine learning (AutoML) has become one of the most dynamic sub-areas of the data science field. It sounds great to someone who is not a machine learning expert, but to a practicing data scientist it sounds terrifying.
Judging by how it is portrayed in the media, AutoML seems poised to completely change the way models are generated by removing the need for data scientists.
While some companies, such as DataRobot, aim to fully automate the machine learning process, much of the industry uses AutoML as a tool to augment the capabilities of today's data scientists and to expand the field.
AutoML makes things easy for people who are just starting out. But once everything about the system is automated, what is left for the users of these systems? They simply supply the dataset and check the results.
This level of automation poses potential problems not only for the models but also for the users who interpret them. Three issues turned out to be the most common when studying to become a data scientist.
The following gaps were identified through research before the project was carried out; all of them are addressed in this project:
V. SYSTEM ARCHITECTURE
VI. FUTURE WORK
As the project progresses, it effectively improves the visualization of the material. The integrity and consistency of the material are maintained throughout the request-response cycle. The system provides an important, easy-to-use feature for analyzing data and yields meaningful information when extracting data. We present a flow-based view of services through case studies and an overview of the business. One may argue that, in the early stages of design, flow-based conceptualization promises to give Web application development a more stable basis. This flow-based approach may be used with modern software development methodologies.
[1] IEEE, “A dataset of attributes from papers of a machine learning conference Algorithm,” 2019.
[2] IEEE, “Missing Data Analysis in Regression,” 2022.
[3] IEEE, “A survey on outlier explanations,” 2022.
[4] I. F. Qayyum and D.-H. Kim, “A Survey of datasets, preprocessing, modelling mechanisms,” 2022.
[5] C. Fan and M. Chen, “A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery From Building Operational Data,” 2021.
Copyright © 2023 Prof. Sachin Sambhaji Patil, Mahesh Manohar Sirsat, Ajitkumar Vishwakarma Sharma, Aashish Shahi, Omkar Maruti Halgi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET50406
Publish Date : 2023-04-13
ISSN : 2321-9653
Publisher Name : IJRASET