Authors: G. Kalpana, Dr. A Kanaka Durga, Anoop Reddy, Dr. G Karuna
Storm Motors is an e-commerce company that acts as a mediator between parties interested in selling and buying pre-owned cars. It has recorded data about seller and car details, registration details, web advertisement details, make and model information, and price. The company wishes to develop an algorithm that predicts the price of pre-owned cars from the attributes associated with each car, so that a sale can be made quickly when the price is reasonable and satisfies both seller and buyer, and so that the prices of various car models can be compared on the basis of car features to improve its business. In this paper, we conduct a comparative study of the linear regression and random forest machine learning algorithms, implemented in Jupyter Notebook. The study shows that linear regression performs better than random forest. We also experimented with AutoAI in IBM Cloud Watson Studio, which automatically builds the best predictive model by comparing candidate algorithms on accuracy measures. In this AutoAI experiment, linear regression performed better than ridge and random forest. The main objective of this paper is to find the best predictive model for pre-owned car prices.
The automotive industry is a cornerstone of the national economy, and forecasting automobile sales correctly is of great importance. The pre-owned automobile market is growing steadily and has almost doubled its market value in recent years. The advent of online portals such as CarDekho, Quikr, Carwale, Cars24 and many others has made it simpler to meet customers' needs. In many developed countries it is common to lease a car instead of buying it outright.
A lease is a binding contract between a buyer and a seller (or a third party, usually a bank, insurance company or other financial institution) under which the buyer pays fixed instalments to the seller or financing company for a predefined number of months or years. After the lease period is over, the buyer has the option to purchase the car at its residual value, i.e. its expected resale value. Sellers and financers therefore have a commercial interest in being able to predict the residual (salvage) value of cars precisely.
Every year the car industry becomes more dynamic and expands globally. In this competitive market, an accurate price must therefore be set for both customers and manufacturers. Customers and manufacturers are often unsure about the purchase or sale price of a car, so they seek advice from auto dealers, car magazines or websites on the Internet. Gathering this information, however, takes a long time and can confuse customers.
Some modeling hypotheses have been set as follows: (1) the pre-owned cars considered here are private cars only, excluding cars used commercially, such as taxis or chauffeur-driven cars in government or companies, which may be a little different from the private car under assessment conditions; (2) the data we acquired were from the used car trade market in Shanghai, which may be a little different from other places.
Predicting vehicle prices is considered a challenging issue, as there are many different factors that affect the price of vehicles. Besides the characteristics of vehicles such as brands, manufacturers, models, engines, fuel, etc., there are also many external factors that affect the price of automobiles such as taxes or distance traveled.
Previous studies show that authors have chosen different factors as input variables for forecasting car prices. These characteristics are diverse and include many qualitative variables. Quantifying the qualitative data is therefore a crucial pre-processing step before the data are fed into the model for predicting vehicle prices. It is also one of the main contributions of this paper.
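One common way to quantify qualitative variables is one-hot encoding. The sketch below is only an illustration under assumptions: the toy records and the column names (fuelType, gearbox, powerPS) are hypothetical, and the paper does not state which encoding it used.

```python
import pandas as pd

# Hypothetical toy records; the column names are illustrative assumptions.
cars = pd.DataFrame({
    "fuelType": ["petrol", "diesel", "petrol"],
    "gearbox": ["manual", "automatic", "manual"],
    "powerPS": [75, 140, 90],
})

# One-hot encode the qualitative columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(cars, columns=["fuelType", "gearbox"], dtype=int)
print(encoded)
```

Each category becomes its own 0/1 column, so a regression model can consume the result directly.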
II. MACHINE LEARNING ALGORITHMS
Machine learning algorithms, based on a certain set of features, can be used to predict a car's retail value. Different websites have different algorithms to generate the retail price of the used cars, and therefore there is no unified price algorithm. By training statistical models to predict prices, a rough price estimate can be easily obtained without actually entering the details on the desired website.
The main objective of this paper is to use different prediction models to predict the retail price of a used car and compare their levels of accuracy.
A. Random Forest
Random forest is mainly used for classification, but we used it as a regression model by casting the problem as an equivalent regression problem. Its trees (weak learners) are each trained on a small part of the dataset and are grown very deep, which helps them learn highly irregular patterns. Combining the predictions of the individual trees mitigates the overfitting problem by reducing variance while maintaining consistency.
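The averaging step can be seen in a minimal sketch using scikit-learn's RandomForestRegressor. The synthetic data and the hyperparameters are assumptions chosen for illustration and are not the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, non-linear data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Collect the prediction of every individual tree for the first five rows.
tree_preds = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])

# The forest prediction is the mean of the individual tree predictions.
forest_pred = forest.predict(X[:5])
print(np.allclose(tree_preds.mean(axis=0), forest_pred))
```

Averaging many deep trees, each noisy on its own, is what lowers the variance of the combined prediction.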
B. Linear Regression
Regression is a supervised learning approach in which continuous variables can be modelled and predicted. The datasets are labeled, and the value of the output variable is determined by the values of the input variables. The simplest form of regression is linear regression, which attempts to fit a straight line (or, with several inputs, a hyperplane) to the dataset and is appropriate when the relationship between the variables is linear. Linear regression has the advantage of being easy to understand, and regularization makes it easy to avoid overfitting. Linear models can also be updated with new data using stochastic gradient descent (SGD). Linear regression is a good fit if the relationship between the covariates and the response variable is known to be linear. It shifts the focus from statistical modeling to analyzing and preprocessing the data, and it is useful for thinking about the method of data analysis. However, for most practical applications it is not a suitable approach because it oversimplifies real-world problems.
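As a small illustration of fitting a hyperplane, the sketch below uses scikit-learn's LinearRegression on noise-free synthetic data. The coefficients 3, -2 and the intercept 5 are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data generated from y = 3*x1 - 2*x2 + 5 (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = LinearRegression()
model.fit(X, y)

# With an exactly linear relationship, the fit recovers the generating coefficients.
print(model.coef_, model.intercept_)
```

When the true relationship is not linear, the same fit still returns a hyperplane, which is the oversimplification noted above.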
C. Ridge Logistic Regression
In ridge logistic regression (Hoerl and Kennard, 1970; Schaefer et al., 1984; Cessie and Houwelingen, 1992), the coefficients are estimated by maximizing the likelihood function with a penalty applied to all coefficients except the intercept. Ordinary logistic regression with a binary response models the probability of success of the response.
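For reference, a standard way to write the penalized objective is given below. The notation is an assumption introduced here, since the paper does not define symbols: y_i is the binary response, x_ij the j-th feature of observation i, beta_0 the unpenalized intercept, and lambda >= 0 the ridge penalty parameter.

```latex
\ell^{\lambda}(\beta)
  = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]
    - \lambda \sum_{j=1}^{p} \beta_j^{2},
\qquad
p_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij})\bigr)}
```

Setting lambda = 0 recovers the ordinary logistic regression log-likelihood, and larger values shrink the coefficients toward zero.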
III. DATA SET DESCRIPTION
The car data set used in this research was collected from a website. It consists of 50,002 car observations with 19 attributes of pre-owned cars from an e-commerce site, as shown in Tables I and II. The data set contains a significant amount of pre-owned car information, some of which requires tweaking and feature engineering. For example, duplicated observations can affect model performance and must be removed in advance. The study used the Python programming language for this step.
Descriptive statistics of the categorical variables are shown in Table I. Attributes such as dateCrawled, lastSeen, postal-code, and dateCreated have no effect on price prediction, so they can be removed to improve model performance. Attributes such as seller, offerType, abtest, and nrOfPicture were also removed during data preparation because closer inspection of the dataset showed that their values are highly unbalanced. Finally, name was removed as well because it contains too many unique values.
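The duplicate removal and column dropping described above might look like the following pandas sketch. The toy rows are hypothetical, and the column spellings are copied from the text, so they may differ from the headers in the actual file.

```python
import pandas as pd

# Hypothetical toy rows; only a few of the 19 attributes are shown.
cars = pd.DataFrame({
    "name": ["Golf_1.6", "Golf_1.6", "A4_Avant"],
    "seller": ["private", "private", "private"],
    "dateCrawled": ["2016-03-24", "2016-03-24", "2016-03-25"],
    "price": [4500, 4500, 9800],
    "powerPS": [102, 102, 150],
})

# Remove exact duplicate observations first.
cars = cars.drop_duplicates()

# Attributes named in the text as irrelevant, unbalanced, or too high in cardinality.
to_drop = ["dateCrawled", "lastSeen", "postal-code", "dateCreated",
           "seller", "offerType", "abtest", "nrOfPicture", "name"]

# errors="ignore" skips any listed column that is absent from the frame.
cars = cars.drop(columns=to_drop, errors="ignore")
print(cars)
```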
B. Comparative analysis on price prediction
This work uses several machine learning algorithms available in the Scikit-learn library. Each model is trained on the same training data and tested on the same test data. The results are compared and described in the next section. Regression-based methods have been proven reliable for predicting a continuous variable in supervised machine learning.
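A minimal sketch of such a comparison, where both models share one train/test split and are scored by RMSE, is given below. The data are synthetic and the hyperparameters are assumptions, so the numbers it prints say nothing about the results reported in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared car features and prices (assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(0, 0.1, size=300)

# One split shared by every model, so the comparison is like for like.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error on the test set.
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

print(rmse)
```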
Table I: Descriptive Statistics of Categorical Variables
C. Factors Influencing
Here are some key factors that influence the design.
To solve the above problem statement, we used the machine learning algorithms linear regression and random forest, with the source code written in Python. We imported the pandas, numpy, and seaborn libraries and read the cars_sample data into the Jupyter environment. The data set contains about 50,000 rows and 19 columns (variables), which can be grouped into different buckets based on the information they carry. As part of preprocessing, we removed unwanted columns, duplicate records, and missing values, and cleaned the data. We identified the variables influencing price and examined the relationships among variables using correlation, box plots, and scatter plots. We also identified outliers using box plots and histograms. We then filtered the data with logical checks on the variables price, year of registration, and power, which reduced the number of records. From sklearn we imported train_test_split, LinearRegression, RandomForestRegressor, and mean_squared_error. Together, these libraries are used for visualizing the data, building the models, and evaluating performance. With these models one can predict the selling price of various car models and compare car prices based on various features of the car to satisfy the buyer. The output models and metrics can be seen in the Jupyter console.
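The logical checks on price, year of registration, and power can be sketched as range filters. The bounds and the column names (price, yearOfRegistration, powerPS) below are hypothetical, because the paper does not report the thresholds it applied.

```python
import pandas as pd

# Hypothetical toy rows containing implausible values.
cars = pd.DataFrame({
    "price": [0, 4500, 9800, 9999999],
    "yearOfRegistration": [2005, 2010, 1000, 2012],
    "powerPS": [75, 102, 150, 0],
})

# Hypothetical plausibility bounds (inclusive); the paper does not state its own.
bounds = {
    "price": (100, 150000),
    "yearOfRegistration": (1950, 2018),
    "powerPS": (10, 500),
}

# Keep only the rows that pass every range check.
mask = pd.Series(True, index=cars.index)
for column, (low, high) in bounds.items():
    mask &= cars[column].between(low, high)

filtered = cars[mask]
print(filtered)
```

In practice the bounds would be chosen after inspecting the box plots and histograms mentioned above.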
Regression is a statistical approach to finding the relationship between variables. In machine learning, regression models are used to predict a continuous value. Here we used linear regression, a supervised learning method, and compared it with the random forest regressor algorithm.
Pricing issue
Predicting car prices is a regression analysis problem in which the price of the car is the dependent variable and the characteristics of the vehicle (brand, car model, year of registration, type of gearbox, type of fuel, ...) are the independent variables.
This project is helpful for all e-commerce companies that act as mediators for selling and buying pre-owned cars. The customer can more easily decide to buy a pre-owned car from among various car models with various features, and the seller can more easily convince the buyer by comparing and analyzing various models, so both seller and buyer are satisfied with the process. The model reduces time and cost and is user friendly, which improves business by selling more cars. We also conducted a comparative study of the performance of regression models based on supervised machine learning. Each model was trained using used car market data collected from an e-commerce website. Linear regression gives the best performance, with a root mean square error (RMSE) of 8902.410, followed by the ridge and random forest regression algorithms, respectively. This project can be extended by adding more attributes to the data set, such as resale history, Lic, accident history, and images, to obtain a clearer and more accurate analysis.
Copyright © 2022 G. Kalpana, Dr. A Kanaka Durga, Anoop Reddy, Dr. G Karuna. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.