The rapid growth of the second-hand car market has created a strong need for accurate, automated, and transparent vehicle pricing systems. Platforms like CarDekho, Cars24, and Spinny have increased demand for reliable valuations, but estimating resale prices remains complex due to factors such as brand, model, age, fuel type, mileage, and market demand. Traditional methods based on expert judgment are often subjective and inconsistent.
This study uses machine learning to address these challenges by developing models such as Linear Regression, Decision Tree, Random Forest, and XGBoost to predict used car prices. A user-friendly web application built with Streamlit allows users to input car details and receive instant price estimates.
The methodology covers data collection (from sources such as Kaggle and CarDekho), preprocessing, feature engineering, model training, and evaluation using mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). On a dataset of 6,019 records, the ensemble methods perform best, with XGBoost achieving an R² of 0.94. The deployed application demonstrates the scalability and practical usefulness of the approach.
Future work can include computer vision for image analysis, NLP for text data, and Explainable AI for better model transparency.
Introduction
This study presents a machine learning system for predicting used car prices in India, using datasets from Kaggle and CarDekho. The second-hand car market has grown rapidly, while traditional valuation methods remain subjective and inconsistent. To address this, the study builds a data-driven pipeline of machine learning models that produces more accurate and consistent price predictions.
The research follows a structured methodology involving data collection, preprocessing (handling missing values, outliers, encoding, and scaling), feature engineering (car age, brand popularity, normalized mileage), and training multiple regression models including Linear Regression, Decision Tree, Random Forest, and XGBoost. Model performance is evaluated using standard metrics like MAE, RMSE, and R², with XGBoost identified as the best-performing model.
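The three engineered features named above can be illustrated with a minimal sketch. The study does not give their exact definitions, so the formulas here are assumptions: car age is taken as a reference year minus the model year, brand popularity as the brand's share of listings, and normalized mileage as a min-max scaling. The field names (brand, year, mileage) and the sample values are hypothetical.

```python
from collections import Counter


def engineer_features(records, reference_year):
    """Return copies of the records with three derived features added.

    Assumed definitions (not taken from the study):
      car_age            = reference_year - model year
      brand_popularity   = share of records carrying the same brand
      mileage_normalized = min-max scaled mileage in [0, 1]
    """
    brand_counts = Counter(r["brand"] for r in records)
    total = len(records)
    mileages = [r["mileage"] for r in records]
    low, high = min(mileages), max(mileages)
    span = (high - low) or 1.0  # guard against division by zero

    enriched = []
    for r in records:
        row = dict(r)  # copy so the input records stay unchanged
        row["car_age"] = reference_year - r["year"]
        row["brand_popularity"] = brand_counts[r["brand"]] / total
        row["mileage_normalized"] = (r["mileage"] - low) / span
        enriched.append(row)
    return enriched


# Hypothetical sample records, invented for illustration only.
sample = [
    {"brand": "Maruti", "year": 2015, "mileage": 12.0},
    {"brand": "Maruti", "year": 2018, "mileage": 16.0},
    {"brand": "BMW", "year": 2012, "mileage": 20.0},
    {"brand": "Hyundai", "year": 2020, "mileage": 14.0},
]
features = engineer_features(sample, reference_year=2024)
```

In a real pipeline the brand counts and the mileage range would be computed on the training split only and then reused at inference time, so that the test data does not leak into the features.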
The system is deployed as a web application using Streamlit, where users can input car details and receive real-time price predictions. The architecture ensures consistent preprocessing between training and inference through serialization of models and encoders.
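The idea of sharing preprocessing between training and inference can be sketched as follows. This is a dependency-free illustration, not the study's implementation: it stores a category encoder and a stand-in linear model as JSON, whereas a fitted XGBoost model and its encoders would more typically be saved with a tool such as pickle or joblib. The function names, the weights, and the input fields are all hypothetical.

```python
import json


def fit_category_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def preprocess(raw, artifacts):
    """Convert one raw record into a numeric feature vector.

    Both training and inference call this same function with the same
    artifacts, so the encoding cannot drift between the two stages.
    """
    fuel_code = artifacts["fuel_encoder"][raw["fuel_type"]]
    car_age = artifacts["reference_year"] - raw["year"]
    return [fuel_code, car_age]


def predict(raw, artifacts):
    """Score a record with the stored stand-in linear model."""
    features = preprocess(raw, artifacts)
    weighted = sum(w * x for w, x in zip(artifacts["weights"], features))
    return artifacts["intercept"] + weighted


# Training side: fit the encoder and bundle it with the model parameters.
artifacts = {
    "fuel_encoder": fit_category_encoder(["Petrol", "Diesel", "CNG", "Diesel"]),
    "reference_year": 2024,
    "weights": [0.5, -0.25],  # placeholder values, not fitted to any data
    "intercept": 6.0,
}
serialized = json.dumps(artifacts)

# Inference side: restore the bundle and score a new input.
restored = json.loads(serialized)
car = {"fuel_type": "Petrol", "year": 2020}
estimate = predict(car, restored)
```

Bundling the encoder with the model in a single artifact means the web application never has to re-derive category codes, which is one way to avoid a mismatch between training-time and serving-time features.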
Exploratory data analysis shows strong patterns such as car age and mileage negatively affecting price, while brand reputation and transmission type significantly influence value. Premium brands tend to have higher resale prices, confirming their importance in prediction accuracy.
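One simple way to quantify a relationship such as the negative effect of car age on price is the Pearson correlation coefficient. The sketch below is an illustration only: the study does not state which statistic its exploratory analysis used, and the numbers are invented, not drawn from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Invented illustrative values: older cars listed at lower prices.
car_age = [1, 3, 5, 8, 10]
price = [9.5, 8.0, 6.0, 4.0, 2.5]
r = pearson(car_age, price)  # strongly negative for this toy data
```

A coefficient near -1 indicates that price falls almost linearly as age rises. Categorical factors such as brand or transmission type need a different comparison, for example group-wise median prices.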
Conclusion
This dissertation presents a comprehensive machine learning pipeline for predicting second-hand car prices in the Indian market. It covers data collection, preprocessing, and feature engineering, with the engineered features improving model performance. Of the four supervised models evaluated, XGBoost performed best, achieving an R² of 0.94, an MAE of 0.16, and an RMSE of 0.21.
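For reference, the three reported metrics can be computed as in the following sketch, which uses the standard definitions of MAE, RMSE, and R². The target and prediction values are invented for illustration. The study does not state the unit or scale of its price target, so the reported MAE and RMSE should be read in whatever unit the target uses.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Invented illustrative values, not results from the study.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(actual, predicted)
```

RMSE penalizes large errors more heavily than MAE, so a gap between the two (as in the reported 0.21 versus 0.16) suggests that some predictions miss by noticeably more than the typical error.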
The best model was deployed as a Streamlit web application, demonstrating practical usability through a user-friendly interface with input filters and clear outputs, making it suitable for real-world use by consumers, dealers, and financial institutions. The study also provides a reproducible methodology, well-documented code, and strong benchmark results.
Feature importance analysis identified car age, brand popularity, model year, and kilometers driven as key price determinants. The engineered brand popularity index emerged as a strong predictor, emphasizing the value of domain-specific feature engineering.
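The study does not say how feature importance was computed. One model-agnostic option is permutation importance, sketched below as an assumption rather than as the method actually used: each feature column is shuffled in turn, and the resulting increase in error is taken as that feature's importance. The toy model, the seats feature, and all values are hypothetical.

```python
import random


def mean_absolute_error(model, rows, targets):
    """Average absolute difference between model output and target."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=5, seed=0):
    """Mean increase in MAE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with only this one column permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mean_absolute_error(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in model: price depends on car age only."""
    car_age, _seats = row
    return 10.0 - 0.8 * car_age


# Each row is [car_age, seats]; the values are invented for illustration.
rows = [[1, 5], [2, 7], [4, 5], [6, 4], [8, 5], [10, 7]]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

Because the toy model ignores the second feature, shuffling it leaves the error unchanged and its importance is zero, while shuffling car age degrades the predictions. Correlated features, such as car age and model year, can split or mask each other's importance under this method.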