This paper presents an analysis of Uber trip data in New York City that applies data science techniques to improve ride-hailing efficiency. The project uses historical trip records from the NYC High Volume For-Hire Vehicle (HVFHV) dataset, applying K-Means clustering to group zones with similar demand and Random Forest regression to forecast demand. Power BI is used for real-time visualization and insight. The aim is to optimize driver allocation, reduce passenger wait times, and enhance urban mobility.
Introduction
This study addresses urban mobility challenges by analyzing Uber trip data to improve driver allocation and meet ride-hailing demand efficiently. NYC Taxi and Limousine Commission (TLC) trip records were combined with taxi zone and weather data, then cleaned and preprocessed for analysis.
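The cleaning and joining step can be sketched in standard-library Python as below. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the field names loosely follow the public TLC HVFHV schema (`pickup_datetime`, `PULocationID`, `trip_miles`) but are assumptions, the zone IDs, zone names, and weather values are hypothetical, and the cleaning rules (require a timestamp, a known zone, and a positive distance) are illustrative choices.

```python
from collections import Counter
from datetime import datetime

# Illustrative records only. Field names loosely follow the public TLC HVFHV
# schema (pickup_datetime, PULocationID, trip_miles); treat them as assumptions.
TRIPS = [
    {"pickup_datetime": "2023-01-02 08:15:00", "PULocationID": "161", "trip_miles": "2.4"},
    {"pickup_datetime": "2023-01-02 08:40:00", "PULocationID": "161", "trip_miles": "1.1"},
    {"pickup_datetime": "2023-01-02 09:05:00", "PULocationID": "230", "trip_miles": "0.0"},  # zero distance
    {"pickup_datetime": "", "PULocationID": "230", "trip_miles": "3.0"},                     # missing timestamp
    {"pickup_datetime": "2023-01-02 09:30:00", "PULocationID": "999", "trip_miles": "5.2"},  # ID not in lookup
]

# Hypothetical zone lookup (LocationID -> zone name); IDs and names are illustrative.
ZONES = {"161": "Midtown Center", "230": "Times Sq/Theatre District"}

# Hypothetical hourly weather, keyed by the start of the hour.
WEATHER = {datetime(2023, 1, 2, 8): {"temp_c": 3.0, "precip_mm": 0.0}}


def clean_and_join(trips, zones, weather):
    """Drop invalid trips, then attach the zone name and that hour's weather."""
    rows = []
    for t in trips:
        # Assumed cleaning rules: require a timestamp and a zone present in the lookup.
        if not t["pickup_datetime"] or t["PULocationID"] not in zones:
            continue
        miles = float(t["trip_miles"])
        if miles <= 0:  # assumed rule: discard zero or negative distances
            continue
        ts = datetime.strptime(t["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        hour = ts.replace(minute=0, second=0, microsecond=0)
        # Missing weather hours are kept, with None placeholders.
        wx = weather.get(hour, {"temp_c": None, "precip_mm": None})
        rows.append({"hour": hour, "zone": zones[t["PULocationID"]], "trip_miles": miles, **wx})
    return rows


def hourly_demand(rows):
    """Count pickups per (zone, hour): a typical target for demand models."""
    return Counter((r["zone"], r["hour"]) for r in rows)


rows = clean_and_join(TRIPS, ZONES, WEATHER)
demand = hourly_demand(rows)
```

Here only the two Midtown Center trips survive cleaning, and both fall into the same hourly bucket. With the full dataset, the same logic would typically be expressed as dataframe filters and merges, but the rules are the same.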
Exploratory data analysis (EDA) with visualization tools revealed patterns such as peak travel times and zone-level variation in demand. K-Means clustering grouped locations with similar demand profiles, identifying high-demand pickup zones. A Random Forest regression model predicted trip volumes and wait times with an R² score of about 0.84.
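The two modelling ideas in the paragraph above, K-Means over per-zone demand profiles and the R² metric, can be sketched in standard-library Python. This is a hedged illustration, not the study's code. The four "day-part" profiles and zone names are hypothetical. The Random Forest itself is not reimplemented here (in practice one would likely use scikit-learn's `RandomForestRegressor`). `r2_score` simply shows how the reported 0.84 is defined: it is the share of variance explained, not a classification accuracy.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means on equal-length numeric vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical per-zone profiles: mean pickups in four day-parts
# (night, morning, afternoon, evening). Values are illustrative only.
PROFILES = {
    "zone_a": [5, 40, 35, 60],
    "zone_b": [6, 42, 33, 58],
    "zone_c": [1, 4, 3, 5],
    "zone_d": [2, 5, 4, 6],
}
names = list(PROFILES)
labels, centroids = kmeans([PROFILES[n] for n in names], k=2)
clusters = dict(zip(names, labels))
```

With these toy profiles, the two busy zones end up in one cluster and the two quiet zones in the other. Cluster label numbers are arbitrary, so only the grouping is meaningful. An R² of 1.0 means perfect predictions, and 0.0 means no better than always predicting the mean.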
Interactive Power BI dashboards visualized daily and seasonal demand trends and showed consistent peak demand in areas such as Midtown and Downtown on weekdays and in the evenings. The study suggests using these insights for real-time vehicle distribution and dynamic pricing.
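A "peak hour by weekday vs. weekend" view like the one described above rests on a simple aggregation, sketched below in standard-library Python. This is an assumption about how such a dashboard measure could be computed, not a reproduction of the study's Power BI model. The pickup records are hypothetical. `peak_hours` is a helper name introduced only for this illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def peak_hours(pickups):
    """Map (zone, 'weekday'|'weekend') -> (busiest hour of day, pickup count)."""
    counts = defaultdict(Counter)
    for zone, ts in pickups:
        day_type = "weekday" if ts.weekday() < 5 else "weekend"  # Mon=0 ... Sun=6
        counts[(zone, day_type)][ts.hour] += 1
    # most_common(1) returns [(hour, count)] for the most frequent hour;
    # ties go to the hour seen first.
    return {key: c.most_common(1)[0] for key, c in counts.items()}


# Illustrative pickups only (2 Jan 2023 was a Monday, 7 Jan 2023 a Saturday).
PICKUPS = [
    ("Midtown", datetime(2023, 1, 2, 18, 5)),
    ("Midtown", datetime(2023, 1, 2, 18, 40)),
    ("Midtown", datetime(2023, 1, 3, 8, 10)),
    ("Midtown", datetime(2023, 1, 7, 23, 15)),
]
peaks = peak_hours(PICKUPS)
```

For this toy input, Midtown's weekday peak is the 18:00 hour with two pickups. Splitting by day type before taking the maximum keeps weekday commuting peaks from being masked by weekend nightlife patterns.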
Conclusion
The project applied data analytics and machine learning to Uber trip data to support proactive fleet management. By combining demand visualization with prediction, it supports faster, data-driven decisions for urban mobility.