Real-Time Vehicle Detection, Tracking, and Speed Estimation Using YOLO v11 and Enhanced SORT with Automatic Camera Calibration

Authors: Vijay M, A. Naga Sai Teja, B. Venkata Sandeep Reddy, Ch. Manikanta Raghava, M.D.S. Subrahmanyam

DOI Link: https://doi.org/10.22214/ijraset.2025.75754

Abstract

The precise vehicle detection, following, and speed estimation are the essential elements of intelligent transportation system and traffic analytics. Conventional systems tend to be based on manual calibration or multi-sensors, which restrict scalability and flexibility. The paper introduces a vision-based, fully automated real-time vehicle monitoring system that combines the YOLO v11 object detector, an Enhanced SORT tracker to provide very robust multi-object tracking and an automatic camera calibration system to self-estimate the pixel-to-meter conversion ratio. The suggested system does not require manual calibration and takes advantage of such dimensions as typical vehicles, confidence-weighted fusion, and statistical filtering to stabilize scale estimation. Enhanced SORT tracker uses a Kalman filter-based motion model, and IOU + LAP association to get the same object ID and extract the velocity. Experimental results of testing on several traffic video streams show a mean processing rate of 25-30 frames/s, error of speed estimation of +-2 km/h and convergent calibration within 45 seconds. Structured CSV and JSON analytics are also produced by the system to analysis additional traffic flow and anomalies. The findings are that the suggested solution is a cost-efficient, scalable, and precise solution to real-time traffic monitoring and vehicle behavior analytics.

Introduction

Intelligent Transportation Systems (ITS) increasingly depend on automated, accurate, and real-time computer vision systems to analyze traffic, estimate vehicle speeds, and support congestion management. Traditional speed-measuring methods—like radar sensors, inductive loops, or manually calibrated cameras—require complex setup and are sensitive to camera placement. This creates a growing need for self-calibrating, vision-based systems with minimal hardware and human intervention.

Recent advances in deep learning, especially YOLO-based object detection models (e.g., Ultralytics YOLO v11), have greatly improved real-time vehicle detection accuracy by using efficient feature extractors and anchor-free prediction heads. For tracking, algorithms like SORT, Deep SORT, and ByteTrack ensure stable ID tracking using Kalman filters and assignment algorithms, but they typically operate in pixel space and lack real-world metric speed measurement.

To convert pixel motion to real-world distance, camera calibration is essential. Existing methods—using vanishing points, homography, or 3D vehicle alignment—often require manual measurements or fixed assumptions, limiting scalability.

Proposed System

The research introduces an end-to-end unified framework combining:

YOLO v11 for high-speed vehicle detection
Enhanced SORT Tracker for velocity-conscious multi-object tracking
Fully automatic camera calibration using canonical vehicle dimensions and statistical filtering

The system stabilizes pixel-to-meter estimation using confidence-weighted averaging and IQR filtering, achieving 25–30 FPS and speed estimation accuracy of ±2 km/h. It supports live visualization and structured outputs (CSV, JSON).

Related Work Summary

ITS research largely focuses on:

Vehicle detection: CNN-based models, especially YOLO families, outperform older feature-based methods.
Multi-object tracking: SORT, Deep SORT, and ByteTrack maintain stable IDs but lack metric speed conversion.
Camera calibration: Methods rely on vanishing points, lane markings, homography, or 3D alignment, but most require manual input or scene-specific assumptions.

Motivation

Existing detection and tracking methods provide real-time results, but calibration remains the missing component for practical, scalable speed estimation. The proposed hybrid framework fills this gap by integrating automated camera calibration with modern detection and tracking.

Methodology

The system consists of three major components:

YOLO v11 detector for real-time vehicle recognition
Enhanced SORT tracker using 7-dimensional state vectors and Kalman velocity estimation
Auto-Calibration engine estimating pixels-per-meter using observed vehicle dimensions and robust filtering

The pipeline converts pixel displacement into real-world speeds using:

vkm/h=3.6×fps×vpx/frameppmv_{km/h}=3.6 \times \frac{fps \times v_{px/frame}}{ppm}vkm/h?=3.6×ppmfps×vpx/frame??

Outputs include annotated videos, speed data, and detailed analytics.

Implementation

The system is implemented in Python with YOLOv11, OpenCV, FilterPy, and runs on a machine with an Intel i5 processor, 8GB RAM, and GTX 1650 GPU. It processes 720p traffic videos at 27–30 FPS.

Modules include:

YOLO detection
Enhanced SORT tracking
Auto-calibration
Speed computation
CSV/JSON export and real-time visualization

Results

Testing across highways, city roads, and intersections shows:

Detection confidence: 0.84
Tracking stability: 96.7% ID consistency
Speed error: ±4.5 km/h
Auto-calibration convergence: ~42 seconds
Runtime: 28.6 FPS

Outputs include annotated MP4 videos, analytics CSV, and summary JSON files.

Discussion

The integrated system achieves accurate real-time vehicle speed analytics without manual calibration. Its adaptive calibration handles varying camera angles and lighting. The modular design allows future extensions such as:

Density estimation
Traffic rule violation detection
Anomaly prediction

Overall, the system offers a scalable, automated, and robust solution for intelligent traffic surveillance.

Conclusion

Enhanced Vehicle Recognition System was created and implemented in the research to real-time vehicle recognition, tracking, and speed estimation using the assistance of YOLOv11, a recently developed advanced SORT tracking algorithm, and a self-calibrating engine. Without the need to sacrifice on the performance of practically real time which was 28-30 FPS the system was able to attain high detection rates and continuous tracking. It was discovered that auto-calibration module was useful in removing manual scaling since it learnt dynamically the pixel-to-meter ratio using common vehicle size. The proposed architecture yielded abridged analytics like CSV and JSON summaries with vehicle speed, class and calibration statistic reports. The computer vision and deep learning are brought together to make the system a potent and fully automated system to keep track of the traffic and deliver road analytics. It is also scalable and modular meaning that it can be implemented in any of the following environments; highway, intersection, and smart city surveillance.

References

[1] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple online and real-time tracking,” Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 3464–3468, 2016. [2] N. Wojke, A. Bewley, and D. Paulus, “Simple online and real-time tracking with a deep association metric,” Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 3645–3649, 2017. [3] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang, “ByteTrack: Multi-object tracking by associating every detection box,” Proc. European Conf. Comput. Vis. (ECCV), 2022. [4] Ultralytics, “YOLO11 Model Documentation,” Ultralytics, 2024. [5] Ultralytics, “Ultralytics YOLO – Docs Home,” Ultralytics, 2024. [6] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” Proc. European Conf. Comput. Vis. (ECCV), pp. 740–755, 2014. [7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010. [8] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Trans. ASME–J. Basic Eng., vol. 82, pp. 35–45, 1960. [9] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Res. Logistics Quarterly, vol. 2, no. 1–2, pp. 83–97, 1955. [10] SciPy Developers, “linear_sum_assignment — SciPy v1.13.0 Manual,” 2024. [11] M. Dubská, A. Herout, and J. Sochor, “Automatic camera calibration for traffic understanding,” Proc. British Mach. Vis. Conf. (BMVC), 2014. [12] J. Sochor, R. Juránek, and A. Herout, “Comprehensive data set for automatic single camera visual speed measurement,” IEEE Trans. Intel. Transp. Syst., vol. 20, no. 5, pp. 1633–1643, May 2019. [13] J. Sochor, R. Juránek, A. Herout, and J. Havel, “Traffic surveillance camera calibration by 3D model bounding box alignment for accurate vehicle speed measurement,” Comput. Vis. Image Understand., vol. 161, pp. 87–98, 2017. [14] P. Revaud, S. Milani, G. Rizzoli, and R. Marani, “Robust automatic monocular vehicle speed estimation for traffic surveillance,” Proc. IEEE Int. Conf. Comput. Vis. (ICCV), pp. 12356–12366, 2021. [15] Y. Luo, W. Wang, and Z. Zhao, “A review of homography estimation: Advances and applications,” Electronics, vol. 12, no. 7, p. 1603, Apr. 2023.

Copyright

Copyright © 2025 Vijay M, A. Naga Sai Teja, B. Venkata Sandeep Reddy, Ch. Manikanta Raghava, M.D.S. Subrahmanyam. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET75754

Publish Date : 2025-11-24

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here