This research presents a modular deep learning-based pipeline designed to integrate traffic sign detection, lane segmentation, and lane change prediction for autonomous driving. The system utilizes YOLOv8 for object detection (traffic signs and speed breakers) and YOLOv8-seg for lane segmentation. A custom logic module processes lane masks for accurate lane change prediction, while Google Text-to-Speech (gTTS) generates audio alerts. The pipeline supports real-time performance with GPU acceleration and processes videos offline with visual and verbal feedback. Results suggest high precision in detection and practical applicability for advanced driver assistance systems (ADAS).
Introduction
The study presents an Autonomous Driving Assist System that integrates state-of-the-art deep learning models with rule-based logic and audio feedback to enhance real-time perception of road environments. Leveraging YOLOv8 for object detection (traffic signs and speed breakers) and YOLOv8-seg for pixel-level lane segmentation, the system accurately detects critical road elements. A custom lane change prediction algorithm analyzes lane mask movements across video frames to anticipate lane changes.
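The per-frame perception step described above can be sketched with the Ultralytics Python API. This is a minimal illustration, not the paper's implementation: the weight filenames, the confidence threshold, and the helper names are assumptions.

```python
def filter_detections(boxes, scores, conf_thresh=0.5):
    """Keep only boxes whose confidence meets the threshold (pure helper)."""
    return [box for box, score in zip(boxes, scores) if score >= conf_thresh]

def perceive(frame, det_weights="yolov8m.pt", seg_weights="yolov8m-seg.pt"):
    """Run detection and lane segmentation on one frame (a BGR numpy array).

    Returns (boxes, masks); either may be None if nothing is found.
    In a real pipeline the two models would be loaded once, not per frame.
    """
    from ultralytics import YOLO  # deferred import: heavy optional dependency
    det = YOLO(det_weights)(frame, verbose=False)[0]
    seg = YOLO(seg_weights)(frame, verbose=False)[0]
    boxes = det.boxes.xyxy.cpu().numpy() if det.boxes is not None else None
    masks = seg.masks.data.cpu().numpy() if seg.masks is not None else None
    return boxes, masks
```

The detector and segmenter weights here would in practice be fine-tuned checkpoints for traffic signs/speed breakers and lanes respectively, rather than the stock COCO weights named above.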
The system incorporates Google Text-to-Speech (gTTS) to provide synchronized verbal alerts, improving safety and accessibility. It supports offline video processing optimized for GPU acceleration, enabling real-time or near-real-time performance on embedded hardware.
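The alert-audio step might look like the following sketch, which uses gTTS's standard `gTTS(...).save(...)` API. The event labels and phrase table are illustrative assumptions, not taken from the paper.

```python
# Illustrative event-to-phrase mapping (labels assumed, not from the paper).
EVENT_PHRASES = {
    "stop_sign": "Stop sign ahead",
    "speed_breaker": "Speed breaker ahead, slow down",
    "lane_change_left": "Lane change to the left detected",
    "lane_change_right": "Lane change to the right detected",
}

def compose_alert(event: str) -> str:
    """Map a detected event label to a spoken phrase, with a generic fallback."""
    return EVENT_PHRASES.get(event, "Attention: " + event.replace("_", " "))

def save_alert_audio(event: str, path: str) -> None:
    """Synthesize the phrase to an MP3 via gTTS (requires network access)."""
    from gtts import gTTS  # deferred import: gTTS calls Google's TTS endpoint
    gTTS(text=compose_alert(event), lang="en").save(path)
```

The resulting MP3 clips can then be overlaid on the annotated video at the matching timestamps (the paper mentions MoviePy for this composition step).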
Key components include:
Object Detection: YOLOv8m detects multiple traffic signs and speed breakers with high accuracy.
Lane Segmentation: YOLOv8-seg provides detailed lane and vehicle segmentation, supporting lane change prediction.
Lane Change Prediction: A lightweight, rule-based temporal algorithm infers lane changes by tracking lane centroid shifts.
Audio Feedback: Text-to-speech generates synchronized verbal alerts for detected events.
Pipeline Architecture: Modular, sequential video processing stages generate annotated videos with audio alerts.
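The rule-based lane change logic listed above might be sketched as follows: track the horizontal centroid of the ego-lane mask across frames and flag a lane change when the drift over a short window exceeds a threshold. The window length, pixel threshold, and left/right sign convention below are illustrative assumptions, not values from the paper.

```python
import numpy as np

class LaneChangePredictor:
    """Rule-based temporal logic over lane-mask centroid shifts (sketch)."""

    def __init__(self, window: int = 10, shift_thresh: float = 40.0):
        self.window = window              # number of frames to look back
        self.shift_thresh = shift_thresh  # pixels of horizontal drift
        self.history = []                 # recent centroid x-coordinates

    def update(self, lane_mask: np.ndarray) -> str:
        """lane_mask: binary HxW array for the ego lane.

        Returns 'left', 'right', or 'keep' for the current frame.
        """
        ys, xs = np.nonzero(lane_mask)
        if xs.size == 0:
            return "keep"  # lane not visible this frame; no update
        self.history.append(float(xs.mean()))
        self.history = self.history[-self.window:]
        if len(self.history) < self.window:
            return "keep"  # not enough temporal context yet
        drift = self.history[-1] - self.history[0]
        # Convention assumed here: the lane mask drifting right in the image
        # means the ego vehicle is moving left relative to the lane.
        if drift > self.shift_thresh:
            return "left"
        if drift < -self.shift_thresh:
            return "right"
        return "keep"
```

Because the logic is a pure function of recent centroids, it adds negligible latency on top of the segmentation model and is easy to tune or replace independently of the rest of the pipeline.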
The system achieves high accuracy metrics (mAP > 97% for detection, IoU ~96% for segmentation) and low latency (~35–40 ms/frame), suitable for embedded automotive applications. It fills a research gap by combining unified multimodal perception with audio narration, enhancing driver assistance, autonomous vehicle safety, and accessibility.
Conclusion
This work presents a modular and efficient autonomous driving assistant system that integrates YOLOv8-based object detection and segmentation models with voice alert mechanisms using gTTS and MoviePy. The system effectively handles real-time lane detection, lane change prediction, traffic sign recognition, and speed breaker identification, all within an offline, command-line environment. The modular design ensures that each component functions independently yet collaboratively, offering flexibility for upgrades and optimization. The results demonstrate that deep learning and computer vision techniques can be applied in practical, cost-effective ways to enhance driver awareness and road safety. High detection accuracy and real-time performance, combined with intuitive audio alerts, make the system a reliable foundation for intelligent transportation applications.
For future work, we aim to extend support to live video streams and deploy the system on edge devices such as NVIDIA Jetson and Raspberry Pi. Additional improvements include adopting deep reinforcement learning for adaptive lane change logic, expanding traffic sign categories, and enhancing detection under low-light and adverse weather conditions.