Smart Voice and Gesture Controlled ML-Driven Wheelchair Assistance for Differently Abled Individuals

Authors: Prasad Jadhav, Tapan Chauhan, Varsha Devakathe, Saniya Jadhav

DOI Link: https://doi.org/10.22214/ijraset.2026.80646

Abstract

Mobility impairment poses one of the most significant challenges for differently abled individuals, directly impacting their independence, confidence, and quality of life. Traditional wheelchairs, whether manual or joystick-operated, often fail to meet the diverse needs of users with severe physical limitations such as paralysis, neuromuscular disorders, or limb amputations. This paper presents a Smart Voice and Gesture Controlled ML-Driven Wheelchair Assistance system that lets users navigate hands-free through voice commands and hand gestures, using Machine Learning (ML) for accurate, real-time motion responses. The hardware framework integrates a Raspberry Pi 4 (main controller), an ESP32 microcontroller, an MPU6050 accelerometer/gyroscope for gesture detection, an HC-SR04 ultrasonic sensor for obstacle detection, and an L293D motor driver. On the software side, RNN/LSTM-based models process speech features (MFCCs) while CNN/MediaPipe models recognize hand gestures. A Command Fusion Engine arbitrates between modalities, enforces a safety layer, and executes motor commands. The system is projected to achieve ≥85% command recognition accuracy with sub-200 ms response latency, offering a cost-effective, offline-capable, and inclusive assistive mobility solution.

Introduction

This project presents an AI- and Machine Learning-based smart wheelchair designed to improve mobility for individuals with severe physical disabilities who cannot effectively use conventional joystick-controlled wheelchairs. The system enables hands-free control through voice commands and hand gestures, making mobility more accessible and user-friendly.

The wheelchair integrates speech recognition, gesture recognition, machine learning, IoT, and obstacle detection into a single embedded platform. Voice commands are processed using offline speech recognition with multilingual support, while gestures are recognized using an MPU6050 sensor or a camera with MediaPipe. A Command Fusion Engine prioritizes inputs and resolves conflicts, while HC-SR04 ultrasonic sensors automatically stop the wheelchair when obstacles are detected within 30 cm, ensuring user safety.
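As a rough illustration of the 30 cm obstacle check, the sketch below converts an HC-SR04 echo round-trip time into a distance and compares it with the stop threshold. The speed-of-sound constant (about 343 m/s at room temperature) and all function names are assumptions made for illustration; they are not taken from the project's firmware.

```python
# Minimal sketch of the ultrasonic stop check (names are hypothetical).
SPEED_OF_SOUND_CM_PER_S = 34300.0  # assumption: ~343 m/s at about 20 °C
STOP_THRESHOLD_CM = 30.0           # stop distance stated in the text


def echo_to_distance_cm(echo_seconds: float) -> float:
    """Convert the echo round-trip time to a one-way distance in cm."""
    # The pulse travels to the obstacle and back, so halve the path length.
    return echo_seconds * SPEED_OF_SOUND_CM_PER_S / 2.0


def forward_motion_allowed(echo_seconds: float) -> bool:
    """Return False when an obstacle is within the stop threshold."""
    return echo_to_distance_cm(echo_seconds) > STOP_THRESHOLD_CM


if __name__ == "__main__":
    for echo in (0.001, 0.002):
        print(echo, echo_to_distance_cm(echo), forward_motion_allowed(echo))
```

With these assumed constants, a 1 ms echo corresponds to roughly 17 cm (forward motion blocked) and a 2 ms echo to roughly 34 cm (forward motion allowed).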

The system follows a structured methodology involving data collection, preprocessing, model training, real-time deployment on a Raspberry Pi and ESP32, and continuous improvement through user feedback. It is expected to achieve over 85% voice recognition accuracy, over 87% gesture recognition accuracy, less than 200 ms response time, and reliable offline operation.

Compared to existing smart wheelchairs, the proposed system offers multimodal control, offline machine learning, obstacle avoidance, IoT-based caregiver monitoring, and lower cost, making it more suitable for users in developing countries.

Future enhancements include LiDAR and GPS-based outdoor navigation, advanced AI models, computer vision with facial gesture recognition, mobile applications for caregiver monitoring, and clinical trials to improve real-world usability and move toward commercial medical device deployment.

Conclusion

Mobility is a fundamental aspect of independent living. For differently abled individuals, particularly those with paralysis, muscular dystrophy, or neuromuscular disorders, daily movement becomes severely constrained, affecting psychological well-being and social participation. Conventional wheelchairs, whether manual or joystick-controlled, demand a level of dexterity that many such users simply cannot achieve. The emergence of Machine Learning (ML), Internet of Things (IoT), and Human-Computer Interaction (HCI) technologies has opened transformative possibilities for assistive devices that can interpret natural human inputs in real time.

This paper presents a Smart Voice and Gesture Controlled ML-Driven Wheelchair Assistance system, a hands-free mobility solution that integrates speech recognition and gesture detection within a single embedded platform. An ML-based Command Fusion Engine arbitrates between input modalities, ensuring robustness even when one channel (e.g., voice in noisy environments) underperforms. Ultrasonic sensors provide an automatic safety layer, overriding motion commands when obstacles are detected within 30 cm.

A. Problem Statement

Existing control interfaces such as joysticks and push buttons are non-adaptive and demand fine motor control unavailable to many users with severe disability. Commercial smart wheelchairs, while innovative, are prohibitively expensive for most patients in developing economies. There is a clear, unmet need for a cost-effective, multimodal, and intelligent wheelchair system capable of interpreting voice and gesture inputs in real time, with offline operation and built-in collision safety.

B. Scope

The system is designed primarily for indoor environments. Core scope includes: hybrid voice+gesture control, ML-based intent recognition, obstacle detection, offline processing, and IoT-enabled caregiver monitoring. Future extensions include GPS navigation, LiDAR-based outdoor traversal, and voice-assistant integration.

II. LITERATURE REVIEW

Multidisciplinary research in robotics, edge-ML, and sensor fusion has rapidly advanced assistive wheelchair technology. Systematic reviews and empirical prototypes converge on several key findings reviewed below.

A. Multimodal Control Systems [1]

Kim et al. (2023) surveyed smart wheelchair modalities including joystick augmentation, voice control, gesture recognition, and brain-computer interfaces (BCI). The study emphasises user-centered design and documents performance trade-offs between onboard and cloud computation, strongly recommending hybrid modality systems for resilience.

B. IoT-Enabled Gesture Control [2]

Sadi et al. (2022) demonstrated a wearable finger-gesture pipeline with IoT-based fall detection and remote alerts on Raspberry Pi. Their lightweight footprint and networked safety alerts form a direct precedent for the safety and monitoring features of the proposed system.

C. Accelerometer-Based Gesture [3]

Multiple studies using wearable IMUs (MPU6050) classify directional gestures (forward, left, right, stop) via SVM or k-NN on time-series features with latency under 150 ms. Per-user calibration is identified as a key challenge for real-world deployment, which this work addresses through an incremental learning feedback loop.

D. Voice-Activated Control [4]

Lightweight speech models on Raspberry Pi demonstrate intuitive control but are sensitive to noise and accent variation. Prototypes combining offline speech models with obstacle detection report significant safety improvements. The proposed system adopts offline SpeechRecognition with MFCC preprocessing to mitigate noise sensitivity.

E. Obstacle Avoidance [5]

Ertürk et al. and Haddad et al. validate reactive navigation with ultrasonic and RGB-D sensors for indoor wheelchair use. Dynamic window algorithms achieve real-time collision avoidance on constrained hardware, establishing the baseline for the HC-SR04 integration in this system.

F. Summary

• Multimodal control (voice + gesture) is recommended to mitigate single-modality failures.
• Wearable IMUs (MPU6050) offer low-cost, low-latency gesture recognition.
• Edge deployment (Raspberry Pi, ESP32) with lightweight ML models is viable for real-time control.
• Obstacle detection is an essential concurrent safety layer, not an optional add-on.

III. OBJECTIVES

1) Design a hybrid voice-and-gesture control mechanism for flexible, user-friendly wheelchair operation.
2) Implement an AI-based speech recognition module supporting multi-accent and multilingual commands.
3) Develop a CNN/MediaPipe-based gesture recognition system for accurate, real-time hand movement interpretation.
4) Integrate HC-SR04 ultrasonic and IR sensors for real-time obstacle detection and collision avoidance.
5) Train an ML model that learns user movement patterns to predict optimal navigation paths.
6) Design an IoT-enabled system for remote caregiver monitoring and diagnostics.

IV. PROPOSED SYSTEM

The proposed framework is an AI- and ML-driven Smart Wheelchair Assistance system offering hands-free control through voice and gesture inputs. Figure 1 presents the complete system architecture. The design is stratified into three layers: Input, Processing, and Output.

A. Voice Recognition Module

Users issue movement commands ("Move forward", "Turn left", "Stop", "Go back"), which are captured by a USB microphone. The SpeechRecognition library with Google's offline model converts raw audio to text. MFCCs are extracted as feature vectors, which are fed to a trained RNN/LSTM model. Multilingual support (English and Hindi) is achieved via language-model fine-tuning. Offline operation ensures full functionality without internet connectivity.

B. Gesture Recognition Module

An MPU6050 IMU mounted on the user's wrist captures 6-axis acceleration and gyroscope data at 100 Hz via the ESP32 microcontroller.
Sliding-window feature extraction (mean, variance, peak-to-peak, zero-crossing rate) feeds a DNN classifier. Alternatively, a camera and the MediaPipe Hands framework enable vision-based recognition, mapping specific hand poses to motion commands. Each gesture maps to one action: wrist-tilt right → turn right, palm-raise → stop, etc.

C. Obstacle Detection and Navigation

The HC-SR04 ultrasonic sensor triggers timed pulses at 40 kHz, measuring distance by echo return time. When an object is detected within 30 cm in the forward direction, the system automatically halts forward motion and activates a buzzer alert, regardless of the current voice or gesture command. This safety override operates at the hardware interrupt level, ensuring sub-10 ms response time. The ML model additionally processes historical sensor data to predict safe navigation corridors.

D. Command Fusion Engine

The Raspberry Pi 4 hosts the Command Fusion Engine, a priority-based arbitration module that evaluates confidence scores from the speech and gesture models and resolves conflicts. The priority hierarchy is: (1) Emergency stop / obstacle avoidance, (2) Gesture command, (3) Voice command. An adaptive feedback mechanism records user corrections to continuously retrain the models for personalised accuracy.

Fig. 1: System Architecture – Smart Wheelchair Assistance Framework

V. METHODOLOGY

The system development follows a five-phase iterative methodology:

Phase 1 – Data Collection

Voice samples in multiple accents and languages (English, Hindi) are recorded for commands: start, stop, left, right, backward. MPU6050 gesture sequences are captured under varied lighting and backgrounds. HC-SR04 sensor readings are logged across indoor environments with varying terrain.

Phase 2 – Data Preprocessing

Voice: raw audio is converted to 13-coefficient MFCCs using librosa. Gesture: IMU time series are normalised, segmented into 1-second windows, and augmented (Gaussian noise, time-warp).
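The windowing and per-window statistics named above (mean, variance, peak-to-peak, zero-crossing rate) could be computed along the following lines for a single IMU axis. This is only a sketch under stated assumptions: the function names are hypothetical, windows are taken with a configurable step (the paper does not state the overlap), and zero crossings are counted on the mean-removed signal so that a constant gravity offset does not hide them.

```python
from statistics import fmean, pvariance

SAMPLE_RATE_HZ = 100                # sampling rate stated in the text
WINDOW_SIZE = SAMPLE_RATE_HZ * 1    # 1-second windows -> 100 samples


def split_windows(samples, size=WINDOW_SIZE, step=WINDOW_SIZE):
    """Cut one axis into fixed-size windows; an incomplete tail is dropped."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose mean-removed sign differs."""
    mean = fmean(window)
    centered = [x - mean for x in window]
    crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(window) - 1)


def window_features(window):
    """Return the four statistics used as classifier inputs for one window."""
    return {
        "mean": fmean(window),
        "variance": pvariance(window),  # population variance of the window
        "peak_to_peak": max(window) - min(window),
        "zero_crossing_rate": zero_crossing_rate(window),
    }


if __name__ == "__main__":
    axis = [1.0, -1.0] * 125        # 2.5 s of synthetic data at 100 Hz
    windows = split_windows(axis)
    print(len(windows), window_features(windows[0]))
```

Setting `step` smaller than `size` gives overlapping (sliding) windows; the same features would be computed per axis and concatenated into one vector per window.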
Labels are assigned according to the commanded action.

Phase 3 – Model Training

Speech model: 3-layer LSTM (128 → 64 → 32 units) + dense softmax, trained with Adam optimiser (lr=0.001), batch=32, 50 epochs. Gesture model: 2D CNN on spectrogram features or MediaPipe landmark vectors fed into a DNN. Train/validation/test split: 70/15/15. TensorFlow/Keras on Google Colab (GPU).

Phase 4 – Real-Time Implementation

Trained .tflite models are deployed on Raspberry Pi 4. The ESP32 streams MPU6050 data over Wi-Fi (TCP socket) at 100 Hz. The Raspberry Pi processes inputs, runs inference, and dispatches GPIO signals to the L293D motor driver. The ultrasonic sensor interrupt runs in a dedicated thread to guarantee the safety override at all times.

Phase 5 – Feedback and Optimisation

Command-response pairs are logged. Users can flag misclassifications via a short spoken correction, triggering incremental retraining. A Flask-based IoT dashboard streams battery level, GPS coordinates (future), and command history to a caregiver's mobile browser.

VI. HARDWARE & SOFTWARE SPECIFICATIONS

Component          Specification
Raspberry Pi 4B    4GB RAM, quad-core ARM
ESP32              Dual-core 240 MHz, Wi-Fi/BT
MPU6050            6-axis IMU, I2C
HC-SR04            Ultrasonic, 2–400 cm
L293D              H-bridge, 600mA/channel
DC Motors          12V, 150 RPM
Battery            12V 5Ah LiPo pack

Table I: Hardware Specifications

Software: Python 3.12, TensorFlow 2.x/Keras, Scikit-learn, librosa, OpenCV, MediaPipe, SpeechRecognition, pyttsx3, Flask, Arduino IDE (C/C++ for ESP32).

VII. EXPECTED RESULTS

The system is projected to achieve the following performance targets in controlled indoor testing:

• Voice command recognition accuracy ≥ 85% across English and Hindi accents.
• Gesture recognition accuracy ≥ 87% using MPU6050 + DNN in standard indoor conditions.
• End-to-end command latency < 200 ms from input capture to motor actuation.
• Obstacle detection response time < 10 ms (hardware interrupt level).
• Offline operation sustained indefinitely without network dependency.

Table II: Comparison with Related Works

Feature              Sadi [2]    Reddy [4]    Proposed
Voice Control        No          Yes          Yes
Gesture Control      Yes         Yes          Yes
Offline ML           No          No           Yes
Obstacle Avoid.      No          No           Yes
IoT Dashboard        Yes         No           Yes
Multimodal Fusion    No          No           Yes
Accuracy             82%         84%          ≥85%
Latency              ~300ms      ~250ms       <200ms

VIII. FUTURE SCOPE

While the current system is optimised for controlled indoor environments, several enhancements are planned to extend its capabilities and commercial viability:

A. Advanced AI Models

Hybrid CNN+LSTM architectures will be explored for gesture prediction, leveraging temporal dependencies across multi-joint wrist motion sequences. Transfer learning from pre-trained gesture datasets (e.g., 20BN-Jester) will reduce per-user training time and improve generalisation across disability profiles.

B. Computer Vision Integration

A dedicated camera module will enable MediaPipe Holistic-based tracking of full hand landmarks plus facial micro-expressions (eye blink, jaw movement) as additional control inputs. This is critical for users with no upper-limb mobility, such as those with high-level spinal cord injuries.

C. Outdoor Navigation

LiDAR-based 3D mapping (e.g., RPLiDAR A1) and GPS integration will extend obstacle avoidance to outdoor, uneven terrains. The dynamic window algorithm will be replaced with a learned navigation policy (DQN or PPO) trained in simulation and transferred to hardware.

D. Caregiver Mobile Application

A React Native companion app will provide real-time wheelchair telemetry (battery, GPS, speed), a voice/gesture command log, and remote emergency brake capability over MQTT. Push notifications will alert caregivers on fall-detection events triggered by the MPU6050's accelerometer threshold.

E. Clinical Trials

Pilot usability studies with target user populations (spinal cord injury, ALS, cerebral palsy) are planned to evaluate system ergonomics, recognition fatigue, and real-world effectiveness, bridging the gap between laboratory prototypes and certified medical devices.
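For reference, the priority hierarchy of the Command Fusion Engine in Section IV-D (obstacle override first, then gesture, then voice) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `Prediction` type, the `fuse_commands` function, the 0.6 confidence cut-off, and the choice of "stop" as the fallback when no input is confident are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CONFIDENCE = 0.6   # hypothetical cut-off; the paper gives no value
SAFE_DEFAULT = "stop"  # assumption: do nothing risky without a clear command


@dataclass(frozen=True)
class Prediction:
    """One model output: a command label and its confidence score."""
    command: str
    confidence: float


def fuse_commands(
    obstacle_ahead: bool,
    gesture: Optional[Prediction],
    voice: Optional[Prediction],
    min_confidence: float = MIN_CONFIDENCE,
) -> str:
    """Pick one motor command using the stated priority order."""
    chosen = SAFE_DEFAULT
    # Gesture outranks voice, so it is checked first.
    for prediction in (gesture, voice):
        if prediction is not None and prediction.confidence >= min_confidence:
            chosen = prediction.command
            break
    # The obstacle override has the highest priority: it cancels forward
    # motion regardless of what either modality requested.
    if obstacle_ahead and chosen == "forward":
        return "stop"
    return chosen


if __name__ == "__main__":
    print(fuse_commands(False, Prediction("left", 0.9), Prediction("right", 0.95)))
    print(fuse_commands(True, Prediction("forward", 0.9), None))
```

In this sketch a low-confidence gesture falls through to the voice prediction, which mirrors the stated goal of staying robust when one channel underperforms.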

References

[1] Y. Kim et al., "A Literature Review on Smart Wheelchair Systems," Assistive Technology Journal, 2023.
[2] M.S. Sadi et al., "Finger-Gesture Controlled Wheelchair with IoT," IEEE Access, 2022.
[3] R. Patel and D. Mehta, "Design and Implementation of Gesture Controlled Wheelchair Using Accelerometer Sensor," IJAREEIE, vol. 13, no. 2, 2024.
[4] K. Reddy and V. Sharma, "Voice and Gesture Controlled Wheelchair Using ML," IJETER, vol. 8, no. 10, 2023.
[5] S. Ahmed and M. Rahman, "Deep Learning for Hand Gesture Recognition in Assistive Mobility," IEEE Access, vol. 10, 2022.
[6] S. Kumar and N. Joshi, "Smart Wheelchair Using Ultrasonic Sensors and ML," Int. J. Control and Automation, vol. 15, no. 6, 2022.
[7] A. Singh and P. Kaur, "AI and IoT Enabled Smart Wheelchair," J. Intelligent Systems and Robotics, vol. 9, no. 4, 2023.

Copyright

Copyright © 2026 Prasad Jadhav, Tapan Chauhan, Varsha Devakathe, Saniya Jadhav. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET80646

Publish Date : 2026-04-21

ISSN : 2321-9653

Publisher Name : IJRASET