Real-Time Sign Language Detection and Captioning for Video Chatting
An AI-powered system enabling real-time sign language recognition and captioning for accessible video communication.
This project introduces a real-time sign language detection and captioning system designed for video conferencing applications. By integrating AI-driven gesture recognition with WebRTC-based video calls, it enhances communication between sign language users and non-signers. The system leverages MediaPipe for hand landmark detection and a Random Forest Classifier for gesture classification, ensuring accurate and low-latency caption generation.
Tech Stack
Python, Flask, OpenCV, MediaPipe, Random Forest Classifier, Node.js, Express.js, WebRTC, Socket.IO
Features
- Real-time sign language recognition using MediaPipe
- Caption generation integrated directly into video calls
- Low-latency WebRTC-based video and audio communication
- Custom dataset for improved model accuracy and robustness
- Scalable architecture supporting dynamic gesture recognition
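For caption generation on a live video stream, raw per-frame predictions flicker, so some smoothing is needed before text reaches the screen. The source does not describe the project's exact logic; below is one plausible debounce sketch, where a label is appended only after it persists for several consecutive frames:

```python
from collections import deque

class CaptionBuffer:
    """Turns noisy per-frame gesture labels into stable caption tokens.
    (Hypothetical smoothing logic; the project may debounce differently.)"""

    def __init__(self, hold_frames=5):
        self.recent = deque(maxlen=hold_frames)
        self.hold_frames = hold_frames
        self.caption = []

    def push(self, label):
        """Feed one per-frame prediction; return the caption so far."""
        self.recent.append(label)
        stable = (len(self.recent) == self.hold_frames
                  and len(set(self.recent)) == 1)
        # Append only when a label is stable and differs from the last
        # token, so prediction flicker never reaches the on-screen caption.
        if stable and (not self.caption or self.caption[-1] != label):
            self.caption.append(label)
        return " ".join(self.caption)

buf = CaptionBuffer(hold_frames=3)
for frame_label in ["HELLO", "HELLO", "HELLO", "THANKS", "HELLO"]:
    text = buf.push(frame_label)
print(text)  # "HELLO" — the lone THANKS frame is discarded as noise
```

The resulting caption string could then be pushed to call participants over the existing Socket.IO channel alongside the WebRTC media streams.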
Challenges
- Maintaining real-time performance without introducing video delays
- Handling variations in gestures due to lighting, angles, and regional differences
- Building a balanced and diverse dataset for robust classification
- Seamless integration of gesture recognition with video conferencing interfaces
Learnings
- Developed custom datasets with preprocessing and augmentation techniques
- Enhanced understanding of computer vision pipelines for gesture recognition
- Gained experience in integrating AI models into real-time applications
- Learned to optimize WebRTC for low-latency video streaming
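The dataset augmentation mentioned above can be illustrated at the landmark level. The specific augmentations the project used are not stated; this sketch shows two common ones for hand-landmark data — horizontal mirroring (so one recorded hand covers both left and right) and small coordinate jitter:

```python
import random

def augment_landmarks(landmarks, jitter=0.01, seed=None):
    """Illustrative landmark-level augmentations (assumed, not from the
    source). landmarks: 21 (x, y) tuples in [0, 1] image coordinates.
    Returns two new samples: a mirrored copy and a jittered copy."""
    rng = random.Random(seed)
    # Flip x around the image center to simulate the opposite hand.
    mirrored = [(1.0 - x, y) for x, y in landmarks]
    # Add small random offsets to mimic detector noise and camera shake.
    jittered = [(x + rng.uniform(-jitter, jitter),
                 y + rng.uniform(-jitter, jitter)) for x, y in landmarks]
    return [mirrored, jittered]

sample = [(0.5, 0.5)] * 21          # dummy stand-in for one recorded pose
augmented = augment_landmarks(sample, seed=0)
print(len(augmented))  # 2 extra samples per original
```

Applied across a recorded dataset, such transforms cheaply multiply the training examples the Random Forest sees, which is one way to improve robustness to hand orientation and detector noise.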