Real-Time Sign Language Detection and Captioning for Video Chatting

An AI-powered system enabling real-time sign language recognition and captioning for accessible video communication.

This project introduces a real-time sign language detection and captioning system designed for video conferencing applications. By integrating AI-driven gesture recognition with WebRTC-based video calls, it enhances communication between sign language users and non-signers. The system leverages MediaPipe for hand landmark detection and a Random Forest Classifier for gesture classification, ensuring accurate and low-latency caption generation.
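A common way to feed MediaPipe hand landmarks into a Random Forest Classifier is to first make the 21 (x, y) landmark coordinates invariant to where the hand sits in the frame and how large it appears. The sketch below illustrates that preprocessing step; the function name and the exact normalization (wrist-relative translation plus max-absolute scaling) are illustrative assumptions, not necessarily the project's implementation.

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Turn 21 (x, y) hand landmarks into a translation- and
    scale-invariant feature vector for a gesture classifier.

    `landmarks` is a (21, 2) array-like of normalized image
    coordinates, as produced by MediaPipe Hands; landmark 0
    is the wrist by MediaPipe's convention.
    """
    pts = np.asarray(landmarks, dtype=float)
    pts = pts - pts[0]            # translate: wrist becomes the origin
    scale = np.abs(pts).max()
    if scale > 0:
        pts = pts / scale         # scale coordinates into [-1, 1]
    return pts.flatten()          # 42-dimensional feature vector
```

The resulting 42-dimensional vector can be passed directly to scikit-learn's `RandomForestClassifier` for training and per-frame prediction.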

Tech Stack

Python · Flask · OpenCV · MediaPipe · Random Forest Classifier · Node.js · Express.js · WebRTC · Socket.IO


Features

  • Real-time sign language recognition using MediaPipe
  • Caption generation integrated directly into video calls
  • Low-latency WebRTC-based video and audio communication
  • Custom dataset for improved model accuracy and robustness
  • Scalable architecture supporting dynamic gesture recognition
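To generate readable captions from per-frame predictions without introducing flicker, one workable approach is a majority vote over a short sliding window of frames, so a gesture only becomes a caption word once it is predicted consistently. The class below is a minimal sketch of that idea; the class name, window size, and vote threshold are assumptions for illustration.

```python
from collections import Counter, deque

class CaptionSmoother:
    """Stabilize noisy per-frame gesture predictions before captioning.

    A label is emitted as a caption word only after it wins a majority
    vote over the last `window` frames, suppressing one-frame
    misclassifications that would otherwise make captions flicker.
    """

    def __init__(self, window=15, min_votes=10):
        self.history = deque(maxlen=window)
        self.min_votes = min_votes
        self.last_emitted = None

    def update(self, prediction):
        """Feed one per-frame prediction; return a caption word or None."""
        self.history.append(prediction)
        label, votes = Counter(self.history).most_common(1)[0]
        if votes >= self.min_votes and label != self.last_emitted:
            self.last_emitted = label
            return label
        return None
```

In a live pipeline, each emitted word can be pushed to the call peers over the existing Socket.IO channel so captions appear alongside the WebRTC video stream.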

Challenges

  • Maintaining real-time performance without introducing video delays
  • Handling variations in gestures due to lighting, angles, and regional differences
  • Building a balanced and diverse dataset for robust classification
  • Seamless integration of gesture recognition with video conferencing interfaces

Learnings

  • Developed custom datasets with preprocessing and augmentation techniques
  • Enhanced understanding of computer vision pipelines for gesture recognition
  • Gained experience in integrating AI models into real-time applications
  • Learned to optimize WebRTC for low-latency video streaming
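Landmark-based datasets like the one described above can be augmented cheaply, since transformations operate on coordinates rather than pixels. The sketch below shows two common augmentations, small coordinate jitter and a horizontal flip (which mirrors a left hand into a right hand); the function name and parameters are illustrative assumptions rather than this project's exact pipeline.

```python
import random

def augment_landmarks(landmarks, jitter=0.01, seed=None):
    """Return augmented copies of one landmark sample.

    `landmarks` is a list of (x, y) pairs in normalized [0, 1] image
    coordinates. Produces a jittered copy (simulating small pose and
    detection noise) and a horizontally flipped copy (mirroring
    left/right hands), roughly tripling the effective dataset size
    when combined with the original sample.
    """
    rng = random.Random(seed)
    jittered = [(x + rng.uniform(-jitter, jitter),
                 y + rng.uniform(-jitter, jitter)) for x, y in landmarks]
    flipped = [(1.0 - x, y) for x, y in landmarks]  # mirror about x = 0.5
    return [jittered, flipped]
```

Augmentations like these help the classifier tolerate the lighting, angle, and handedness variations noted in the Challenges section.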