This project enables mouse control through hand gestures by running the Mediapipe and Hagrid models in parallel.
The system captures hand movements via video and translates them into corresponding mouse actions.
Key Components:
- Mediapipe: Detects finger joints and skeletal structure to track precise finger positions
- Hagrid (YOLOv10n): Recognizes various hand poses and gestures
- Integration: Visualizes hand movements using Mediapipe's coordinate data while determining mouse actions through YOLO-detected poses
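As a rough illustration of this integration, the combination can be reduced to one decision step: MediaPipe's landmark coordinates say where the cursor should go, while the YOLO-detected pose says what to do. The sketch below is illustrative only; the class names ("point_up", "fist", "palm") are placeholders, not the detector's actual labels.

```python
def decide_mouse_action(landmarks, gesture_label):
    """Combine MediaPipe landmarks with a YOLO-detected pose into one mouse action.
    `landmarks` is a list of 21 normalized (x, y) points from MediaPipe Hands;
    `gesture_label` is the class name emitted by the Hagrid-trained detector
    (the names below are placeholders)."""
    index_tip = landmarks[8]          # landmark 8 = index fingertip in MediaPipe Hands
    if gesture_label == "point_up":
        return ("move", index_tip)    # cursor follows the fingertip
    if gesture_label == "fist":
        return ("left_down", None)    # press the left button
    if gesture_label == "palm":
        return ("left_up", None)      # release the left button
    return ("none", None)
```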
Average Processing Time per Stage:
- Camera: 6.16 ms (range: 5-8 ms)
- Reasoning: 39.19 ms (range: 25-51 ms)
- Visualization: 9.35 ms (range: 0-29 ms)
Total Average Processing Time: 54.70 ms
Share of total time:
- Camera: 11.3%
- Reasoning: 71.6%
- Visualization: 17.1%
The reasoning component dominates the processing pipeline, accounting for over 70% of the total execution time. Camera capture is highly consistent and efficient, while visualization time varies significantly (0-29ms) depending on whether rendering is needed.
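One way to reproduce this breakdown is to wrap each stage in a high-resolution timer and average over frames. A minimal sketch follows; the stage functions named in the usage comment are placeholders, not the project's actual function names.

```python
import time
from collections import defaultdict

timings = defaultdict(list)

def timed(stage, fn, *args, **kwargs):
    """Run one pipeline stage and record its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[stage].append((time.perf_counter() - start) * 1000.0)
    return result

def report():
    """Print average and min/max duration per stage, matching the breakdown above."""
    for stage, samples in timings.items():
        print(f"{stage}: avg {sum(samples) / len(samples):.2f} ms "
              f"(range {min(samples):.0f}-{max(samples):.0f} ms)")

# Usage inside the main loop (capture_frame, run_models, render are placeholders):
#   frame = timed("camera", capture_frame)
#   results = timed("reasoning", run_models, frame)
#   timed("visualization", render, frame, results)
```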
Processing Pipeline:
Input Stage
- The camera acquires video frames for processing
Primary Processing Stage
- Acquired frames branch into three directions (dispatched in parallel, as sketched below):
- MediaPipe processing (hand landmark detection)
- YOLO processing (object detection)
- Visualization model update
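One common way to realize this branching is to hand the same frame to MediaPipe and YOLO on separate worker threads and collect both results before moving on. The sketch below assumes `hands` and `yolo_model` are created at startup and that the YOLO model follows the ultralytics calling convention.

```python
import cv2
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def process_frame(frame_bgr, hands, yolo_model):
    """Dispatch one captured frame to MediaPipe and YOLO on separate threads.
    `hands` is assumed to be a mediapipe.solutions.hands.Hands instance and
    `yolo_model` the Hagrid-trained YOLOv10n detector."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)      # MediaPipe expects RGB input
    mp_future = executor.submit(hands.process, rgb)       # hand landmark detection
    yolo_future = executor.submit(yolo_model, frame_bgr)  # gesture / bounding-box detection
    return mp_future.result(), yolo_future.result()
```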
Parallel Processing Stage
- MediaPipe results branch into two parallel tasks:
- Mouse control (see the sketch after this list)
- Bounding box visualization
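For the mouse-control task only one landmark is needed. A minimal sketch, assuming pyautogui and MediaPipe's normalized landmark coordinates:

```python
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()

def move_cursor(hand_landmarks):
    """Map the normalized index-fingertip position (MediaPipe landmark 8)
    onto the full screen and move the cursor there."""
    tip = hand_landmarks.landmark[8]   # x and y are in [0, 1] relative to the frame
    pyautogui.moveTo(tip.x * SCREEN_W, tip.y * SCREEN_H)
```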
Integration and Output Stage
- All parallel processing results are integrated in the final "update image" step (sketched below)
- The processed frame is displayed on screen
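The "update image" step can be as simple as drawing both results onto the captured frame before showing it. The sketch below assumes MediaPipe's drawing helper and an ultralytics-style YOLO result exposing `.boxes.xyxy`; the project's actual rendering may differ.

```python
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

def update_image(frame, mp_result, yolo_result):
    """Overlay MediaPipe hand landmarks and YOLO bounding boxes on one frame."""
    if mp_result.multi_hand_landmarks:
        for hand in mp_result.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
    for box in yolo_result.boxes.xyxy:              # assumed ultralytics Results layout
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("gesture mouse", frame)
    return frame
```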
Key Features:
- Parallel Processing: MediaPipe and YOLO execute simultaneously for performance optimization
- Multi-tasking: Single MediaPipe result enables simultaneous mouse control and visualization
- Real-time Processing: All results are integrated into one image providing real-time feedback
- Point up: Move the mouse cursor
- Fist: Click the left mouse button (dispatch sketched below)
- Captures hand landmarks and gestures, and visualizes the bounding box and finger joint points
- Tracks fingertip positions
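In code these two mappings reduce to a pair of pyautogui calls. A minimal sketch of the dispatcher; the gesture names are placeholders for the labels the Hagrid-trained detector actually emits:

```python
import pyautogui

def apply_gesture(gesture, fingertip_xy=None):
    """Translate a detected hand pose into a mouse event (placeholder class names)."""
    if gesture == "point_up" and fingertip_xy is not None:
        pyautogui.moveTo(*fingertip_xy)   # cursor follows the index fingertip
    elif gesture == "fist":
        pyautogui.click(button="left")    # left click
```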
Patch Notes
Version 1.0:
- Initial release of gesture-based mouse control system
- Implemented parallel processing of Mediapipe and Hagrid (YOLOv10n) models
- Basic hand gesture recognition for mouse control:
- Point up: Move mouse cursor
- Fist: Left mouse button down
- Open palm: Left mouse button up
- Real-time hand landmark detection and visualization
- Integrated bounding box and finger joint visualization
- Average processing time: approximately 95 ms per frame
Version 1.1:
- Enhanced Visualization System: Updated visualization pipeline for improved rendering efficiency
- Smart Object Detection Skipping: Added intelligent frame skipping for YOLO processing to reduce computational overhead (see the sketch after these notes)
- YOLO Computation Optimization: Implemented additional optimization techniques for YOLOv10n inference
- Frame Rate Improvements: Overall system optimization resulting in better frame rate performance
- Processing Pipeline Refinements: Optimized parallel processing workflow for reduced latency
Performance Impact: These optimizations significantly improve the real-time responsiveness of the gesture control system while maintaining accuracy in hand pose detection and mouse control functionality.
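The frame-skipping optimization mentioned above can be sketched as a thin wrapper that only runs YOLO inference every Nth frame and reuses the most recent result in between. The interval value and call style below are assumptions, not the project's tuned settings.

```python
class SkippingDetector:
    """Wraps a YOLO model so inference only runs every `interval` frames;
    intermediate frames reuse the most recent detection result."""
    def __init__(self, model, interval=3):   # interval of 3 is an assumed value
        self.model = model
        self.interval = interval
        self.frame_idx = 0
        self.last_result = None

    def detect(self, frame):
        if self.frame_idx % self.interval == 0 or self.last_result is None:
            self.last_result = self.model(frame)   # full YOLO inference
        self.frame_idx += 1
        return self.last_result
```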
Models Used:
- Mediapipe - Hand landmark detection
- Hagrid (YOLOv10n) - Object detection backbone for hand poses
