Exhibited at the Art Gallery of Ontario, March 2025
HEADS UP: This project is in constant iteration, mainly on the CVZone/Python side. The latest Python iteration includes a gesture rule classifier snippet that switches from one text prompt to another inside TD (the updated TouchDesigner file is not included yet).
The Microscopic is a real-time generative AI system exhibited as a public video mapping installation. It merges multimodal user input with diffusion-based image generation to create immersive, reactive visuals. The project was designed to test how human gestures can condition and control AI imagery in live settings, using a custom pipeline that combines gesture recognition with diffusion model conditioning.
- Explore gesture-based control as a creative input for AI visual systems.
- Investigate image conditioning using predefined animations and structure-aware prompts.
- Demonstrate live integration between user input and Stable Diffusion using the TouchDiffusion plugin for TouchDesigner.
This system implements a rule-based classifier using cvzone.HandTrackingModule, OpenCV, and autopy to translate webcam-captured hand landmarks into real-time interaction modes:
- Virtual Mouse – Controls the mouse using the index fingertip position.
- Zoom Mode – Recognizes two-hand gestures to scale and reposition images.
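As a point of reference, here is a minimal, hypothetical sketch of the virtual-mouse mode, assuming a webcam at index 0 and a 640x480 capture; the full classifier in VirtualMouse_GestureControl_v02.py adds the zoom mode and the UDP output described further down.

```python
# Minimal virtual-mouse sketch (webcam index and thresholds are assumptions).
import cv2
import numpy as np
import autopy
from cvzone.HandTrackingModule import HandDetector

cam_w, cam_h = 640, 480
screen_w, screen_h = autopy.screen.size()
smoothing = 5                      # higher = smoother but slower pointer
prev_x, prev_y = 0, 0

cap = cv2.VideoCapture(0)
cap.set(3, cam_w)
cap.set(4, cam_h)
detector = HandDetector(detectionCon=0.8, maxHands=1)

while True:
    ok, img = cap.read()
    if not ok:
        break
    hands, img = detector.findHands(img)
    if hands:
        hand = hands[0]
        lm = hand["lmList"]                    # 21 landmarks, [x, y, z] each
        fingers = detector.fingersUp(hand)     # e.g. [0, 1, 0, 0, 0]

        # Index finger only: move the pointer, mapped from camera to screen space
        if fingers[1] == 1 and fingers[2] == 0:
            x = np.interp(lm[8][0], (0, cam_w), (0, screen_w))
            y = np.interp(lm[8][1], (0, cam_h), (0, screen_h))
            prev_x += (x - prev_x) / smoothing   # interpolated (smoothed) movement
            prev_y += (y - prev_y) / smoothing
            mx = max(0, min(prev_x, screen_w - 1))
            my = max(0, min(prev_y, screen_h - 1))
            autopy.mouse.move(mx, my)

        # Index + middle fingers close together: click
        elif fingers[1] == 1 and fingers[2] == 1:
            length, _, img = detector.findDistance(lm[8][0:2], lm[12][0:2], img)
            if length < 40:                     # pixel threshold, tune per setup
                autopy.mouse.click()

    cv2.imshow("Virtual Mouse", img)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```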
- cvzone
- OpenCV
- autopy
- socket (for UDP communication with TouchDesigner)
- Smoothed pointer movement using interpolation.
- Click recognition based on hand pose (index + middle fingers close together).
- Zoom level control by measuring distance between both hands.
- UDP data transmission of gesture states and zoom values to TouchDesigner (sketched below).
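The zoom mode and the UDP link can be reduced to the following sketch. The host (127.0.0.1), port (7000), the two-hand trigger pose, and the comma-separated message format are illustrative assumptions; match them to whatever the receiving DAT in your TouchDesigner project expects.

```python
# Sketch of zoom mode + UDP output (host, port, and message format are assumptions).
import socket
import cv2
from cvzone.HandTrackingModule import HandDetector

TD_ADDRESS = ("127.0.0.1", 7000)          # assumed host/port for TouchDesigner
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

cap = cv2.VideoCapture(0)
detector = HandDetector(detectionCon=0.8, maxHands=2)
start_dist = None
zoom = 0

while True:
    ok, img = cap.read()
    if not ok:
        break
    hands, img = detector.findHands(img)

    # Zoom mode: both hands visible with thumb + index raised on each
    if len(hands) == 2 and \
       detector.fingersUp(hands[0])[:2] == [1, 1] and \
       detector.fingersUp(hands[1])[:2] == [1, 1]:
        length, _, img = detector.findDistance(hands[0]["center"],
                                               hands[1]["center"], img)
        if start_dist is None:
            start_dist = length               # reference distance when the gesture starts
        zoom = int(length - start_dist)       # positive = hands moving apart
    else:
        start_dist = None

    # Send gesture state and zoom value to TouchDesigner as a simple CSV string
    mode = "zoom" if start_dist is not None else "idle"
    sock.sendto(f"{mode},{zoom}".encode(), TD_ADDRESS)

    cv2.imshow("Zoom Mode", img)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```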
See the full code example in VirtualMouse_GestureControl_v02.py.
We used TouchDiffusion, a real-time implementation of Stable Diffusion in TouchDesigner.
- Noise Map: The default randomness input.
- Author-Controlled RGB Animation: A high-contrast, particle-based animation was used as a second conditioning map. This helped the model "preserve the structure" while allowing creative variation.
- Gesture Input: Gestures sent via UDP dynamically transformed or influenced the diffusion parameters during runtime.
This conditioning approach allowed the diffusion model to maintain coherence with the structured reference (e.g., particle animations) while introducing stylistic variation based on the noise.
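The prompt switching mentioned in the HEADS UP note at the top works the same way as the gesture rules above. The sketch below is a condensed, hypothetical version of that idea: the fingers-up pattern returned by detector.fingersUp(hand) selects a text prompt, which is pushed to TouchDesigner over the same UDP link. The prompt strings, host, and port are placeholders, not the values used in the exhibition.

```python
# Condensed prompt-switching rule classifier (prompts, host, and port are placeholders).
# `fingers` is the list returned by detector.fingersUp(hand) in the sketches above.
import socket

TD_ADDRESS = ("127.0.0.1", 7001)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Rule table: fingers-up pattern -> text prompt for TouchDiffusion
PROMPTS = {
    (0, 1, 0, 0, 0): "microscopic cells, soft light, organic texture",
    (0, 1, 1, 0, 0): "crystalline structures, macro photography, high contrast",
    (1, 1, 1, 1, 1): "fluid particles dispersing on a dark background",
}

def send_prompt_for(fingers):
    """Look up the prompt for the current hand pose and send it to TouchDesigner."""
    prompt = PROMPTS.get(tuple(fingers))
    if prompt is not None:
        sock.sendto(f"prompt,{prompt}".encode(), TD_ADDRESS)
```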
While The Microscopic was designed as a standalone installation, its architecture opens doors to future uses in:
- Theater and Live Performance (gesture-driven control of VFX, lighting, sound)
- Prototype testing for multimodal AI interaction
- Creative gaming and memory-based physical interaction systems
The combination of authored animations, structured prompts, and reactive gesture input makes this system ideal for immersive, performative applications!
- TouchDiffusion by @olegchomp
- cvzone
- OpenCV, autopy, and TouchDesigner community
- Paketa12 on YouTube
- I'm running Python 3.8.0 because MediaPipe didn't seem to run on later versions (at least for me).
- I'm using the main version of TouchDiffusion. I tried to install the portable version, but it simply wouldn't run. Don't get discouraged if either version doesn't work properly at first; reinstalling the main version did the trick for me.
- The uploaded TouchDesigner file contains the essential nodes for decoding incoming OSC data. However, I keep working on more recent versions for research purposes.