View Source Code: browse the complete example on GitHub
A browser driving game you control with your hands and voice, powered by models that run fully locally.
Steer by holding both hands up like a steering wheel. Speak commands to accelerate, brake, toggle headlights, and play music. No cloud calls, no server round-trips. Everything runs in your browser tab.
How it works
Two models run in parallel, entirely client-side:
- MediaPipe Hand Landmarker tracks your hand positions via webcam at ~30 fps. The angle between your two wrists drives the steering.
- LFM2.5-Audio-1.5B runs in a Web Worker with ONNX Runtime Web. It listens for speech via the Silero VAD and transcribes each utterance on-device. Matched keywords control game state.
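The steering computation can be sketched as a pure function over the two wrist landmarks. This is an illustrative version, not the example's actual code: it assumes MediaPipe-style normalized image coordinates (x/y in [0, 1], y growing downward) with the wrist as landmark index 0.

```typescript
// Illustrative sketch: derive a steering angle from the two detected wrists.
// Assumes MediaPipe-style normalized landmarks (y grows downward); the exact
// mapping in the example's source may differ.
interface Point {
  x: number;
  y: number;
}

/**
 * Angle of the line between the two wrists, in degrees.
 * 0 = hands level; positive = right hand lower (steer right).
 */
function steeringAngle(leftWrist: Point, rightWrist: Point): number {
  const dy = rightWrist.y - leftWrist.y;
  const dx = rightWrist.x - leftWrist.x;
  return (Math.atan2(dy, dx) * 180) / Math.PI;
}
```

With hands level (`steeringAngle({ x: 0.3, y: 0.5 }, { x: 0.7, y: 0.5 })`) the angle is 0; lowering the right hand tilts it positive, like turning a wheel clockwise.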
Voice commands
| Say | Effect |
|---|---|
| speed / fast / go | Accelerate to 120 km/h |
| slow / stop / brake | Decelerate to 0 km/h |
| lights on | Enable headlights |
| lights off | Disable headlights |
| music / play | Start the techno beat |
| stop music / silence | Stop the beat |
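Dispatching on the transcript can be sketched as simple substring matching over the keyword table above. The `Action` names and `matchCommand` function are illustrative, not the example's actual API; note that multi-word phrases like "stop music" must be checked before the bare "stop" and "music" keywords.

```typescript
// Illustrative sketch: map a transcribed utterance to a game action.
// Keyword lists mirror the voice-command table; names are hypothetical.
type Action =
  | "accelerate"
  | "brake"
  | "lightsOn"
  | "lightsOff"
  | "playMusic"
  | "stopMusic";

// Order matters: longer phrases first, so "stop music" wins over "stop".
const COMMANDS: [string[], Action][] = [
  [["stop music", "silence"], "stopMusic"],
  [["lights on"], "lightsOn"],
  [["lights off"], "lightsOff"],
  [["speed", "fast", "go"], "accelerate"],
  [["slow", "stop", "brake"], "brake"],
  [["music", "play"], "playMusic"],
];

function matchCommand(transcript: string): Action | null {
  const text = transcript.toLowerCase();
  for (const [keywords, action] of COMMANDS) {
    if (keywords.some((k) => text.includes(k))) return action;
  }
  return null; // utterance did not match any command
}
```

Substring matching keeps latency negligible next to the transcription itself, at the cost of occasional false positives ("let's go eat" would accelerate).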
Prerequisites
Browser requirements
- Chrome 113+ or Edge 113+ (WebGPU for fast audio inference, with a slower WASM fallback)
- Webcam and microphone access
Development requirements
- Node.js 18+
Run locally
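The repository's exact commands may differ; assuming a standard npm-based setup, something like:

```shell
# From the example's directory (check the repo's README for exact commands):
npm install
npm run dev   # start the dev server, then open the printed URL in Chrome/Edge
```

Grant the page camera and microphone access when prompted.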
Architecture
The game loop runs on requestAnimationFrame. Hand detection is throttled to ~30 fps so it does not block rendering. Voice processing happens off the main thread in the Web Worker and delivers results back via postMessage.
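The ~30 fps throttle can be sketched as a small closure that gates detection on elapsed time; the names here are illustrative, not the example's actual code.

```typescript
// Illustrative sketch of the detection throttle: run hand detection only when
// enough time has passed since the last run, so rendering is never blocked.
function createThrottle(intervalMs: number): (now: number) => boolean {
  let last = -Infinity;
  return (now: number): boolean => {
    if (now - last < intervalMs) return false;
    last = now;
    return true;
  };
}

const shouldDetect = createThrottle(1000 / 30); // ~33 ms between detections

// In the browser, the loop would look roughly like:
//   function frame(now: number) {
//     if (shouldDetect(now)) runHandDetection(); // throttled to ~30 fps
//     render();                                  // every frame
//     requestAnimationFrame(frame);
//   }
//   requestAnimationFrame(frame);
```

The render step runs every frame while detection skips frames, which is why steering stays smooth even when the landmark model is momentarily slow.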
Need help?
Join our Discord
Connect with the community and ask questions about this example.