Real-time AI-powered smart glasses interface
OpenGlass connects Meta Ray-Ban smart glasses to Gemini Live and OpenClaw, turning them into a personal AI companion with eyes, ears, and hands. Stream video and audio in real time, get intelligent responses, and execute actions across 56+ skills — all hands-free. Built as a native Swift iOS app with a flexible mode system for translation, QR scanning, object spotting, and more.
At a high level, data flows like this:

```mermaid
graph TD
    A[Meta Ray-Ban Glasses] -->|Video + Audio| B[OpenGlass iOS App]
    A2[iPhone Camera Fallback] -->|Video + Audio| B
    B --> C["Vision Pipeline<br/>1 fps JPEG frames"]
    B --> D["Audio Pipeline<br/>16 kHz PCM"]
    B --> E[Mode Router]
    C --> F[Gemini Live WebSocket]
    D --> F
    E --> F
    F --> G[Audio Response → Speaker]
    F --> H[Tool Calls → OpenClaw]
    F --> I[Transcript → Screen]
    H --> J["OpenClaw Gateway<br/>56+ Skills"]
```
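The 1 fps throttle in the vision pipeline is the main bandwidth control. A minimal sketch of how it might look — `FrameThrottler` and its shape are illustrative assumptions, not the actual OpenGlass implementation:

```swift
import AVFoundation
import CoreImage
import UIKit

/// Illustrative sketch: gates camera frames to ~1 fps and encodes them as JPEG.
final class FrameThrottler {
    private let minInterval: TimeInterval
    private var lastEmit: TimeInterval = 0
    private let ciContext = CIContext()

    init(fps: Double = 1.0) {
        self.minInterval = 1.0 / fps
    }

    /// Returns JPEG data for the frame, or nil if it arrives too soon after the last one.
    func jpegIfDue(from sampleBuffer: CMSampleBuffer, quality: CGFloat = 0.6) -> Data? {
        let now = sampleBuffer.presentationTimeStamp.seconds
        guard now - lastEmit >= minInterval else { return nil }
        guard let pixelBuffer = sampleBuffer.imageBuffer else { return nil }
        lastEmit = now

        // Encode the frame as JPEG (a real pipeline would likely downscale first).
        let image = CIImage(cvPixelBuffer: pixelBuffer)
        guard let cgImage = ciContext.createCGImage(image, from: image.extent) else { return nil }
        return UIImage(cgImage: cgImage).jpegData(compressionQuality: quality)
    }
}
```

In practice this would sit behind an `AVCaptureVideoDataOutput` delegate; frames that return non-nil get base64-encoded and sent over the WebSocket.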
OpenGlass ships with six built-in modes:

| Mode | Description |
|---|---|
| 🤖 Assistant | General-purpose AI with vision — describe scenes, read signs, remember context |
| 🌐 Translator | Real-time Mandarin ↔ English translation (voice + visual text) |
| 📱 QR Scanner | Detect and act on QR codes — open links, add contacts, trigger skills |
| 👁️ Spotter | Watch for specific objects/events and alert when spotted |
| 🏋️ Coach | Real-time visual coaching — gym form, cooking guidance, navigation cues, DIY |
| 🤝 Social | Discreet social context — read badges, spot logos, identify settings |
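Behind this table sits the mode protocol and router in the Modes/ module (see the project layout below). As a hedged sketch of the shape such an abstraction could take — every name here is an assumption:

```swift
import Foundation

/// Hedged sketch of the mode abstraction; not the actual protocol in Modes/.
protocol Mode {
    /// Identifier used for voice switching ("switch to translator").
    static var id: String { get }
    /// System-prompt fragment injected into the Gemini Live session.
    var systemInstruction: String { get }
    /// Per-frame hook (e.g. the QR scanner inspects frames locally).
    func didCaptureFrame(_ jpeg: Data)
    /// Model-output hook (e.g. the spotter watches for its alert phrase).
    func didReceiveTranscript(_ text: String)
}

struct TranslatorMode: Mode {
    static let id = "translator"
    let systemInstruction = """
        You are a live interpreter. Translate everything you hear or see \
        between Mandarin and English, and speak only the translation.
        """
    func didCaptureFrame(_ jpeg: Data) {}
    func didReceiveTranscript(_ text: String) {}
}

/// The router holds the active mode; switching modes restarts the Gemini
/// session with the new system instruction.
final class ModeRouter {
    private(set) var active: Mode
    var onModeChange: ((Mode) -> Void)?

    init(initial: Mode) { self.active = initial }

    func activate(_ mode: Mode) {
        active = mode
        onModeChange?(mode)
    }
}
```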
The end-to-end pipeline has five stages:

- Capture — Video frames from the glasses (or the iPhone camera) plus microphone audio
- Stream — Frames throttled to 1 fps and JPEG-encoded, audio captured as 16 kHz PCM, both sent to Gemini Live over a WebSocket (sketched below)
- Process — Gemini analyzes vision + audio and generates responses and tool calls
- Act — Audio responses play through the speaker; tool calls route to the OpenClaw Gateway for execution
- Display — Live transcript shown on screen; results fed back into the conversation
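The Stream stage maps onto Gemini's Multimodal Live API, which accepts base64 media chunks over a single WebSocket. A rough sketch of the send side using `URLSessionWebSocketTask` — the message shape follows Google's published BidiGenerateContent format, but session setup, setup messages, and error handling are omitted here:

```swift
import Foundation

/// Sketch of streaming media chunks to Gemini Live over WebSocket.
final class GeminiLiveSocket {
    private let task: URLSessionWebSocketTask

    init(apiKey: String) {
        // Endpoint as documented for the Multimodal Live API (v1beta at time of writing).
        let url = URL(string: "wss://generativelanguage.googleapis.com/ws/"
            + "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
            + "?key=\(apiKey)")!
        task = URLSession.shared.webSocketTask(with: url)
        task.resume()
    }

    /// Sends one JPEG frame or one PCM audio chunk as realtime input.
    func sendChunk(_ data: Data, mimeType: String) async throws {
        let message: [String: Any] = [
            "realtimeInput": [
                "mediaChunks": [
                    ["mimeType": mimeType, "data": data.base64EncodedString()]
                ]
            ]
        ]
        let json = try JSONSerialization.data(withJSONObject: message)
        try await task.send(.string(String(decoding: json, as: UTF8.self)))
    }
}

// Usage (illustrative): 1 fps JPEG frames plus 16 kHz mono PCM audio.
// try await socket.sendChunk(jpegData, mimeType: "image/jpeg")
// try await socket.sendChunk(pcmChunk, mimeType: "audio/pcm;rate=16000")
```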
To build and run, you'll need:

- Xcode 15+ with Swift 5.9
- iPhone running iOS 17.0+ (iPhone 12 or later recommended)
- Gemini API key from Google AI Studio
- OpenClaw Gateway running on a Mac on the same LAN
- Meta Ray-Ban glasses (optional — iPhone camera works as fallback)
To get started:

```bash
git clone https://github.com/DarlingtonDeveloper/OpenGlass.git
cd OpenGlass
open OpenGlass.xcodeproj  # (when Xcode project is created)
```

Then:

- Add your Gemini API key in Settings → API Key
- Ensure OpenClaw Gateway is running (`openclaw gateway status`)
- Connect to the same Wi-Fi network as your Mac
- (Optional) Pair Meta Ray-Ban glasses via Bluetooth
OpenGlass supports three ways to reach your OpenClaw Gateway:
- LAN — Direct local network connection (fastest, same Wi-Fi required)
- Tunnel — Via Cloudflare Tunnel (works from anywhere)
- Auto — Tries LAN first, falls back to tunnel (recommended)
See docs/SETUP.md for detailed instructions.
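As a rough sketch of how the Auto strategy could be structured — the probe request, timeout, and health-check behavior here are assumptions, not the Gateway's actual API:

```swift
import Foundation

/// Illustrative connection strategy: probe the LAN, fall back to the tunnel.
enum GatewayRoute {
    case lan(URL), tunnel(URL)
}

func resolveGateway(lan: URL, tunnel: URL) async -> GatewayRoute {
    var request = URLRequest(url: lan)
    request.timeoutInterval = 1.5  // fail fast on the local probe

    if let (_, response) = try? await URLSession.shared.data(for: request),
       (response as? HTTPURLResponse)?.statusCode == 200 {
        return .lan(lan)   // same Wi-Fi, lowest latency
    }
    return .tunnel(tunnel) // reachable from anywhere via Cloudflare Tunnel
}
```

Failing fast on the LAN probe keeps startup snappy when you're away from your home network.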
The planned project layout:

```
OpenGlass/
├── App/        # App entry point and root view
├── Config/     # Configuration management
├── Gemini/     # Gemini Live WebSocket, session, audio
├── Vision/     # Camera capture, frame throttling, QR detection
├── Modes/      # Mode protocol, router, and built-in modes
├── OpenClaw/   # Gateway bridge and tool call routing
├── UI/         # SwiftUI views
└── docs/       # Documentation
```
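The OpenClaw/ bridge is where Gemini tool calls become skill invocations. A hedged sketch of that hand-off — `GatewayClient` is a placeholder abstraction, and the simplified `args` type is an assumption; OpenClaw's actual transport and endpoints aren't reproduced here:

```swift
import Foundation

/// Tool call as delivered by the Live API (args simplified to strings here).
struct ToolCall: Decodable {
    let id: String
    let name: String        // skill to invoke
    let args: [String: String]
}

/// Placeholder for whatever transport the bridge uses to reach the Gateway.
protocol GatewayClient {
    func invoke(skill: String, args: [String: String]) async throws -> String
}

final class ToolCallRouter {
    private let gateway: GatewayClient
    init(gateway: GatewayClient) { self.gateway = gateway }

    /// Executes the call and builds a tool response to feed back to Gemini,
    /// so the model can speak the result (the Act/Display stages above).
    func handle(_ call: ToolCall) async -> [String: Any] {
        let output: String
        do {
            output = try await gateway.invoke(skill: call.name, args: call.args)
        } catch {
            output = "Skill failed: \(error.localizedDescription)"
        }
        let response: [String: Any] = [
            "id": call.id,
            "name": call.name,
            "response": ["output": output]
        ]
        return ["toolResponse": ["functionResponses": [response]]]
    }
}
```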
Development is planned in six phases:

| Phase | Focus | Timeline |
|---|---|---|
| 1 | Foundation — app shell, camera, audio capture | Weeks 1–2 |
| 2 | Gemini Integration — WebSocket, streaming, tool calls | Weeks 3–4 |
| 3 | OpenClaw Bridge — gateway discovery, skill invocation | Week 5 |
| 4 | Mode System — all built-in modes, voice switching | Weeks 6–7 |
| 5 | Polish & Glasses — DAT SDK, glasses UI, optimization | Weeks 8–10 |
| 6 | Advanced Features — navigator, custom modes, widgets | Ongoing |
See SPEC.md for the full specification.
OpenGlass builds on:

- VisionClaw by Sean Liu — the inspiration and proof of concept for Gemini Live + OpenClaw on smart glasses
- Gemini Multimodal Live API — Google's real-time multimodal streaming API
- OpenClaw — AI agent gateway powering the 56+ skill integrations
- Meta DAT SDK — Meta's Wearables Device Access Toolkit for Ray-Ban Meta glasses
MIT © 2026 Mike Darlington