A JavaScript example app that runs YOLOv9 real-time object detection on live webcam video entirely in the browser using Transformers.js and WebAssembly — no server GPU required. Detected snapshots and bounding box metadata are automatically captured and stored in Backblaze B2 cloud storage.
Detect people, vehicles, animals, and 80 COCO object classes from your webcam in real time. Configure auto-capture intervals (5 seconds to 5 minutes) to continuously monitor a scene and save annotated snapshots with detection metadata (labels, confidence scores, bounding boxes) to S3-compatible Backblaze B2 object storage.
- Pet monitoring — Watch your pets while you're away with automatic capture
- Home and backyard security — Detect people, vehicles, and animals in outdoor areas
- Wildlife camera — Capture and log wildlife activity with configurable intervals
- Prototype and demo — Build browser-based computer vision apps without provisioning GPU infrastructure
- Transformers.js — Run Hugging Face AI models like YOLOv9 in the browser with WebAssembly
- YOLOv9 — State-of-the-art real-time object detection model (COCO-trained, 80 classes)
- Backblaze B2 — S3-compatible cloud object storage at $6/TB/month
- Client-side object detection: Run YOLOv9 entirely in the browser — no server GPU required
- Auto-capture monitoring: Configurable intervals (5s to 5min) for continuous scene monitoring
- Live detection feed: Real-time bounding box overlay with an animated grid of captured snapshots
- Cost-effective cloud storage: Store snapshots and detection JSON in Backblaze B2
- Secure direct uploads: Browser-to-cloud uploads using S3 pre-signed URLs
```
User Camera → Browser (Transformers.js + YOLOv9) → Real-time Detection
        ↓
Auto-Capture (configurable interval)
        ↓
Snapshot PNG  → B2 Storage
Detection JSON → B2 Storage
        ↓
Live Feed Grid (animated, clickable)
```
- User clicks "Start Camera" to enable webcam
- Browser loads YOLOv9 model (Xenova/gelan-c_all)
- Video frames are processed in real-time for object detection
- User selects capture interval and clicks "Start Auto-Capture"
- At each interval:
- Snapshot is captured and uploaded to B2
- Detection data (labels, bounding boxes, confidence) saved to B2
- New capture animates into the detection feed grid
- Click any feed item to view enlarged with B2 links
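Each interval in the steps above produces a snapshot plus a small JSON document of detection metadata. A rough sketch of building that metadata (field names here are illustrative, not necessarily the app's exact schema):

```javascript
// Build the JSON metadata saved alongside each snapshot. Field names
// are illustrative; the app's actual schema may differ.
function buildDetectionRecord(detections, timestamp = Date.now()) {
  return {
    capturedAt: new Date(timestamp).toISOString(),
    objectCount: detections.length,
    objects: detections.map((d) => ({
      label: d.label,                    // COCO class name, e.g. "person"
      score: Number(d.score.toFixed(3)), // confidence, rounded for storage
      box: d.box,                        // { xmin, ymin, xmax, ymax }
    })),
  };
}

// Example: two detections from a single frame, timestamped at the Unix epoch
const record = buildDetectionRecord(
  [
    { label: "person", score: 0.91234, box: { xmin: 10, ymin: 20, xmax: 110, ymax: 220 } },
    { label: "dog", score: 0.84567, box: { xmin: 150, ymin: 60, xmax: 260, ymax: 200 } },
  ],
  0
);
// record.objectCount is 2 and record.capturedAt is "1970-01-01T00:00:00.000Z"
```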
- Node.js 18+
- Backblaze B2 Account (free tier available)
- Create a bucket
- Generate an Application Key with `readFiles`, `writeFiles`, and `writeBuckets` permissions
```shell
git clone https://github.com/backblaze-b2-samples/b2-transformers-video-object-detection.git
cd b2-transformers-video-object-detection/backend
npm install
cp .env.example .env
```

Edit `.env` with your B2 credentials:

```
B2_ENDPOINT=https://s3.us-west-002.backblazeb2.com
B2_REGION=us-west-002
B2_KEY_ID=your_key_id_here
B2_APP_KEY=your_app_key_here
B2_BUCKET=your-bucket-name
```

Get your B2 endpoint and region from your bucket details page.
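For reference, the backend turns these variables into an S3-style client configuration. A minimal sketch (the helper name and exact shape are illustrative, matching the options an S3-compatible SDK client typically expects):

```javascript
// Validate the B2 settings and shape them for an S3-compatible client.
// Illustrative only; the app's backend may organize this differently.
function loadB2Config(env) {
  const required = ["B2_ENDPOINT", "B2_REGION", "B2_KEY_ID", "B2_APP_KEY", "B2_BUCKET"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing B2 settings: ${missing.join(", ")}`);
  }
  return {
    endpoint: env.B2_ENDPOINT,
    region: env.B2_REGION,
    credentials: { accessKeyId: env.B2_KEY_ID, secretAccessKey: env.B2_APP_KEY },
    bucket: env.B2_BUCKET,
  };
}

const config = loadB2Config({
  B2_ENDPOINT: "https://s3.us-west-002.backblazeb2.com",
  B2_REGION: "us-west-002",
  B2_KEY_ID: "key",
  B2_APP_KEY: "secret",
  B2_BUCKET: "my-bucket",
});
// config.endpoint is the S3-compatible endpoint from the bucket details page
```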
```shell
npm start
```

That's it! The server automatically:
- Configures B2 CORS for browser uploads
- Serves both frontend and API
- Opens at http://localhost:3000
- Open http://localhost:3000 in your browser
- Click "Start Camera" and allow camera access
- Adjust detection threshold and image size as needed
- Select a capture interval (5s, 10s, 30s, 1min, 5min)
- Click "Start Auto-Capture" to begin monitoring
- Watch the Detection Feed populate with snapshots
- Click any feed item to view details and B2 links
First run downloads the YOLOv9 model (~50MB), so it may take a minute.
If auto-setup fails (missing permissions), run manually:
```shell
npm run setup-cors
```

Required B2 Key Permissions:

- `listBuckets`
- `readFiles`
- `writeFiles`
- `writeBucketSettings` (required for CORS setup)
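The rule the setup script applies is roughly of this shape — illustrative values for an S3-style CORS configuration that lets the browser `PUT` to pre-signed URLs during local development:

```json
[
  {
    "AllowedOrigins": ["http://localhost:3000"],
    "AllowedMethods": ["GET", "PUT"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3600
  }
]
```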
- Open the frontend in your browser
- Click "Start Camera" to enable webcam
- Objects are detected in real-time and shown with bounding boxes
- Adjust controls:
- Threshold: Minimum confidence score (0.01 - 1.0)
- Image Size: Processing resolution (64 - 256)
- Capture Interval: Time between auto-captures (5s - 5min)
- Click "Start Auto-Capture" to begin continuous monitoring
- The Detection Feed shows captured snapshots with:
- Timestamp
- Number of objects detected
- Object labels (person, dog, car, etc.)
- Click any feed item to enlarge and access B2 storage links
- Use "Capture Now" for manual one-off snapshots
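The capture interval options above map naturally onto a repeating timer. A sketch, with illustrative names and the intervals listed in the controls:

```javascript
// Map the UI's interval labels to milliseconds and start a repeating
// capture. Names are illustrative, not the app's actual code.
const INTERVALS_MS = {
  "5s": 5000,
  "10s": 10000,
  "30s": 30000,
  "1min": 60000,
  "5min": 300000,
};

function startAutoCapture(label, captureFn, timer = setInterval) {
  const ms = INTERVALS_MS[label];
  if (ms === undefined) throw new Error(`Unknown interval: ${label}`);
  // captureFn grabs a snapshot and uploads it; the returned timer id
  // can be handed to clearInterval to stop monitoring.
  return timer(captureFn, ms);
}
```

In the browser, the returned timer id would be cleared when the user stops auto-capture.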
Railway / Render / Fly.io:
- Set environment variables from `.env`
- Deploy the `backend/` directory
- Update the frontend `apiUrl` to the deployed URL
Docker:
```shell
docker-compose up -d
```

Static Hosting (Netlify, Vercel, Cloudflare Pages):
- Deploy the `frontend/` directory
- Set the API URL in settings or hardcode it in `index.html`
B2 Static Hosting:
- Upload `frontend/index.html` to your B2 bucket
- Enable website hosting on the bucket
- Access via B2 website URL
Request:
```json
{
  "contentType": "image/png"
}
```

Response:
```json
{
  "uploadUrl": "https://...",
  "publicUrl": "https://...",
  "key": "snapshots/uuid.png",
  "fileId": "uuid"
}
```

Request:
```json
{
  "fileId": "uuid"
}
```

Response:
```json
{
  "uploadUrl": "https://...",
  "publicUrl": "https://...",
  "key": "detections/uuid.json"
}
```

This example uses Xenova/gelan-c_all, a YOLOv9-based object detection model quantized for browser inference via Transformers.js and WebAssembly. It detects 80 COCO object classes (person, car, dog, cat, bicycle, bird, etc.) with real-time bounding boxes and confidence scores.
- Model: Xenova/gelan-c_all — YOLOv9-based, COCO-trained
- Library: Transformers.js — Run Hugging Face transformer models in the browser
- Size: ~50MB download (cached in browser after first load)
- Classes: 80 COCO classes — person, car, dog, cat, truck, bicycle, bird, and more
- Output: Bounding boxes, class labels, and confidence scores per frame
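In code, detection looks roughly like the sketch below. The pipeline call itself downloads the ~50MB model on first run and needs network access, so it is shown as a comment; the `threshold` and `percentage` options are from the Transformers.js object-detection pipeline, while `toPixelBox` is an illustrative helper, not part of the library:

```javascript
// Loading the detector (downloads the model on first run):
//
//   import { pipeline } from "@xenova/transformers";
//   const detector = await pipeline("object-detection", "Xenova/gelan-c_all");
//   const detections = await detector(imageUrl, {
//     threshold: 0.25,   // drop low-confidence boxes
//     percentage: true,  // boxes as fractions of image dimensions
//   });
//
// With percentage boxes, drawing overlays only needs a scale back to
// pixel coordinates (helper name is illustrative):
function toPixelBox(box, width, height) {
  return {
    xmin: Math.round(box.xmin * width),
    ymin: Math.round(box.ymin * height),
    xmax: Math.round(box.xmax * width),
    ymax: Math.round(box.ymax * height),
  };
}

const px = toPixelBox({ xmin: 0.1, ymin: 0.2, xmax: 0.5, ymax: 0.9 }, 640, 480);
// px spans (64, 96) to (320, 432) on a 640×480 frame
```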
- Provider: Backblaze B2
- API: S3-compatible API with pre-signed URLs
- Pricing: $6/TB/month storage, uploads are FREE
- Stored data: Annotated PNG snapshots + JSON detection metadata (labels, bounding boxes, confidence)
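A pre-signed upload from the browser is a plain `PUT` to the signed URL. A minimal sketch, assuming the presign response shape shown in the API examples (`uploadUrl`, `publicUrl`, `key`); the helper name is illustrative:

```javascript
// Build the fetch options for a pre-signed PUT upload.
// Illustrative only; the app's frontend may structure this differently.
function uploadRequestFor(presign, contentType) {
  return {
    url: presign.uploadUrl,
    options: {
      method: "PUT",
      // Content-Type must match what the URL was signed for, or the
      // storage service will reject the upload.
      headers: { "Content-Type": contentType },
    },
  };
}

// In the browser, the snapshot Blob is then sent directly to B2:
//   const { url, options } = uploadRequestFor(presign, "image/png");
//   await fetch(url, { ...options, body: snapshotBlob });

const req = uploadRequestFor(
  { uploadUrl: "https://example.invalid/signed", publicUrl: "https://example.invalid/p", key: "snapshots/a.png" },
  "image/png"
);
```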
- Chrome 90+
- Edge 90+
- Firefox 90+
- Safari 15.4+
Requires WebAssembly, ES6 modules, and getUserMedia support.
- First run loads model (~50MB, one-time download)
- Higher image sizes increase accuracy but reduce FPS
- Requires camera permissions
- Browser must stay open during detection
- Motion-triggered capture (only capture when objects detected)
- Alert notifications when specific objects detected (e.g., person)
- Video file upload (not just webcam)
- Record video clips with detections overlay
- Multiple model options (faster/slower)
- Object tracking across frames
- Export feed history to CSV/JSON
- Filter feed by detected object type
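The first two planned items could hang off a single decision per frame. A hypothetical sketch (names and thresholds are invented for illustration, not part of the app):

```javascript
// Decide whether a frame's detections warrant a capture (motion-
// triggered mode) and whether they should raise an alert.
function shouldCapture(detections, { minScore = 0.5, alertLabels = [] } = {}) {
  const confident = detections.filter((d) => d.score >= minScore);
  return {
    capture: confident.length > 0, // only save when something is detected
    alert: confident.some((d) => alertLabels.includes(d.label)),
  };
}

const result = shouldCapture(
  [{ label: "person", score: 0.9 }, { label: "bird", score: 0.3 }],
  { minScore: 0.5, alertLabels: ["person"] }
);
// result.capture and result.alert are both true; the low-score bird is ignored
```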
- Transformers.js Documentation — Run Hugging Face AI models in the browser with WebAssembly
- Transformers.js GitHub — Source code and examples
- YOLOv9 Paper — Original research: "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"
- COCO Dataset — Common Objects in Context — the 80-class dataset YOLOv9 is trained on
- Backblaze B2 Documentation — Cloud storage API docs
- B2 S3-Compatible API — Use standard S3 SDKs with Backblaze B2
Problem: Browser shows CORS error when uploading snapshot.
Solution:
- Run `npm run setup-cors` in the backend directory
- Or manually configure CORS on your B2 bucket
- Verify CORS is set: Go to B2 Console > Your Bucket > Settings > CORS Rules
Problem: Browser won't access camera.
Solution:
- Click the camera icon in your browser's address bar
- Allow camera permissions for localhost
- Ensure no other application is using the camera
- Try a different browser
Problem: "Error loading model" message.
Solution:
- Check internet connection (model downloads from Hugging Face)
- Clear browser cache and reload
- Try incognito/private mode
- Check browser console for specific errors
Problem: Detection is laggy or slow.
Solution:
- Reduce "Image Size" slider (try 64 or 96)
- Reduce "Video Scale" slider (try 0.25 or 0.3)
- Close other browser tabs/applications
- Use Chrome for best WebAssembly performance
This project is licensed under the MIT License. See the LICENSE file for details.