FieldVision

AI-Powered Industrial Safety Assistant

FieldVision is a real-time AI copilot for industrial maintenance technicians. Built on the Google Agent Development Kit (ADK) with Gemini's bidi-streaming Live API, it provides hands-free voice interaction, continuous visual safety monitoring, and automated compliance logging.

Features

Real-time video analysis — Continuous monitoring for safety hazards, PPE compliance, and procedure verification
Hands-free voice interface — Full two-way audio conversation using Gemini Live API with bidi-streaming
Multi-turn text and audio Q&A — Ask multiple questions via text or voice within the same session; conversation history is maintained across turns
Automated reporting — HTML session reports with AI executive summaries and audit logs; PDF work-order reports for a date range
Manager dashboard — Live video feeds from all active technician sessions for supervisors
Work order management — End-to-end workflow: create (via voice), approve, and complete maintenance work orders
Session resumption — "New Topic" feature allows seamless context switching without reloading
Technical manual integration — Cached manuals/safety_manual.md is loaded at agent startup via app/manual_loader.py and injected into the agent's system instruction for grounded Q&A
Badge verification — After a voice work-order request, the agent asks the user to show their ID badge; verify_badge reads name/ID/department from the video and checks users.json to create or escalate the work order (app/fieldvision_agent/tools.py)
Evidence capture — For safety events with severity >= 4, a JPEG frame is saved under static/evidence/ and linked in the audit (app/fieldvision_agent/tools.py)
Work order reports — Managers/supervisors can download a PDF work-orders report for a date range via GET /api/reports/work-orders?start=...&end=... (main.py, app/report_generator.py)
Site-wide summary — GET /api/reports/site-wide-summary?hours=24 returns session counts, hazard counts, and active zones for dashboards (main.py)
Role-based login — JWT-based auth with technician/manager/supervisor roles and permissions in users.json
AI tool calling — Automated safety event logging, work order creation, and badge verification via function calling
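
The evidence-capture rule in the list above (a JPEG frame is saved under static/evidence/ when severity >= 4) could look roughly like the sketch below. The helper name `maybe_save_evidence`, its parameters, and the file-naming scheme are assumptions made for illustration; the actual logic lives in app/fieldvision_agent/tools.py and may differ.

```python
from pathlib import Path
from typing import Optional

# Threshold taken from the feature description: severity >= 4.
EVIDENCE_SEVERITY_THRESHOLD = 4


def maybe_save_evidence(
    jpeg_bytes: bytes,
    severity: int,
    session_id: str,
    event_index: int,
    evidence_dir: Path = Path("static/evidence"),
) -> Optional[Path]:
    """Save the frame for high-severity events and return its path.

    Hypothetical helper: the "<session>_<index>.jpg" layout is an assumption.
    Returns None when the event is below the threshold.
    """
    if severity < EVIDENCE_SEVERITY_THRESHOLD:
        return None
    # The directory is created on demand, matching "created at runtime".
    evidence_dir.mkdir(parents=True, exist_ok=True)
    path = evidence_dir / f"{session_id}_{event_index:04d}.jpg"
    path.write_bytes(jpeg_bytes)
    return path
```

The returned path is what an audit entry would link to; events below the threshold produce no file at all.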

Design Decisions

Single Conversation per Session

FieldVision enforces a "one conversation per session" model. Each new safety session or topic change initiates a fresh conversation context.

Why? This ensures a clean state for every interaction, preventing context pollution from previous tasks.
Benefit: keeps AI behavior predictable and scopes each report to a single safety session, which matters for compliance audits and demos.

Architecture

┌─────────────────┐     WebSocket      ┌─────────────────┐    ADK Runner     ┌─────────────────┐
│                 │ ◄─────────────────► │                 │ ◄───────────────► │                 │
│     Browser     │  Audio/Video/Text   │  FastAPI Server │  LiveRequestQueue │  Gemini Live    │
│   (Camera/Mic)  │                     │   + ADK Agent   │  Bidi-Streaming   │      API        │
│                 │ ◄─────────────────► │                 │ ◄───────────────► │                 │
└─────────────────┘   AI Responses      └─────────────────┘  Audio + Tools    └─────────────────┘
                                                │
                                                ▼
                                        ┌─────────────────┐
                                        │  audit_log.json │
                                        │  (Compliance)   │
                                        └─────────────────┘

How the app works

Authentication — Users log in at /login. POST /api/login returns a JWT. The token is used for REST (Authorization: Bearer <token>) and for the WebSocket connection (/ws?token=<token>). Roles and permissions are defined in users.json.
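
As a minimal sketch of the authentication flow just described, the helpers below build the login request, the Bearer header, and the WebSocket URL without sending anything. It assumes the login body is JSON-encoded; the name of the token field in the response is not stated here, so response parsing is left out. All helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_login_request(base_url: str, user_id: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/login request.

    Assumption: the server accepts a JSON body with user_id and password.
    """
    body = json.dumps({"user_id": user_id, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_header(token: str) -> dict:
    """Header used for authenticated REST calls."""
    return {"Authorization": f"Bearer {token}"}


def websocket_url(base_url: str, token: str) -> str:
    """Derive the /ws URL, passing the JWT as the token query parameter."""
    parts = urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, "/ws", urlencode({"token": token}), ""))
```

Passing the request to `urllib.request.urlopen` would perform the login; the token from the response then goes into both `auth_header` and `websocket_url`.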

Technician flow — Open / (main app). Start a session; manual context from safety_manual.md is preloaded. The browser sends PCM audio (16 kHz) and JPEG video frames over WebSocket. The server uses Google ADK with Gemini Live (bidi-streaming). The agent can call tools: log_safety_event (writes to per-session audit logs and conversation log), create_work_order (stores a pending order in session state), and verify_badge (creates or escalates the work order in app/work_orders.py). "New topic" starts a new conversation context without reloading.
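
The two-step work-order flow (stage the order in session state, then resolve it once a badge has been read) can be illustrated with plain dictionaries. This is only a sketch: `stage_work_order` and `resolve_with_badge` are hypothetical stand-ins for the create_work_order and verify_badge tools, the shape of the user directory is assumed, and the README does not say what triggers escalation, so the sketch assumes an unrecognized badge escalates.

```python
from typing import Any, Dict, Mapping


def stage_work_order(state: Dict[str, Any], description: str) -> Dict[str, Any]:
    """Step 1: hold the request in session state until a badge is verified."""
    state["pending_work_order"] = {"description": description}
    return {"status": "awaiting_badge"}


def resolve_with_badge(
    state: Dict[str, Any],
    users: Mapping[str, Mapping[str, Any]],  # assumed shape: {user_id: {"name": ...}}
    name: str,
    employee_id: str,
    department: str,
) -> Dict[str, Any]:
    """Step 2: match the badge fields read from video against the directory."""
    pending = state.pop("pending_work_order", None)
    if pending is None:
        return {"status": "no_pending_order"}
    user = users.get(employee_id)
    if user is not None and user.get("name") == name:
        order = {**pending, "requested_by": employee_id, "department": department}
        return {"status": "created", "order": order}
    # Assumption: a badge that cannot be matched is escalated for human review.
    return {"status": "escalated", "order": pending, "reason": "badge not recognized"}
```

Keeping the pending order in session state means a badge shown without a prior request resolves to nothing, which is the point of splitting the flow into two tool calls.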

Manager / supervisor flow — Open /manager. The dashboard lists active technician camera feeds; the latest frame per technician is served via GET /api/camera-feeds/{user_id}/frame. Work orders (pending, approved, completed) are listed, and supervisors can approve or complete them. A PDF work-orders report and a site-wide summary are also available from the reporting endpoints.

Reporting — Session-level: GET /api/reports/{session_id} returns an HTML report with event timeline and AI-generated executive summary (app/reporting.py). Work-order compliance: GET /api/reports/work-orders?start=&end= returns a PDF (app/report_generator.py).
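
For the two report endpoints above, a small sketch of building the request URLs. The helper names are hypothetical, and the check that `end` is not before `start` is an added convenience rather than documented server behavior.

```python
from datetime import date
from urllib.parse import quote, urlencode


def session_report_url(base_url: str, session_id: str) -> str:
    """URL of the HTML report for one session."""
    return f"{base_url.rstrip('/')}/api/reports/{quote(session_id, safe='')}"


def work_orders_report_url(base_url: str, start: date, end: date) -> str:
    """URL of the PDF work-orders report for an ISO date range."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    query = urlencode({"start": start.isoformat(), "end": end.isoformat()})
    return f"{base_url.rstrip('/')}/api/reports/work-orders?{query}"
```

The work-orders report requires authentication, so the request also needs the Authorization: Bearer header.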

flowchart LR
  Login[Login] --> TechUI[Technician UI]
  Login --> MgrUI[Manager UI]
  TechUI <--> WS[WebSocket]
  WS <--> ADK[ADK / Gemini]
  ADK --> Tools[Tools]
  MgrUI --> REST[REST APIs]

Quick Start

Prerequisites

Python 3.11+
Google Cloud account with Gemini API access
Webcam and microphone

Installation

# Clone the repository
git clone https://github.com/your-org/field-vision.git
cd field-vision

# Create virtual environment
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Activate (Unix/macOS)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Configuration

Get your Gemini API Key from Google AI Studio
Create a .env file in the project root. You can copy from the template:

cp .env.example .env

Then set your API key in .env:

GEMINI_API_KEY=your_gemini_api_key_here

Optional: set HOST, PORT, SESSION_TTL_SECONDS, FRAME_RATE, JPEG_QUALITY, LOG_LEVEL, AUDIT_LOG_PATH, GEMINI_MODEL, INPUT_SAMPLE_RATE, OUTPUT_SAMPLE_RATE, MAX_RESUME_ATTEMPTS, or DEBUG as needed (see Configuration options below).

Running the Application

# Start the server
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Open your browser to: http://localhost:8000

Demo Credentials

Use these credentials to test different roles:

Role	Username	Password	Permissions
Technician	`tech_042`	`field123`	Basic access, voice Q&A
Sr. Technician	`tech_078`	`field456`	+ Create work orders
Supervisor	`sup_007`	`super789`	+ Approve work orders, view zone feeds
Manager	`mgr_001`	`manage101`	Full system access, all feeds, reports

Project Structure

field-vision/
├── app/
│   ├── __init__.py
│   ├── config.py               # Pydantic settings
│   ├── audit.py                # Safety event logging
│   ├── auth.py                 # JWT auth, roles, permissions
│   ├── websocket_handler.py    # WebSocket ↔ ADK bidi-streaming bridge
│   ├── gemini_service.py       # ADK Runner, session mgmt, RunConfig
│   ├── fieldvision_agent/      # ADK Agent definition + tools
│   │   ├── agent.py
│   │   └── tools.py
│   ├── manual_loader.py        # Load/cache safety manual for grounding
│   ├── conversation_logger.py  # Session transcripts
│   ├── work_orders.py          # Create, approve, complete work orders
│   ├── reporting.py            # HTML session reports
│   └── report_generator.py     # PDF work-orders report
├── static/
│   ├── index.html              # Main technician UI
│   ├── login.html              # Login page
│   ├── manager.html            # Manager dashboard
│   ├── app.js                  # Frontend application
│   ├── pcm-processor.js        # AudioWorklet for mic PCM streaming
│   ├── badges/                 # Badge images (e.g. for demo)
│   └── evidence/               # Created at runtime for captured frames
├── manuals/
│   └── safety_manual.md        # Technical manual for grounded Q&A
├── logs/                       # Audit and transcript logs
├── main.py                     # FastAPI application
├── requirements.txt
├── users.json                  # User credentials and permissions
├── pending_orders.json         # Work orders awaiting approval
├── approved_orders.json
├── completed_orders.json
├── .env.example                # Environment template (copy to .env)
├── ROADMAP.md
├── tests/
└── README.md

API Reference

WebSocket

Connect with ?token=<jwt> (e.g. ws://localhost:8000/ws?token=...). The JWT is obtained from POST /api/login.

Client → Server

Type	Payload	Description
`start_session`	`{ manual_context?: string }`	Start a new AI session
`end_session`	`{}`	End current session
`audio_data`	`{ data: base64 }`	PCM16 audio at 16kHz
`video_frame`	`{ data: base64 }`	JPEG image frame
`text_message`	`{ text: string }`	Text input
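
A sketch of building the client messages in the table above. The table lists types and payloads but not the JSON envelope, so this assumes a top-level `type` field with the payload fields alongside it; it also assumes little-endian PCM16 samples. Function names are hypothetical.

```python
import base64
import json
import struct
from typing import Sequence


def make_message(msg_type: str, **payload) -> str:
    """Serialize one client message (assumed envelope: type + payload fields)."""
    return json.dumps({"type": msg_type, **payload})


def audio_message(samples: Sequence[int]) -> str:
    """Pack signed 16-bit samples (16 kHz expected) and base64-encode them."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return make_message("audio_data", data=base64.b64encode(pcm).decode("ascii"))


def video_message(jpeg_bytes: bytes) -> str:
    """Base64-encode one JPEG frame."""
    return make_message("video_frame", data=base64.b64encode(jpeg_bytes).decode("ascii"))
```

For example, `make_message("text_message", text="Is the guard in place?")` yields a text input message and `make_message("end_session")` ends the session.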

Server → Client

Type	Payload	Description
`session_started`	`{ session_id: string }`	Session confirmation
`audio_response`	`{ data: base64 }`	PCM16 audio at 24kHz
`text_response`	`{ text: string }`	Text response
`tool_call`	`{ function: string, arguments: object }`	Tool invoked by the agent (e.g. a safety event was logged)
`error`	`{ error: string }`	Error message
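
On the receiving side, a client might decode server messages along these lines. As with the client messages, the envelope (a top-level `type` field next to the payload fields) is an assumption, and `parse_server_message` is a hypothetical name.

```python
import base64
import json
from typing import Any, Dict

OUTPUT_SAMPLE_RATE = 24000  # audio_response is PCM16 at 24 kHz


def parse_server_message(raw: str) -> Dict[str, Any]:
    """Turn one raw server message into a normalized dict keyed by kind."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "session_started":
        return {"kind": kind, "session_id": msg["session_id"]}
    if kind == "audio_response":
        pcm = base64.b64decode(msg["data"])
        return {"kind": kind, "pcm": pcm, "sample_rate": OUTPUT_SAMPLE_RATE}
    if kind == "text_response":
        return {"kind": kind, "text": msg["text"]}
    if kind == "tool_call":
        return {"kind": kind, "function": msg["function"], "arguments": msg["arguments"]}
    if kind == "error":
        return {"kind": kind, "error": msg["error"]}
    # Unknown types are passed through rather than raising.
    return {"kind": "unknown", "raw": msg}
```

Passing unknown types through instead of raising keeps a client working if the server adds message types later.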

REST Endpoints

Method	Path	Description
`GET`	`/`	Main application UI
`GET`	`/login`	Login page
`GET`	`/manager`	Manager dashboard
`GET`	`/health`	Health check
`POST`	`/api/login`	Login (body: `user_id`, `password`); returns JWT and user info
`GET`	`/api/me`	Current user info (auth required)
`GET`	`/api/session/{id}/summary`	Session audit summary
`GET`	`/api/session/{id}/events`	Session event list
`GET`	`/api/audit/logs`	List all historical sessions
`GET`	`/api/reports/{session_id}`	Generate HTML session report
`GET`	`/api/reports/work-orders`	PDF work-orders report (query: `start`, `end` ISO dates; auth required)
`GET`	`/api/reports/site-wide-summary`	Site-wide activity summary (query: `hours`, default 24)
`GET`	`/api/camera-feeds`	List active technician camera feeds (auth required)
`GET`	`/api/camera-feeds/{user_id}/frame`	Get latest video frame for a technician (auth required)
`GET`	`/api/work-orders`	List work orders (filtered by role)
`POST`	`/api/work-orders/{order_id}/approve`	Approve a pending work order
`POST`	`/api/work-orders/{order_id}/complete`	Mark a work order as completed

Safety Event Types

Event Type	Description	Severity Range
`missing_ppe`	PPE not detected (gloves, glasses, etc.)	3-5
`hazard_detected`	General safety hazard identified	2-5
`unsafe_position`	Body in dangerous position	4-5
`procedure_violation`	Incorrect procedure step	3-4
`equipment_issue`	Equipment problem detected	2-5
`environment_hazard`	Spill, obstruction, etc.	2-5
`step_verified`	Procedure step confirmed correct	1
`safety_check_passed`	Safety inspection passed	1
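
The severity ranges in the table can be encoded as a lookup and checked before an event is logged. This is a sketch only: `validate_event` is a hypothetical helper, and whether the application enforces these ranges in code is not stated here.

```python
from typing import Dict, Tuple

# (min, max) severity per event type, copied from the table above.
SEVERITY_RANGES: Dict[str, Tuple[int, int]] = {
    "missing_ppe": (3, 5),
    "hazard_detected": (2, 5),
    "unsafe_position": (4, 5),
    "procedure_violation": (3, 4),
    "equipment_issue": (2, 5),
    "environment_hazard": (2, 5),
    "step_verified": (1, 1),
    "safety_check_passed": (1, 1),
}


def validate_event(event_type: str, severity: int) -> None:
    """Raise ValueError for an unknown type or an out-of-range severity."""
    try:
        low, high = SEVERITY_RANGES[event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event_type!r}") from None
    if not low <= severity <= high:
        raise ValueError(
            f"severity {severity} outside {low}-{high} for {event_type!r}"
        )
```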

Configuration Options

Variable	Default	Description
`GEMINI_API_KEY`	required	Google Gemini API key
`GEMINI_MODEL`	(see config)	Optional override for Gemini model
`HOST`	`0.0.0.0`	Server bind address
`PORT`	`8000`	Server port
`DEBUG`	`false`	Debug mode
`SESSION_TTL_SECONDS`	`3600`	Session timeout
`MAX_RESUME_ATTEMPTS`	`3`	Max session resume attempts
`INPUT_SAMPLE_RATE`	`16000`	Input audio sample rate (Hz)
`OUTPUT_SAMPLE_RATE`	`24000`	Output audio sample rate (Hz)
`FRAME_RATE`	`1`	Video capture FPS
`JPEG_QUALITY`	`85`	Image compression quality
`LOG_LEVEL`	`INFO`	Logging verbosity
`AUDIT_LOG_PATH`	`./logs/audit_log.json`	Audit log file path
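
To show how these variables and defaults fit together, here is a standard-library sketch that reads them from an environment mapping. The project itself uses Pydantic settings in app/config.py, so this dataclass is a stand-in and not the actual implementation. `GEMINI_MODEL` is omitted because its default is not listed in the table, and `FRAME_RATE` is assumed to be an integer.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str  # required, no default
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    session_ttl_seconds: int = 3600
    max_resume_attempts: int = 3
    input_sample_rate: int = 16000
    output_sample_rate: int = 24000
    frame_rate: int = 1
    jpeg_quality: int = 85
    log_level: str = "INFO"
    audit_log_path: str = "./logs/audit_log.json"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from upper-cased variable names, falling back to defaults."""
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required")
    values = {"gemini_api_key": api_key}
    for field in fields(Settings):
        raw = env.get(field.name.upper())
        if field.name == "gemini_api_key" or raw is None:
            continue
        # Convert based on the type of the default; bool is checked before int
        # because bool is a subclass of int.
        if isinstance(field.default, bool):
            values[field.name] = raw.strip().lower() in _TRUE_VALUES
        elif isinstance(field.default, int):
            values[field.name] = int(raw)
        else:
            values[field.name] = raw
    return Settings(**values)
```

Calling `load_settings({"GEMINI_API_KEY": "k", "PORT": "9000"})` overrides only the port and keeps every other default from the table.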

Safety Governance

FieldVision adheres to responsible AI principles:

Transparency - All observations are logged and auditable
AI Disclosure - Responses explicitly labeled as AI-generated
Advisory Only - No direct machine control; humans perform all actions
Human-in-the-Loop - Safety sign-offs require human approval
Accountability - Each session has a clearly defined owner

Future Roadmap

Completed (described under Features above):
Multi-turn text & audio Q&A with conversation history
PDF-ready HTML compliance report generation
Role-based authentication

Planned:
AR glasses integration for hands-free HUD
Multi-step LOTO sequence verification
IoT sensor integration
Cloud Run deployment for fleet scaling
Firestore for persistent audit storage

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer: FieldVision is an advisory system only. It does NOT control industrial equipment and should NOT be used as a primary safety mechanism. Always follow your organization's safety protocols and procedures.
