The Fully Local, Zero-Interaction Autonomous Agent.
Ambient AI is a fully local autonomous agent designed to run ambiently around you. Unlike traditional assistants that wait for commands, Ambient AI actively listens to your conversations and watches your screen to understand your intent and context in real time.
Its core philosophy is Zero-Interaction, but it goes beyond simple automation. Ambient AI acts as a proactive partner:
- Contextual Research: Catches topics you discuss in passing, autonomously performs deep web research, and presents answers before you even ask.
- Intelligent Planning: Generates daily to-do lists based on your previous days' context and historical conversations.
- Seamless Delegation: Assign complex workflows by adding them to an "Ambient AI Tasks" project in Todoist; the agent picks these up and executes them during idle time.
The project follows the Hexagonal Architecture (Ports & Adapters) pattern, cleanly separating core logic from external dependencies.
```
src/
├── core/                           # Domain models & pure logic (NO outward deps)
│   ├── models.py                   # Domain objects (ChatMessage, NightTask, etc.)
│   └── services/
│       └── merge_transcript.py     # Pure transcription merging logic
├── application/                    # Depends on core only
│   ├── ports/                      # Abstract interfaces (contracts)
│   │   ├── LLMProvider.py          # LLM generation & streaming
│   │   ├── modelManager.py         # Model loading/unloading
│   │   ├── tool_bridge_port.py     # MCP tool system
│   │   ├── notification_port.py    # System notifications
│   │   ├── task_queue_port.py      # Night task queue
│   │   ├── task_provider_port.py   # External task providers
│   │   ├── asr_port.py             # Speech recognition
│   │   ├── identity_port.py        # Speaker identification
│   │   └── voice_repository_port.py  # Voice embedding store
│   └── services/                   # Orchestration (depends on ports + core)
│       ├── llm_interaction_service.py  # Streaming chat loop + tool execution
│       └── night_mode_service.py       # Autonomous night-time processing
├── infrastructure/
│   └── adapter/                    # Concrete implementations
│       ├── llamaCppAdapter.py      # llama.cpp OpenAI-compatible server
│       ├── openVinoAdapter.py      # OpenVINO GenAI local inference
│       ├── MCPToolAdapter.py       # MCP bridge wrapper
│       ├── SQLiteNotificationAdapter.py
│       ├── SQLiteTaskQueueAdapter.py
│       ├── TodoistTaskAdapter.py
│       ├── ASR_Adapter.py          # Whisper transcription
│       ├── pyannoteAdapter.py      # Speaker diarization
│       ├── ecapaVoxcelebAdapter.py # Voice identification
│       └── SQLiteVoiceAdapter.py   # Voice embedding storage
├── app.py                          # Composition root (entry point)
├── server.py                       # FastAPI audio streaming server
└── mcp.json                        # MCP server configuration
```
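The payoff of this layout is that `core` and `application` never import `infrastructure`: services depend on abstract ports, and concrete adapters are wired in only at the composition root. A minimal sketch of the pattern (the class and method names below are illustrative, not the project's actual signatures):

```python
from abc import ABC, abstractmethod

# application/ports: the abstract contract the orchestration layer sees.
class LLMProvider(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...

# infrastructure/adapter: a concrete implementation, constructed in app.py.
# Swapping llama.cpp for OpenVINO only means building a different adapter here.
class EchoAdapter(LLMProvider):
    """Stand-in adapter used purely for illustration."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

def run_chat_turn(llm: LLMProvider, user_msg: str) -> str:
    # Application service: only knows the port, never the adapter.
    return llm.generate(user_msg)

print(run_chat_turn(EchoAdapter(), "hello"))  # echo: hello
```

Because services hold only a port reference, tests can inject a fake adapter and the real backends stay interchangeable.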
- Hinglish Transcription: Fine-tuned Whisper model optimized for code-mixed audio (Hindi2Hinglish, Oriserve)
- Speaker Diarization: Real-time VAD and speaker separation using Pyannote
- Voice Identity: Voice embedding creation using SpeechBrain VoxCeleb to distinguish the user from guests
- Local Inference: Powered by Qwen 3 VL 4B (default) via llama.cpp
- Notification System: State-aware feedback loop that updates the LLM on outcomes of previous tasks
- Chat Mode: Direct interactive interface with the local LLM
- Night Mode: Fully autonomous task processing during idle hours
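Under the hood, distinguishing the user from guests reduces to comparing a fresh voice embedding against enrolled ones. A minimal sketch of that matching step, using NumPy only (the real pipeline produces embeddings with SpeechBrain's ECAPA VoxCeleb model, and the threshold here is an arbitrary placeholder):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two voice embeddings, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(embedding: np.ndarray, enrolled: dict[str, np.ndarray],
             threshold: float = 0.6) -> str:
    """Return the best-matching enrolled speaker, or 'guest' if none clears the threshold."""
    best_name, best_score = "guest", threshold
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```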
Custom MCP Bridge enabling unlimited extensibility:
| Category | Tool | Description |
|---|---|---|
| Productivity | Todoist | Extracts and adds tasks from audio context |
| | Google Meet | Creates meetings from conversation details |
| | Obsidian | Manages your personal knowledge base |
| Research | Tavily | Autonomous deep web search |
| | Web Browsing | Full page navigation and extraction |
| System | Custom MCP | Bridge any MCP server (Filesystem, GitHub, etc.) |
| | Live Dashboard | Visualizes agent activity in real-time |
| | FastAPI Server | Stream audio notes for transcription |
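Conceptually, the bridge maps tool names coming back from the LLM onto callables. A stripped-down, stdlib-only sketch of that dispatch step (the registry entries are hypothetical stand-ins; real MCP servers are spawned as separate processes via `npx`/`fastmcp`, which this omits):

```python
import json

# Hypothetical registry standing in for the MCP bridge's tool table.
TOOLS = {
    "todoist.add_task": lambda args: f"added: {args['title']}",
    "tavily.search": lambda args: f"searched: {args['query']}",
}

def dispatch(tool_call_json: str) -> str:
    """Route an LLM tool call shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"unknown tool: {call['name']}"
    return tool(call["arguments"])

print(dispatch('{"name": "todoist.add_task", "arguments": {"title": "buy milk"}}'))
# added: buy milk
```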
- Python 3.11+
- llama.cpp server running locally with an OpenAI-compatible API (default: `http://localhost:8080`)
- Node.js / npm, required for MCP server tooling (`npx`)
- CUDA-capable GPU (recommended for model inference)
```bash
# Clone the repository
git clone https://github.com/your-username/ambient-ai.git
cd ambient-ai

# Create and activate virtual environment
python -m venv venv

# Windows (PowerShell)
.\venv\Scripts\Activate.ps1

# Linux / macOS
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
1. LLM Server – Start the llama.cpp server with the model preset manager or manually:

   ```bash
   # The app expects the server at http://localhost:8080
   # It will auto-load the model "Qwen3-VL-4b-Instruct-Q4_K_M" on startup
   ```

2. MCP Tools – Configure your MCP servers in `mcp.json`:

   ```json
   {
     "mcpServers": {
       "My MCP Server": {
         "command": "fastmcp",
         "args": ["run", "path/to/MCP_tools.py"],
         "env": { "TODOIST_API_TOKEN": "your-token" }
       }
     }
   }
   ```

3. Todoist (optional) – Set your API token in the `TODOIST_API_TOKEN` environment variable and configure `todoist.json` with your project ID.

4. Night Queue Database – Auto-initializes on first run. To initialize manually:

   ```bash
   python src/night_mode.py
   ```
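The night queue itself is just a small SQLite table of pending work items. A hedged sketch of what the `SQLiteTaskQueueAdapter` might look like underneath (the schema and column names here are assumptions for illustration, not the project's actual ones):

```python
import sqlite3

def init_queue(conn: sqlite3.Connection) -> None:
    # Assumed schema: one row per queued night task.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS night_tasks (
            id     INTEGER PRIMARY KEY AUTOINCREMENT,
            prompt TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending'
        )
    """)

def enqueue(conn: sqlite3.Connection, prompt: str) -> None:
    conn.execute("INSERT INTO night_tasks (prompt) VALUES (?)", (prompt,))

def next_pending(conn: sqlite3.Connection):
    # Oldest pending task first; None when the queue is drained.
    return conn.execute(
        "SELECT id, prompt FROM night_tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()

conn = sqlite3.connect(":memory:")
init_queue(conn)
enqueue(conn, "summarize today's meetings")
print(next_pending(conn))  # (1, "summarize today's meetings")
```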
Ambient AI supports two inference backends. Set the `LLM_BACKEND` environment variable to choose one:
| Backend | Value | Description |
|---|---|---|
| llama.cpp (default) | `llamacpp` | Uses an external llama.cpp server with an OpenAI-compatible API. Best for CUDA GPUs. |
| OpenVINO | `openvino` | Uses Intel's OpenVINO GenAI runtime. Best for Intel GPUs, CPUs, and NPUs. No separate server needed. |
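In the composition root, `LLM_BACKEND` presumably decides which adapter gets wired behind the `LLMProvider` port. A minimal sketch of that selection (the adapter classes are stubbed; names are assumed from the file layout):

```python
import os

# Stubs standing in for the real adapters in infrastructure/adapter/.
class LlamaCppAdapter:
    name = "llamacpp"

class OpenVinoAdapter:
    name = "openvino"

def make_llm_provider():
    """Pick the LLM adapter from the LLM_BACKEND environment variable."""
    backend = os.getenv("LLM_BACKEND", "llamacpp")
    if backend == "openvino":
        return OpenVinoAdapter()
    if backend == "llamacpp":
        return LlamaCppAdapter()
    raise ValueError(f"unknown LLM_BACKEND: {backend}")

os.environ["LLM_BACKEND"] = "openvino"
print(make_llm_provider().name)  # openvino
```

Failing loudly on an unknown value is a small but useful guard: a typo in the variable otherwise falls back silently to the default backend.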
The default backend. Just start the llama.cpp server and run the app:
```bash
python src/app.py
```

1. Install the runtime:

   ```bash
   pip install openvino-genai
   ```

2. Prepare a model – You need an OpenVINO-optimized model (IR format).

3. Set environment variables and run:

   ```powershell
   # Windows (PowerShell)
   $env:LLM_BACKEND = "openvino"
   $env:OPENVINO_MODEL_PATH = "path/to/your/openvino-model-dir"
   $env:OPENVINO_DEVICE = "GPU"   # Options: CPU, GPU, NPU
   python src/app.py
   ```

   ```bash
   # Linux / macOS
   LLM_BACKEND=openvino OPENVINO_MODEL_PATH=path/to/model OPENVINO_DEVICE=GPU python src/app.py
   ```
| Variable | Default | Description |
|---|---|---|
| `LLM_BACKEND` | `llamacpp` | `llamacpp` or `openvino` |
| `OPENVINO_MODEL_PATH` | `Qwen3-4B-int4-ov` | Path to the OpenVINO model directory |
| `OPENVINO_DEVICE` | `GPU` | Target device: `CPU`, `GPU`, or `NPU` |
```bash
# Main application (all modes)
python src/app.py

# FastAPI audio streaming server
python src/main.py
```

On launch, you'll be presented with three modes:
| Mode | Description |
|---|---|
| 1 – User Interaction | Interactive chat with the LLM, with full tool access |
| 2 – Transcription Automation | Processes `.txt` files in `transcriptions/` through the LLM |
| 3 – Night Mode | Autonomous processing of queued tasks, Todoist tasks, and notifications |
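Mode 2's core loop amounts to "read each transcript file, send it through the LLM". A minimal stdlib sketch of the file-walking half (the `llm` callable is a stand-in for the real chat service):

```python
from pathlib import Path

def process_transcriptions(folder: str, llm) -> list[str]:
    """Feed every .txt file in `folder` through an LLM callable, in name order."""
    results = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        results.append(llm(text))  # llm: any callable prompt -> response
    return results
```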
- Hexagonal Architecture Refactor: Codebase refactored using Ports and Adapters pattern
- GUI Agent: Vision-based agent for direct screen control (in progress)
- Context Fusion: Merging audio and screenshot context streams
- Model Fine-tuning: Fine-tuning Qwen 3 for Ambient AI's autonomous workflows
This project is licensed under the MIT License β see the LICENSE file for details.