
🌌 Ambient AI

The Fully Local, Zero-Interaction Autonomous Agent.


📖 Overview

Ambient AI is a fully local autonomous agent designed to operate ambiently around you. Unlike traditional assistants that wait for commands, Ambient AI actively listens to your conversations and watches your screen to understand your intent and context in real time.

Its core philosophy is Zero-Interaction, but it goes beyond simple automation. Ambient AI acts as a proactive partner:

  • Contextual Research: Catches topics you discuss in passing, autonomously performs deep web research, and presents answers before you even ask.
  • Intelligent Planning: Generates daily to-do lists based on your previous days' context and historical conversations.
  • Seamless Delegation: Assign complex workflows by adding them to an "Ambient AI Tasks" project in Todoist; the agent picks these up and executes them during idle time.

πŸ—οΈ Architecture

The project follows the Hexagonal Architecture (Ports & Adapters) pattern, cleanly separating core logic from external dependencies.

src/
├── core/                        # Domain models & pure logic (NO outward deps)
│   ├── models.py                # Domain objects (ChatMessage, NightTask, etc.)
│   └── services/
│       └── merge_transcript.py  # Pure transcription merging logic
├── application/                 # Depends on core only
│   ├── ports/                   # Abstract interfaces (contracts)
│   │   ├── LLMProvider.py       # LLM generation & streaming
│   │   ├── modelManager.py      # Model loading/unloading
│   │   ├── tool_bridge_port.py  # MCP tool system
│   │   ├── notification_port.py # System notifications
│   │   ├── task_queue_port.py   # Night task queue
│   │   ├── task_provider_port.py # External task providers
│   │   ├── asr_port.py          # Speech recognition
│   │   ├── identity_port.py     # Speaker identification
│   │   └── voice_repository_port.py # Voice embedding store
│   └── services/                # Orchestration (depends on ports + core)
│       ├── llm_interaction_service.py   # Streaming chat loop + tool execution
│       └── night_mode_service.py        # Autonomous night-time processing
├── infrastructure/
│   └── adapter/                 # Concrete implementations
│       ├── llamaCppAdapter.py   # llama.cpp OpenAI-compatible server
│       ├── openVinoAdapter.py   # OpenVINO GenAI local inference
│       ├── MCPToolAdapter.py    # MCP bridge wrapper
│       ├── SQLiteNotificationAdapter.py
│       ├── SQLiteTaskQueueAdapter.py
│       ├── TodoistTaskAdapter.py
│       ├── ASR_Adapter.py       # Whisper transcription
│       ├── pyannoteAdapter.py   # Speaker diarization
│       ├── ecapaVoxcelebAdapter.py # Voice identification
│       └── SQLiteVoiceAdapter.py   # Voice embedding storage
├── app.py                       # Composition root (entry point)
├── server.py                    # FastAPI audio streaming server
└── mcp.json                     # MCP server configuration
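
The pattern above can be sketched in miniature: a port is an abstract contract the application layer depends on, and an adapter is a concrete implementation that the composition root wires in. The names below (NotificationPort, ConsoleNotificationAdapter, ReminderService) are illustrative, not the repository's actual classes:

```python
from abc import ABC, abstractmethod

# Port: an abstract contract owned by the application layer.
class NotificationPort(ABC):
    @abstractmethod
    def notify(self, message: str) -> None: ...

# Adapter: a concrete implementation living in infrastructure.
class ConsoleNotificationAdapter(NotificationPort):
    def __init__(self):
        self.sent: list[str] = []

    def notify(self, message: str) -> None:
        self.sent.append(message)
        print(f"[notify] {message}")

# Application service depends only on the port, never on a concrete adapter.
class ReminderService:
    def __init__(self, notifier: NotificationPort):
        self.notifier = notifier

    def remind(self, task: str) -> None:
        self.notifier.notify(f"Reminder: {task}")

# Composition root (app.py's role): wire the adapter into the service.
adapter = ConsoleNotificationAdapter()
ReminderService(adapter).remind("review transcripts")
```

Because services only see the port, swapping SQLite for Todoist (or llama.cpp for OpenVINO) means changing one line in the composition root.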

🚀 Key Capabilities

👂 The "Ear" (Audio Intelligence)

  • Hinglish Transcription: Fine-tuned Whisper model optimized for code-mixed audio (Hindi2Hinglish, Oriserve)
  • Speaker Diarization: Real-time VAD and speaker separation using Pyannote
  • Voice Identity: Voice embedding creation using SpeechBrain VoxCeleb to distinguish the user from guests
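
Voice identification of this kind typically compares a fresh utterance embedding against enrolled reference embeddings by cosine similarity; above a threshold, the speaker is treated as the user. A minimal sketch (the function names and the 0.6 threshold are illustrative assumptions, not taken from the repository):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two voice embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identify(embedding: list[float], enrolled: dict[str, list[float]],
             threshold: float = 0.6) -> str:
    """Return the best-matching enrolled speaker, or 'guest' below threshold."""
    best_name, best_score = "guest", threshold
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

enrolled = {"user": [0.9, 0.1, 0.2]}
print(identify([0.88, 0.12, 0.19], enrolled))  # close match -> "user"
print(identify([-0.5, 0.8, 0.1], enrolled))    # dissimilar  -> "guest"
```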

🧠 The "Brain" (Reasoning & Control)

  • Local Inference: Powered by Qwen 3 VL 4B (default) via llama.cpp
  • Notification System: State-aware feedback loop that updates the LLM on outcomes of previous tasks
  • Chat Mode: Direct interactive interface with the local LLM
  • Night Mode: Fully autonomous task processing during idle hours
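
The night queue pattern can be sketched with SQLite: tasks accumulate during the day and are drained one by one in idle hours. This is an illustrative stand-in, not the repository's SQLiteTaskQueueAdapter:

```python
import sqlite3

class NightTaskQueue:
    """Minimal SQLite-backed task queue (illustrative sketch)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "description TEXT NOT NULL, "
            "status TEXT NOT NULL DEFAULT 'pending')"
        )

    def enqueue(self, description: str) -> int:
        cur = self.conn.execute(
            "INSERT INTO tasks (description) VALUES (?)", (description,))
        self.conn.commit()
        return cur.lastrowid

    def next_pending(self):
        """Oldest pending task as (id, description), or None when drained."""
        return self.conn.execute(
            "SELECT id, description FROM tasks "
            "WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()

    def mark_done(self, task_id: int) -> None:
        self.conn.execute(
            "UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
        self.conn.commit()

queue = NightTaskQueue()
queue.enqueue("research topic mentioned at lunch")
task = queue.next_pending()
queue.mark_done(task[0])
```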

πŸ› οΈ The "Hands" (Tool Ecosystem)

Custom MCP Bridge enabling unlimited extensibility:

| Category | Tool | Description |
| --- | --- | --- |
| Productivity | ✅ Todoist | Extracts and adds tasks from audio context |
| | 📅 Google Meet | Creates meetings from conversation details |
| | 📓 Obsidian | Manages your personal knowledge base |
| Research | 🌐 Tavily | Autonomous deep web search |
| | 🕸️ Web Browsing | Full page navigation and extraction |
| System | 🔌 Custom MCP | Bridge any MCP server (Filesystem, GitHub, etc.) |
| | 📊 Live Dashboard | Visualizes agent activity in real time |
| | 🎙️ FastAPI Server | Stream audio notes for transcription |

⚡ Getting Started

Prerequisites

  • Python 3.11+
  • llama.cpp server: running locally with an OpenAI-compatible API (default: http://localhost:8080)
  • Node.js / npm: required for MCP server tooling (npx)
  • CUDA-capable GPU: recommended for model inference

Installation

# Clone the repository
git clone https://github.com/your-username/ambient-ai.git
cd ambient-ai

# Create and activate virtual environment
python -m venv venv

# Windows
./venv/Scripts/activate.ps1

# Linux / macOS
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Configuration

  1. LLM Server: Start the llama.cpp server with the model preset manager or manually:

    # The app expects the server at http://localhost:8080
    # It will auto-load the model "Qwen3-VL-4b-Instruct-Q4_K_M" on startup
  2. MCP Tools: Configure your MCP servers in mcp.json:

    {
      "mcpServers": {
        "My MCP Server": {
          "command": "fastmcp",
          "args": ["run", "path/to/MCP_tools.py"],
          "env": { "TODOIST_API_TOKEN": "your-token" }
        }
      }
    }
  3. Todoist (optional): Set your API token in the TODOIST_API_TOKEN environment variable and configure todoist.json with your project ID.

  4. Night Queue Database: Auto-initializes on first run. To initialize manually:

    python src/night_mode.py
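
At startup, the MCP bridge can read mcp.json and launch each configured server. A sketch of the parsing side, mirroring the schema of the mcp.json example above (the function name is ours, not the repository's):

```python
import json

def load_mcp_servers(config_text: str) -> dict:
    """Parse mcp.json content into {server_name: {command, args, env}}."""
    config = json.loads(config_text)
    servers = {}
    for name, spec in config.get("mcpServers", {}).items():
        servers[name] = {
            "command": spec["command"],       # executable to launch
            "args": spec.get("args", []),     # optional CLI arguments
            "env": spec.get("env", {}),       # optional environment overrides
        }
    return servers

sample = """
{
  "mcpServers": {
    "My MCP Server": {
      "command": "fastmcp",
      "args": ["run", "path/to/MCP_tools.py"],
      "env": { "TODOIST_API_TOKEN": "your-token" }
    }
  }
}
"""
print(load_mcp_servers(sample)["My MCP Server"]["command"])  # fastmcp
```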

Backend Selection

Ambient AI supports two inference backends. Set the LLM_BACKEND environment variable to choose:

| Backend | Value | Description |
| --- | --- | --- |
| llama.cpp (default) | llamacpp | Uses an external llama.cpp server with an OpenAI-compatible API. Best for CUDA GPUs. |
| OpenVINO | openvino | Uses Intel's OpenVINO GenAI runtime. Best for Intel GPUs, CPUs, and NPUs. No separate server needed. |

llama.cpp (default, no extra config needed)

The default backend. Just start the llama.cpp server and run the app:

python src/app.py

OpenVINO

  1. Install the runtime:

    pip install openvino-genai
  2. Prepare a model: You need an OpenVINO-optimized model (IR format).

  3. Set environment variables and run:

    # Windows (PowerShell)
    $env:LLM_BACKEND = "openvino"
    $env:OPENVINO_MODEL_PATH = "path/to/your/openvino-model-dir"
    $env:OPENVINO_DEVICE = "GPU"   # Options: CPU, GPU, NPU
    python src/app.py
    
    # Linux / macOS
    LLM_BACKEND=openvino OPENVINO_MODEL_PATH=path/to/model OPENVINO_DEVICE=GPU python src/app.py

| Variable | Default | Description |
| --- | --- | --- |
| LLM_BACKEND | llamacpp | llamacpp or openvino |
| OPENVINO_MODEL_PATH | Qwen3-4B-int4-ov | Path to the OpenVINO model directory |
| OPENVINO_DEVICE | GPU | Target device: CPU, GPU, or NPU |
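
The variables above drive a simple dispatch at startup. A sketch of how such selection might look (select_backend and the returned dicts are illustrative, not the repository's actual code):

```python
import os

def select_backend(env: dict) -> dict:
    """Resolve inference-backend settings from environment variables."""
    backend = env.get("LLM_BACKEND", "llamacpp")
    if backend == "llamacpp":
        # External llama.cpp server at its default address.
        return {"backend": "llamacpp", "base_url": "http://localhost:8080"}
    if backend == "openvino":
        # In-process OpenVINO GenAI runtime; no separate server.
        return {"backend": "openvino",
                "model_path": env.get("OPENVINO_MODEL_PATH", "Qwen3-4B-int4-ov"),
                "device": env.get("OPENVINO_DEVICE", "GPU")}
    raise ValueError(f"Unknown LLM_BACKEND: {backend!r}")

if __name__ == "__main__":
    print(select_backend(dict(os.environ)))
```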

Running

# Main application (all modes)
python src/app.py

# FastAPI audio streaming server
python src/server.py

On launch, you'll be presented with three modes:

| Mode | Description |
| --- | --- |
| 1 - User Interaction | Interactive chat with the LLM, with full tool access |
| 2 - Transcription Automation | Processes .txt files in transcriptions/ through the LLM |
| 3 - Night Mode | Autonomous processing of queued tasks, Todoist tasks, and notifications |
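
Mode 2's batch flow can be sketched as: collect the .txt files in transcriptions/, run each through the LLM, and collect the results. The function name and the lambda stand-in for the LLM below are hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Callable

def process_transcriptions(folder: str,
                           llm: Callable[[str], str]) -> dict[str, str]:
    """Run every .txt transcript in `folder` through the given LLM callable."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        transcript = path.read_text(encoding="utf-8")
        results[path.name] = llm(transcript)
    return results

# Usage with a stand-in "LLM" that just counts words.
with tempfile.TemporaryDirectory() as d:
    Path(d, "note1.txt").write_text("Discuss Q3 roadmap")
    out = process_transcriptions(d, lambda t: f"{len(t.split())} words")
    print(out)  # {'note1.txt': '3 words'}
```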

🎥 Demo Video

output_github_ready.mp4

πŸ—ΊοΈ Future Roadmap

  • Hexagonal Architecture Refactor: Codebase refactored using the Ports and Adapters pattern
  • GUI Agent: Vision-based agent for direct screen control (in progress)
  • Context Fusion: Merging audio and screenshot context streams
  • Model Fine-tuning: Fine-tuning Qwen 3 for Ambient AI's autonomous workflows

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.
