Iris is an autonomous browser system that can perceive, reason, and act on web pages — similar to how a human interacts with a browser.
It combines browser automation, structured page understanding, and LLM-based decision making to execute tasks step-by-step.
Key capabilities:
- Connects to a real browser using Playwright (CDP)
- Extracts structured interactive elements from the DOM
- Uses visual context (screenshots) for reasoning
- Executes actions: click, type, scroll, navigate
- Runs an agent loop: observe → think → act
- Streams live browser screen using noVNC
How it works:
- Captures browser screenshots
- Extracts interactive elements (buttons, inputs, links)
- Builds a structured representation of the page
- Sends context (DOM + screenshot) to an LLM
- Decides the next best action
- Executes actions using Playwright:
  - click
  - type
  - scroll
  - navigate
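The action set above maps naturally onto a small dispatcher. This is an illustrative sketch, not Iris's actual code: the `execute_action` name and the action dict schema are assumptions, though `click`, `fill`, `goto`, and `mouse.wheel` are real Playwright sync-API methods, so a real `Page` object would work here.

```python
def execute_action(page, action: dict) -> None:
    """Dispatch an LLM-chosen action onto browser calls.

    `page` is duck-typed: a Playwright sync-API Page works, since
    click/fill/goto and mouse.wheel are standard Playwright methods.
    The action schema ({"type": ..., "selector": ...}) is illustrative.
    """
    kind = action["type"]
    if kind == "click":
        page.click(action["selector"])
    elif kind == "type":
        page.fill(action["selector"], action["text"])
    elif kind == "scroll":
        page.mouse.wheel(0, action.get("dy", 600))  # positive dy scrolls down
    elif kind == "navigate":
        page.goto(action["url"])
    else:
        raise ValueError(f"unknown action type: {kind!r}")
```

Keeping the action vocabulary this small makes the LLM's output easy to validate before anything touches the page.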
Observe → Understand → Decide → Act → Repeat
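That cycle is a plain closed-loop control skeleton. A minimal sketch, assuming injected callables rather than Iris's real interfaces (all four parameter names are illustrative):

```python
def agent_loop(observe, decide, act, is_done, max_steps: int = 20):
    """Closed-loop control: observe the page, let the LLM decide,
    execute the action, and repeat until the goal is reached.

    observe/decide/act/is_done are injected callables (assumed names):
    observe() returns the current state (e.g. screenshot + structured DOM),
    decide(state) asks the LLM for the next action, act(action) executes it.
    """
    for _ in range(max_steps):
        state = observe()
        if is_done(state):
            return state
        action = decide(state)
        act(action)
    raise RuntimeError("max_steps exceeded without completing the task")
```

A step budget like `max_steps` is what keeps an autonomous loop from running forever when the model never reaches the goal.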
Architecture:
User
↓
Backend (FastAPI)
↓
Playwright Browser (via CDP)
↓
DOM Extraction + Screenshot
↓
LLM (Vertex AI - Gemini Flash)
↓
Action Execution
↓
noVNC (Live Screen Streaming)
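The LLM stage in the pipeline above replies with text, but the executor needs a machine-readable action, so the reply has to be parsed defensively (models often wrap JSON in markdown fences or prose). A sketch under that assumption; the function name and schema are illustrative:

```python
import json

def parse_action(reply: str) -> dict:
    """Extract the JSON action object from an LLM reply.

    Models often wrap JSON in markdown fences or surrounding prose,
    so keep only the substring between the first '{' and the last '}'.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model reply")
    action = json.loads(reply[start:end + 1])
    if "type" not in action:
        raise ValueError("action is missing its 'type' field")
    return action
```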
Current limitations (V1):
- Single VM deployment (shared session across users)
- No per-user isolation
- Heavy reliance on vision (screenshots → higher cost)
- Uses coordinate-based clicking (can be fragile)
Planned for V2:
- Per-user isolated browser sessions
- Containerized execution (Docker)
- Scalable architecture (multi-instance support)
- Voice input for natural interaction
- Hybrid reasoning (DOM-first, vision fallback)
- Reduced LLM cost via selective screenshot usage
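The DOM-first, vision-fallback idea can be sketched as a simple gate: only attach a screenshot to the LLM request when the structured DOM looks insufficient. This is an illustrative sketch of the planned approach, not implemented Iris logic; the heuristic and the 50% threshold are assumptions.

```python
def needs_screenshot(elements: list[dict], last_action_failed: bool) -> bool:
    """DOM-first policy sketch: skip the expensive screenshot unless
    the structured view looks unreliable. Heuristics are illustrative.
    """
    if last_action_failed:
        return True   # fall back to vision after a missed action
    if not elements:
        return True   # canvas-heavy or not-yet-rendered page
    # If most controls carry no text or aria-label, the DOM alone
    # probably isn't enough for the LLM to ground its decision.
    unlabeled = sum(
        1 for el in elements
        if not el.get("text") and not el.get("aria_label")
    )
    return unlabeled / len(elements) > 0.5
```

Since each screenshot adds multimodal tokens to the request, gating vision like this is what turns "selective screenshot usage" into a direct cost reduction.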
Design principles:
- Instead of raw pixel-based control, Iris builds a structured interaction map
- LLM is used for decision making, not raw extraction
- System follows a closed-loop agent architecture
- Combines symbolic (DOM) + perceptual (vision) inputs
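The "structured interaction map" above can be illustrated as indexing the extracted interactive elements, so the LLM answers with an element index (e.g. "click [3]") instead of screen coordinates. A minimal sketch; the function name and element field names (`tag`, `text`, `aria_label`) are assumptions:

```python
def build_interaction_map(elements: list[dict]) -> str:
    """Render extracted interactive elements as an indexed,
    LLM-readable list, so the model can refer to an element by
    index instead of by pixel coordinates. Field names are assumed.
    """
    lines = []
    for i, el in enumerate(elements):
        label = el.get("text") or el.get("aria_label") or ""
        lines.append(f"[{i}] <{el['tag']}> {label!r}")
    return "\n".join(lines)
```

Index-based references are also more robust than coordinates: they survive layout shifts as long as the element extraction itself is stable.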
Tech stack:
- Backend: FastAPI, Python
- Browser Automation: Playwright (CDP)
- LLM: Vertex AI (Gemini Flash)
- Streaming: noVNC
- Infra: Google Cloud VM (Compute Engine)
Iris is evolving toward a multi-user, scalable autonomous agent system, with:
- distributed browser sessions
- intelligent routing
- efficient multimodal reasoning
This is an early version (V1) focused on validating the core idea of autonomous browser interaction.
Open to ideas, improvements, and collaborations.