
[RFC] Native Agent Swarm Architecture - Multi-Process Parallel Execution #1495

@Nimo1987

Summary

This is a follow-up to #1493 (software-layer Agent Swarm). While #1493 provides immediate value, this RFC proposes a native architecture for true multi-agent parallel execution using process isolation and message passing.

Goal: Turn nanobot from a single agent into a multi-agent orchestrator without breaking existing skills.

Motivation

Current limitations of single-agent architecture:

  • No true parallelism: All tool calls block each other
  • Context pollution: One complex task crowds out previous context
  • No fault isolation: One failed tool crashes entire session
  • Limited scalability: Can't distribute workloads

Proposed Architecture

┌─────────────────────────────────────────────────────────────┐
│                    nanobot (Orchestrator)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Planner    │  │  Message Bus │  │  Supervisor  │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Agent Worker │    │ Agent Worker │    │ Agent Worker │
│  (Research)  │    │  (Architect) │    │  (Analyst)   │
│  ┌──────────┐│    │  ┌──────────┐│    │  ┌──────────┐│
│  │Skill Env ││    │  │Skill Env ││    │  │Skill Env ││
│  │(isolated)││    │  │(isolated)││    │  │(isolated)││
│  └──────────┘│    │  └──────────┘│    │  └──────────┘│
└──────────────┘    └──────────────┘    └──────────────┘

Core Components

1. Agent Worker Process (Docker/Subprocess)

# Each agent runs in an isolated environment
from typing import List

class AgentWorker:
    def __init__(self, role: str, skills: List[str]):
        self.role = role
        self.container = DockerContainer()  # or subprocess
        self.memory = AgentMemory()  # isolated from orchestrator
        self.tool_registry = ToolRegistry(skills)  # only this agent's skills

    async def execute(self, task: Task) -> Result:
        # Runs independently, reports via message bus
        pass

2. Message Bus (IPC)

# Async message passing between agents (interface only; bodies elided)
from typing import AsyncIterator

class MessageBus:
    async def publish(self, channel: str, message: Message) -> None: ...
    async def subscribe(self, channel: str) -> AsyncIterator[Message]: ...

# Supported backends: Redis (production) / SQLite (local) / In-memory (dev)
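
To make the in-memory (dev) backend concrete, here is a minimal sketch built on one `asyncio.Queue` per subscriber. The `Message` fields and the class name `InMemoryMessageBus` are assumptions for illustration; the RFC does not define them. Note that `subscribe` is a plain method here (not `async def`) so the queue is registered before any message is published.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, AsyncIterator, DefaultDict, List


@dataclass
class Message:
    # Hypothetical message shape; the RFC does not specify one.
    sender: str
    payload: Any


class InMemoryMessageBus:
    """Dev-only backend: one asyncio.Queue per subscriber, per channel."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[asyncio.Queue]] = defaultdict(list)

    async def publish(self, channel: str, message: Message) -> None:
        # Fan out to every queue currently subscribed to the channel.
        for queue in self._subscribers[channel]:
            await queue.put(message)

    def subscribe(self, channel: str) -> AsyncIterator[Message]:
        # Register the queue eagerly so messages published before the
        # first iteration are not lost.
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[channel].append(queue)

        async def _stream() -> AsyncIterator[Message]:
            while True:
                yield await queue.get()

        return _stream()


async def demo() -> List[Any]:
    bus = InMemoryMessageBus()
    stream = bus.subscribe("findings")
    await bus.publish("findings", Message("research", "fact 1"))
    await bus.publish("findings", Message("research", "fact 2"))
    received = []
    async for msg in stream:
        received.append(msg.payload)
        if len(received) == 2:
            break
    return received
```

This only works inside a single process, so it would not connect isolated workers; it is useful mainly for testing orchestrator logic before a Redis or SQLite backend exists.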

3. Task Queue & Scheduler

# SQLite-based queue (no external deps for v1); interface only, bodies elided
from typing import List, Optional

class TaskQueue:
    def enqueue(self, task: Task, priority: int) -> None: ...
    def dequeue(self, agent_roles: List[str]) -> Optional[Task]: ...
    def complete(self, task_id: str, result: Result) -> None: ...
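
One way the queue could map onto SQLite is sketched below using only the standard library. The table schema, the integer task IDs, the JSON payloads, and the rule that a higher `priority` is dequeued first are all assumptions, not part of the RFC.

```python
import json
import sqlite3
from typing import List, Optional, Tuple


class SQLiteTaskQueue:
    """Single-connection sketch; schema and column names are hypothetical."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS tasks (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   role TEXT NOT NULL,
                   payload TEXT NOT NULL,
                   priority INTEGER NOT NULL,
                   status TEXT NOT NULL DEFAULT 'pending',
                   result TEXT
               )"""
        )

    def enqueue(self, role: str, payload: dict, priority: int = 0) -> int:
        with self.db:  # commits on success, rolls back on error
            cur = self.db.execute(
                "INSERT INTO tasks (role, payload, priority) VALUES (?, ?, ?)",
                (role, json.dumps(payload), priority),
            )
        return cur.lastrowid

    def dequeue(self, agent_roles: List[str]) -> Optional[Tuple[int, dict]]:
        if not agent_roles:
            return None
        placeholders = ",".join("?" for _ in agent_roles)
        # NOTE: select-then-update is safe for one connection only. With
        # several worker processes, the claim would need an explicit
        # BEGIN IMMEDIATE (or similar) so two workers cannot take one task.
        row = self.db.execute(
            f"SELECT id, payload FROM tasks "
            f"WHERE status = 'pending' AND role IN ({placeholders}) "
            f"ORDER BY priority DESC, id ASC LIMIT 1",
            agent_roles,
        ).fetchone()
        if row is None:
            return None
        with self.db:
            self.db.execute(
                "UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],)
            )
        return row[0], json.loads(row[1])

    def complete(self, task_id: int, result: dict) -> None:
        with self.db:
            self.db.execute(
                "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
                (json.dumps(result), task_id),
            )
```

A worker would call `dequeue(["research"])` in a loop and report back with `complete(task_id, result)`; tasks for roles it does not serve are left in the queue.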

4. Blackboard Pattern

Shared workspace for agent collaboration:

/blackboard/{session_id}/
  ├── context.md      # Shared context
  ├── findings/       # Researcher deposits here
  ├── designs/        # Architect deposits here
  ├── critiques/      # Critic deposits here
  └── final/          # Coordinator synthesizes here
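
The layout above could be created and written to with a couple of small helpers. This is a sketch only: `init_blackboard` and `deposit` are hypothetical names, and writing via a temporary file plus rename is one assumed way to keep readers from seeing half-written files.

```python
from pathlib import Path

# Sub-directories taken from the layout above.
SUBDIRS = ("findings", "designs", "critiques", "final")


def init_blackboard(root: Path, session_id: str) -> Path:
    """Create blackboard/{session_id}/ with its sub-directories and context.md."""
    board = root / "blackboard" / session_id
    for name in SUBDIRS:
        (board / name).mkdir(parents=True, exist_ok=True)
    (board / "context.md").touch()
    return board


def deposit(board: Path, section: str, name: str, text: str) -> Path:
    """Write one artifact into a section of the blackboard."""
    if section not in SUBDIRS:
        raise ValueError(f"unknown blackboard section: {section}")
    target = board / section / name
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(text, encoding="utf-8")
    tmp.replace(target)  # rename is atomic on the same filesystem
    return target
```

For example, a researcher agent might call `deposit(board, "findings", "sources.md", notes)` while the coordinator reads `findings/` and writes only to `final/`.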

Execution Flow

User Request
    ↓
[Orchestrator] Decompose into subtasks
    ↓
[Task Queue] Assign to agents
    ↓
[Agent A] ──┐
[Agent B] ──┼──► [Blackboard] ◄── [Orchestrator] ──► Final Response
[Agent C] ──┘         ↑
              (continuous updates)

Implementation Phases

Phase 1: Subprocess Workers (MVP)

  • No Docker dependency
  • Python multiprocessing for isolation
  • SQLite message queue
  • Timeline: 2-3 weeks
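
A rough sketch of what Phase 1 fault isolation could look like follows. It launches each worker as a separate Python process via `asyncio.create_subprocess_exec` rather than the `multiprocessing` module named above, purely to keep the example self-contained. The worker body, the JSON-over-stdout protocol, and the result dictionaries are all assumptions.

```python
import asyncio
import json
import sys
from typing import Dict, List, Tuple

# Hypothetical worker body; a real worker would load skills and run an agent loop.
WORKER_SOURCE = """
import json, sys
role, task = sys.argv[1], sys.argv[2]
if task == "crash":
    raise RuntimeError("simulated tool failure")
print(json.dumps({"role": role, "output": f"{role} finished: {task}"}))
"""


async def run_worker(role: str, task: str, timeout: float = 30.0) -> Dict[str, object]:
    """Run one worker in its own process and turn any failure into a result."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", WORKER_SOURCE, role, task,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        return {"role": role, "ok": False, "error": "timeout"}
    if proc.returncode != 0:
        # The worker crashed, but the orchestrator and its siblings keep running.
        return {"role": role, "ok": False, "error": f"exit code {proc.returncode}"}
    return {"ok": True, **json.loads(stdout)}


async def run_swarm(assignments: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    # Workers run concurrently; gather returns results in assignment order.
    return await asyncio.gather(*(run_worker(r, t) for r, t in assignments))
```

Calling `asyncio.run(run_swarm([("research", "survey"), ("analyst", "crash")]))` would return a success record for the first worker and an error record for the second, which is the fault-isolation property the Motivation section asks for.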

Phase 2: Docker Isolation

  • Each agent in container
  • Shared volume for blackboard
  • Redis message bus option
  • Timeline: 4-6 weeks

Phase 3: Distributed (Future)

  • Multiple machines
  • Kubernetes orchestration
  • Timeline: Not in scope

Backward Compatibility

# Existing skills work unchanged
class LegacySkill:
    def execute(self, **kwargs):  # Still works
        pass

# New swarm-aware skills can opt-in
class SwarmSkill:
    async def execute(self, context: SwarmContext):
        # Can access other agents, publish to blackboard
        pass
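
One possible way for a worker to support both skill styles is to inspect `execute` at call time. In this sketch `run_skill` is a hypothetical helper, the `SwarmContext` fields are assumed (the RFC does not specify them), and the two echo skills exist only to exercise both paths.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SwarmContext:
    # Hypothetical shape; the RFC does not specify SwarmContext's fields.
    session_id: str
    params: Dict[str, Any] = field(default_factory=dict)


async def run_skill(skill: Any, context: SwarmContext) -> Any:
    """Dispatch to a swarm-aware (async) or legacy (sync, kwargs) skill."""
    if inspect.iscoroutinefunction(skill.execute):
        return await skill.execute(context)
    # Legacy path: run the blocking call in a thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: skill.execute(**context.params))


class EchoLegacy:
    def execute(self, **kwargs: Any) -> Dict[str, Any]:
        return kwargs


class EchoSwarm:
    async def execute(self, context: SwarmContext) -> str:
        return context.session_id
```

With this shape, legacy skills never see `SwarmContext`; they receive the same keyword arguments as before, which is what keeps them unchanged.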

Trade-offs

Aspect       | Software Layer (#1493) | Native Architecture (this)
-------------|------------------------|---------------------------
Complexity   | Low                    | High
Parallelism  | Simulated              | Real
Isolation    | None                   | Process/Container
Latency      | Single call            | Coordination overhead
Reliability  | Single point           | Fault tolerant
Maintenance  | Easy                   | Harder

Recommendation

  1. Merge #1493 ("[Feature Request] Introduce Agent Swarm Collaboration Mode for Complex Tasks", the software layer) first - gives immediate value
  2. Run beta for 1-2 months - validate demand
  3. If high usage: Implement Phase 1 (subprocess)
  4. If enterprise demand: Phase 2 (Docker)

Open Questions

  1. Should we use existing solutions (Celery, Ray) or build minimal?
  2. How to handle skill dependency isolation (different Python versions)?
  3. What's the max agent count before coordination overhead dominates?

Labels

rfc, architecture, enhancement, help wanted


Related: #1493 (software-layer alternative), YourBot multi-bot collaboration

Credit: Inspired by TinyClaw's TUI dashboard, NanoClaw's container security, and OpenAI's Swarm framework.
