[RFC] Python Agent Service for OpenSearch #20602

@mingshl

Description

Overview

This RFC proposes a Python-based agent service that serves as a unified backend for AI-powered assistants in OpenSearch Dashboards. The service uses Strands SDK as the orchestration framework and follows the multi-agent pattern, where a top-level orchestrator routes user requests to specialized sub-agents based on the Dashboard page context.

All sub-agents share a common tool layer built on the OpenSearch MCP Server, which already provides comprehensive OpenSearch operations (index listing, search, mappings, cluster health, shards, explain, multi-search, and a generic API tool). A General Assistant sub-agent (OpenSearch Agent) powered by these MCP tools serves as the default fallback for pages without a dedicated specialist, while domain-specific sub-agents (e.g., Agentic Relevance Tuning for the Search Relevance page) extend the shared MCP tools with additional domain-specific tools. This enables incremental adoption — pages without a dedicated sub-agent get a capable general assistant, while specialized pages benefit from purpose-built agents.

Problem Statement

1. Development friction with Java-based agents

The current OpenSearch Agent Framework requires agent development in Java with REST API integration. While this works for core OpenSearch contributors, it creates significant friction for:

  • AI/ML engineers and data scientists who predominantly work in Python and are the primary audience for building intelligent agents.
  • Rapid prototyping — Java's compile-deploy cycle is slow compared to Python's dynamic development workflow.
  • LLM ecosystem access — The most mature agent frameworks (Strands, LangChain, CrewAI), tool libraries, and LLM SDKs are Python-first.
  • Resource contention — The current OpenSearch agent framework runs inside the OpenSearch cluster, so assistant agents compete for resources with regular OpenSearch requests.

2. Monolithic agent limitations

The current architecture couples all agent capabilities into a single framework. This makes it difficult to:

  • Use different LLM models per task (e.g., a fast/cheap model for simple analytics, a powerful model for complex reasoning).
  • Isolate failures — a bug in one agent capability affects all others.
  • Scale development — multiple teams cannot independently develop and deploy agents for different domains.

3. Underutilized Dashboard page context

OpenSearch Dashboards has distinct functional pages (Index Management, Search Relevance, Anomaly Detection, Observability, etc.), each with domain-specific user intents. The current chat assistant does not leverage this page context to provide specialized help. A user on the Search Relevance page asking "help me improve results for 'laptop'" should get a fundamentally different experience than a user on the Index Management page asking "help me manage my indices."

4. Community growth opportunity

The OpenSearch project aims to attract new contributor groups. A Python agent service lowers the barrier for data scientists, AI engineers, and the broader Python community to contribute agent capabilities to OpenSearch without needing Java expertise or modifications to OpenSearch core.

Motivation

This RFC generalizes the Agentic Relevance Tuning (ART) architecture into a reusable framework that:

  • Allows any Dashboard page to have a dedicated AI assistant.
  • Reuses the OpenSearch MCP Server as a shared tool layer across all agents.
  • Provides a capable General Assistant as the default fallback, powered by MCP tools.
  • Provides a standardized pattern for contributing new Python agents.
  • Leverages a single deployment (one service, one endpoint) for all agents.

Goals

  1. Create a Python agent service that integrates with OpenSearch Dashboards chat plugin via the AG-UI protocol.
  2. Implement a Strands-based orchestrator that routes requests to the appropriate sub-agent based on page context.
  3. Reuse the OpenSearch MCP Server as the shared tool layer for all agents to interact with OpenSearch.
  4. Provide a General Assistant sub-agent (powered by MCP tools) as the default fallback for pages without a dedicated specialist.
  5. Establish ART as the first specialized sub-agent, serving the Search Relevance Workbench page.
  6. Define a clear pattern and contribution guide for adding new sub-agents.
  7. Support multiple LLM providers (AWS Bedrock, OpenAI, Ollama) through Strands SDK's model abstraction.

Proposed Solution

Design Overview

[Image: architecture overview diagram]

Routing Mechanism

The orchestrator uses a two-tier routing strategy: deterministic page-based routing first, with LLM-based routing as a supplement.

Tier 1 — Deterministic routing (page context):

The Dashboard chat plugin injects the current page context into each request. The orchestrator uses a simple registry to map page identifiers to sub-agents:

# Agent registry: page_context → sub-agent
AGENT_REGISTRY = {
    "search-relevance": art_agent,          # ART handles Search Relevance pages
    # "anomaly-detection": anomaly_agent,   # Future
    # "index-management": index_mgmt_agent, # Future
}

def route_request(page_context: str, message: str) -> Agent:
    """Route to specialized agent or fallback to General Assistant."""
    if page_context in AGENT_REGISTRY:
        return AGENT_REGISTRY[page_context]
    return general_assistant  # Fallback: MCP-powered general agent

Tier 2 — LLM-based routing (supplementary):

When page context alone is insufficient (e.g., a cross-domain question asked from any page), the orchestrator LLM can analyze the user's intent and delegate to the appropriate sub-agent. The orchestrator uses a lightweight model (e.g., Claude Haiku) for routing decisions to minimize latency and cost.
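As a rough sketch of Tier 2 (the helper names here are hypothetical, not part of the Strands SDK), the orchestrator can prompt a small model to reply with a single agent name and normalize the reply defensively:

```python
# Hypothetical Tier 2 routing sketch: ask a lightweight LLM to name a
# sub-agent, then normalize its reply. `call_routing_llm` stands in for a
# real call to a small model (e.g. Claude Haiku) and is injected here so
# the parsing logic stays testable.

KNOWN_AGENTS = {"search-relevance", "general-assistant"}

ROUTING_PROMPT = (
    "You are a request router. Reply with exactly one agent name from "
    "{agents}. If you are unsure, reply 'general-assistant'.\n"
    "User message: {message}"
)

def route_with_llm(message: str, call_routing_llm) -> str:
    """Return a known agent name, falling back to the General Assistant."""
    prompt = ROUTING_PROMPT.format(agents=sorted(KNOWN_AGENTS), message=message)
    reply = call_routing_llm(prompt)
    name = reply.strip().lower()
    return name if name in KNOWN_AGENTS else "general-assistant"
```

Any malformed or ambiguous reply degrades to the General Assistant, mirroring the Tier 1 fallback behavior.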

OpenSearch MCP Server as Shared Tool Layer

The OpenSearch MCP Server is a key building block of this design. It already provides a comprehensive set of OpenSearch tools via the Model Context Protocol:

  • Core — ListIndexTool, SearchIndexTool, IndexMappingTool, ClusterHealthTool, CountTool, ExplainTool, MsearchTool, GetShardsTool: essential OpenSearch operations.
  • Generic — GenericOpenSearchApiTool: flexible tool that can call any OpenSearch API endpoint.
  • Skills — DataDistributionTool, LogPatternAnalysisTool: higher-level analytical capabilities.
  • Advanced — GetClusterStateTool, CatNodesTool, GetNodesTool, GetIndexStatsTool, GetQueryInsightsTool, etc.: deep cluster inspection.

The MCP server supports stdio and streaming (SSE/HTTP) transports, header-based auth (for credential pass-through from Dashboard), multi-cluster mode, and configurable tool filtering.

All sub-agents share the same MCP connection to OpenSearch. This means:

  • No duplicated OpenSearch client code across agents.
  • New tools added to the MCP server are immediately available to all agents.
  • Authentication and connection management are handled in one place.
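One way to realize the shared connection is to load the MCP tool list once at startup and hand the same cached list to every agent. A minimal sketch (`load_tools_from_mcp` is a hypothetical loader around the real MCP client, injected here to keep the caching logic testable):

```python
# Minimal shared-tool-layer sketch: the MCP tool list is loaded once and
# cached, so every agent receives the same objects and tools newly added
# to the MCP server become available everywhere after a single reload.
from typing import Callable, List, Optional

_shared_tools: Optional[List] = None

def get_shared_mcp_tools(load_tools_from_mcp: Callable[[], List]) -> List:
    """Load MCP tools on first call, then return the cached list."""
    global _shared_tools
    if _shared_tools is None:
        _shared_tools = load_tools_from_mcp()
    return _shared_tools
```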

General Assistant as Default Fallback

Instead of wrapping the existing Java agent via REST API, the fallback is a native Python agent powered directly by the OpenSearch MCP Server tools:

from strands import Agent

# MCP tools are loaded once at startup and shared across all agents
# opensearch_mcp_tools = [ListIndexTool, SearchIndexTool, ClusterHealthTool, ...]

general_assistant = Agent(
    model=haiku_model,  # Lightweight model for general queries
    system_prompt="""You are a general-purpose OpenSearch assistant.
    You help users explore their cluster, understand indices, query data,
    check cluster health, and answer questions about their OpenSearch
    environment. Use the available tools to interact with OpenSearch.""",
    tools=opensearch_mcp_tools,  # All MCP tools available
)

This approach has significant advantages over wrapping the Java agent:

  • No double LLM invocation — the Java agent would invoke its own LLM; the Python General Assistant uses the MCP tools directly, requiring only one LLM call.
  • No dependency on ml-commons agent framework — works with any OpenSearch cluster, even without ml-commons installed.
  • Consistent architecture — the fallback is a regular Strands agent like all other sub-agents, not a special wrapper.
  • Same MCP tools as specialized agents — the General Assistant and specialized agents share the same tool layer; the difference is their system prompts and any additional domain-specific tools.
  • Extensible — as the MCP server gains new tools (e.g., GetQueryInsightsTool, LogPatternAnalysisTool), the General Assistant automatically benefits.

How Specialized Agents Extend the Shared Tools

Specialized sub-agents receive the shared MCP tools plus their own domain-specific tools:

# ART agent = shared MCP tools + domain-specific SRW tools
art_agent = Agent(
    model=sonnet_model,
    system_prompt=ART_SYSTEM_PROMPT,
    tools=[
        *opensearch_mcp_tools,           # Shared: ListIndex, Search, etc.
        *art_specific_tools,             # Domain: SRW configs, judgments,
                                         #   experiments, UBI analytics
    ],
)

# General Assistant = shared MCP tools only
general_assistant = Agent(
    model=haiku_model,
    system_prompt=GENERAL_ASSISTANT_PROMPT,
    tools=opensearch_mcp_tools,          # Shared tools only
)

This layered tool approach means:

  • Adding a new sub-agent only requires defining a system prompt and any domain-specific tools.
  • Shared tools are never duplicated — they come from the single MCP server connection.
  • Tool filtering can be configured per agent via the MCP server's OPENSEARCH_TOOL_FILTER or YAML config.
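To illustrate per-agent filtering (the real mechanism is the MCP server's OPENSEARCH_TOOL_FILTER or YAML config; this hypothetical helper, with tools modeled as plain dicts, just mirrors the idea client-side):

```python
def filter_tools_for_agent(tools, allowed_names):
    """Keep only tools whose name appears in the agent's allow-list.
    Tools are modeled as dicts with a "name" key for illustration."""
    allowed = set(allowed_names)
    return [t for t in tools if t["name"] in allowed]
```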

ART as First Specialized Sub-Agent

The Agentic Relevance Tuning (ART) system serves as the reference implementation for the first specialized sub-agent. ART is activated when the user is on the Search Relevance Workbench page and provides:

  • User Behavior Analysis — Analyzes UBI data for CTR, engagement patterns, zero-click rates.
  • Hypothesis Generation — Diagnoses search issues and generates improvement hypotheses with automated sanity checks via pairwise experiments.
  • Offline Evaluation — Runs pointwise experiments with judgment lists and calculates standard metrics (NDCG@K, MAP, Precision@K).
  • Online Testing (planned) — Interleaved A/B testing for production validation.

ART already implements the "Agents as Tools" pattern with Strands SDK, connects to OpenSearch via MCP server, and communicates with the Dashboard via AG-UI protocol. The work required is to extract the orchestration and routing layer into the shared framework that other sub-agents can plug into.

AG-UI Protocol Integration

The service communicates with the OpenSearch Dashboards chat plugin via the AG-UI protocol, which provides:

  • Streaming responses — Real-time token-by-token output via Server-Sent Events (SSE).
  • Tool call visualization — Frontend renders tool calls and results as the agent works.
  • State management — Thread-based conversation persistence.
  • Generative UI — Agents can emit structured UI components (charts, tables) alongside text.

The chat plugin needs a minor modification to include page_context in the AG-UI request payload:

{
    "thread_id": "thread_abc123",
    "run_id": "run_xyz789",
    "messages": [...],
    "context": [
        {
            "description": "Current dashboard page",
            "value": "search-relevance"
        }
    ]
}
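On the service side, extracting the page context from this payload could look like the following sketch (field names taken from the example above):

```python
def extract_page_context(payload: dict) -> "str | None":
    """Return the current dashboard page from the AG-UI context list, if any."""
    for item in payload.get("context", []):
        if item.get("description") == "Current dashboard page":
            return item.get("value")
    return None
```

A missing or empty context list simply yields None, which routes the request to the General Assistant.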

Authentication Flow

User → OSD (authenticates) → Chat Plugin → Python Agent Service → MCP Server → OpenSearch
                                                     │
                                                     └→ All agents: credentials forwarded via MCP header-based auth
  • The user authenticates with OpenSearch Dashboards as usual.
  • The chat plugin forwards the user's credentials (or token) to the Python agent service.
  • The agent service passes credentials to the OpenSearch MCP Server via header-based authentication (OPENSEARCH_HEADER_AUTH=true), which the MCP server already supports. This means the MCP server forwards the user's Authorization header to OpenSearch for every operation.
  • All agents (General Assistant and specialized agents) share the same authenticated MCP connection, ensuring they have exactly the same permissions as the user.
  • LLM provider credentials (e.g., AWS Bedrock) are configured server-side in environment variables and are not user-facing.
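The credential pass-through step can be sketched as follows (a minimal illustration; in practice this would live in the service's auth middleware):

```python
def build_mcp_headers(incoming_headers: dict) -> dict:
    """Forward only the user's Authorization header to the MCP server,
    so every OpenSearch call runs with the user's own permissions."""
    auth = incoming_headers.get("Authorization")
    return {"Authorization": auth} if auth else {}
```

Forwarding only the Authorization header (and nothing else from the incoming request) keeps the user-scoped permission model intact without leaking unrelated headers such as cookies.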

Observability

All agent operations are instrumented with OpenTelemetry:

  • Distributed tracing — Each request creates a trace spanning the orchestrator, sub-agent, and tool calls.
  • Metrics — Token usage, latency, tool call success/failure rates, routing decisions.
  • Logging — Structured logs with request IDs for debugging.

This enables operators to monitor agent performance, debug issues, and optimize costs across all sub-agents from a single observability pipeline.
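A stdlib-only sketch of the structured-logging piece (OpenTelemetry tracing and metrics would layer on top of this; the field names are illustrative):

```python
import json
import logging
import uuid

logger = logging.getLogger("agent-service")

def log_agent_event(agent: str, event: str, request_id=None, **fields) -> str:
    """Emit one JSON log line with a request ID for cross-component correlation."""
    record = {
        "request_id": request_id or str(uuid.uuid4()),
        "agent": agent,
        "event": event,
        **fields,
    }
    line = json.dumps(record)
    logger.info(line)
    return line
```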

Data Flow

Sequence Diagram: Specialized Agent (e.g., ART on Search Relevance Page)

[Image: sequence diagram for the specialized agent flow]

Sequence Diagram: General Assistant Fallback (e.g., on Index Management Page)

[Image: sequence diagram for the General Assistant fallback flow]

Key differences between the two flows:

  • Routing: Specialized agent matched by page_context vs. fallback when no match.
  • Tools: ART uses shared MCP tools plus domain-specific tools (UBI analytics, SRW experiments). General Assistant uses shared MCP tools only.
  • LLM model: ART may use a more capable model (e.g., Sonnet) for complex reasoning; General Assistant can use a lighter model (e.g., Haiku) for simple queries.
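The per-agent model choice can be captured in a small config map. A sketch (the model IDs and token limits below are illustrative placeholders, not a recommendation):

```python
# Hypothetical per-agent model configuration: a lighter model for the
# General Assistant, a stronger one for ART. IDs are placeholders.
MODEL_CONFIG = {
    "search-relevance": {"model_id": "claude-sonnet", "max_tokens": 4096},
    "general-assistant": {"model_id": "claude-haiku", "max_tokens": 1024},
}

def model_for(agent_name: str) -> dict:
    """Resolve an agent's model settings, defaulting to the lighter model."""
    return MODEL_CONFIG.get(agent_name, MODEL_CONFIG["general-assistant"])
```

Centralizing this map keeps cost/latency trade-offs in one place rather than hard-coded inside each sub-agent.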

Proposed Repo Structure

opensearch-agent-service/
├── src/
│   ├── orchestrator/                   # Core orchestration framework
│   │   ├── router.py                   # Page-context routing logic
│   │   ├── agent_registry.py           # Sub-agent registration
│   │   ├── orchestrator_agent.py       # Strands orchestrator agent
│   │   └── general_assistant.py        # MCP-powered general fallback agent
│   │
│   ├── agents/                         # Specialized sub-agents
│   │   ├── art/                        # ART agent (Search Relevance)
│   │   │   ├── agent.py
│   │   │   ├── tools/
│   │   │   └── prompts/
│   │   └── _template/                  # Template for new agents
│   │       ├── agent.py
│   │       ├── tools/
│   │       └── README.md
│   │
│   ├── server/                         # AG-UI protocol server
│   │   ├── app.py                      # FastAPI application
│   │   ├── routes.py                   # AG-UI endpoints
│   │   ├── auth.py                     # Authentication middleware
│   │   └── config.py                   # Server configuration
│   │
│   └── common/                         # Shared utilities
│       ├── opensearch_client.py        # OpenSearch connection
│       ├── mcp_connection.py           # MCP server management
│       ├── observability.py            # OpenTelemetry setup
│       └── llm_config.py              # LLM provider configuration
│
├── tests/
│   ├── unit/
│   ├── integration/
│   └── fixtures/
│
├── docs/
│   ├── ARCHITECTURE.md
│   ├── CONTRIBUTING_AGENTS.md          # Guide for adding new sub-agents
│   └── SETUP.md
│
├── deploy/
│   ├── docker-compose.yml
│   ├── Dockerfile
│   └── .env.example
│
├── pyproject.toml
└── README.md

Adding a New Sub-Agent

A key design goal is making it straightforward for contributors to add new sub-agents. The process:

1. Create agent directory:

src/agents/my_new_agent/
├── agent.py      # Agent definition with system prompt and tools
├── tools/        # Domain-specific tools
└── prompts/      # System prompts

2. Define the agent using Strands SDK:

# src/agents/my_new_agent/agent.py
from strands import Agent, tool

@tool
def my_domain_tool(param: str) -> str:
    """Tool description for LLM."""
    # Implementation
    ...

def create_agent(opensearch_client, model) -> Agent:
    return Agent(
        model=model,
        system_prompt="You are an expert in ...",
        tools=[my_domain_tool],
    )

3. Register in the agent registry:

# src/orchestrator/agent_registry.py
AGENT_REGISTRY = {
    "search-relevance": art_agent,
    "my-new-page": my_new_agent,    # Add mapping
}

No changes to the orchestrator, server, or any other sub-agent are required.

What Changes in Existing Components

  • OpenSearch Core — Change required: none. Impact: none.
  • ml-commons Agent Framework — Change required: none. Impact: not required; the General Assistant uses MCP tools directly.
  • Search Relevance Plugin — Change required: none. Impact: ART uses existing APIs.
  • UBI Plugin — Change required: none. Impact: ART reads existing indices.
  • OSD Chat Plugin — Change required: minor (add page_context to the request payload). Impact: low, backward-compatible.
  • OpenSearch MCP Server — Change required: none (reused as-is). Impact: shared tool layer; header-based auth already supported.

Security Considerations

  • No new attack surface on OpenSearch — The Python agent service communicates with OpenSearch through existing APIs with existing authentication. No new endpoints or plugins are added to OpenSearch.
  • User-scoped permissions — Agents operate with the authenticated user's credentials. A read-only user cannot perform write operations through any agent.
  • No dynamic code execution — Agents do not use eval(), exec(), or shell commands. All operations are predefined tool calls.
  • LLM credential isolation — LLM provider credentials (AWS Bedrock keys) are server-side only and never exposed to users or included in traces.
  • Input validation — All user inputs are validated before being passed to OpenSearch or LLM providers. Prompt injection mitigations are implemented via constrained system prompts and parameter validation.
  • Audit logging — All agent actions are traced via OpenTelemetry with user attribution.

Backward Compatibility

  • Fully backward-compatible — This is an additive capability. Existing OpenSearch agents, APIs, and workflows are unaffected.
  • Opt-in adoption — Dashboards can continue using the existing chat assistant without the Python agent service. The service is deployed alongside (not replacing) the existing infrastructure.
  • Graceful degradation — If the Python agent service is unavailable, the chat plugin can fall back to direct communication with the existing OpenSearch agent framework.
  • No data migration — The Python agent service is stateless with respect to OpenSearch. It creates no new indices and modifies no existing schemas.

Alternatives Considered

Alternative 1: Extend the Java Agent Framework

Add new agent capabilities directly in ml-commons Java code.

Pros: Single technology stack, no new service to deploy.
Cons: High development friction for AI/ML contributors, limited access to Python AI ecosystem, slower iteration cycles, does not attract new contributor groups.

Decision: Rejected. The Python AI ecosystem is significantly more mature for agent development, and maintaining a Java-only approach limits community growth.

Alternative 2: Wrap Java Agent as Fallback (Instead of MCP-Based General Assistant)

Keep the existing Java agent as the fallback by wrapping its REST API (/_plugins/_ml/agents/{id}/_execute) as a Strands tool.

Pros: Zero reimplementation of existing capabilities, preserves Java agent investment.
Cons: Double LLM invocation (orchestrator LLM + Java agent's own LLM), requires ml-commons to be installed, inconsistent architecture (one agent is a REST wrapper while others are native Strands agents), cannot leverage MCP server tools or customize the fallback's behavior.

Decision: Rejected in favor of MCP-based General Assistant. The OpenSearch MCP Server already provides all the tools needed for general queries (search, list indices, cluster health, mappings, and even a generic API tool). Building the fallback as a native Python agent is simpler, faster (single LLM call), and consistent with the rest of the architecture.

Alternative 3: Replace OpenSearch Agents Entirely with Python

Deprecate the Java agent framework and rebuild all capabilities in Python.

Pros: Single agent framework, no fallback complexity.
Cons: Massive migration effort, discards existing work, breaks current users, high risk.

Decision: Rejected. The MCP-based approach achieves equivalent general capabilities without requiring Java agent deprecation. Existing Java agent users are unaffected.

Alternative 4: MCP-Only Architecture (No Orchestrator)

Expose Python agents as individual MCP servers and let the Dashboard chat plugin call them directly.

Pros: Simpler architecture, no orchestrator overhead.
Cons: No intelligent routing, no cross-agent coordination, each agent needs its own endpoint, no fallback logic, frontend becomes responsible for agent selection.

Decision: Rejected. An orchestrator provides intelligent routing, unified endpoint, and the ability to compose agents for complex workflows.

Open Questions

  1. Chat plugin page_context: What is the most appropriate mechanism for the chat plugin to include page context? Options include: AG-UI context field, custom HTTP header, or URL parameter.
  2. Shared memory across sub-agents: Should sub-agents be able to share context within a session (e.g., ART agent's analysis results available to evaluation)? If so, what is the memory model?
  3. Agent discovery: Should the service expose an API for the frontend to discover available agents and their capabilities (e.g., to show page-specific UI hints)?
  4. Multi-tenancy: For managed deployments, should the service support tenant isolation beyond OpenSearch's existing RBAC?
  5. MCP server tool contributions: As specialized agents identify needs for new OpenSearch tools (e.g., plugin-specific APIs), should these be contributed back to opensearch-mcp-server-py as shared tools, or kept as agent-local tools?
  6. MCP server multi-cluster mode: Should the agent service leverage the MCP server's multi-cluster mode to allow a single agent service to operate across multiple OpenSearch clusters?

Metadata

    Labels

    discuss: Issues intended to help drive brainstorming and decision making; enhancement: Enhancement or improvement to existing feature or request
