Overview
This RFC proposes a Python-based agent service that serves as a unified backend for AI-powered assistants in OpenSearch Dashboards. The service uses Strands SDK as the orchestration framework and follows the multi-agent pattern, where a top-level orchestrator routes user requests to specialized sub-agents based on the Dashboard page context.
All sub-agents share a common tool layer built on the OpenSearch MCP Server, which already provides comprehensive OpenSearch operations (index listing, search, mappings, cluster health, shards, explain, multi-search, and a generic API tool). A General Assistant sub-agent (OpenSearch Agent) powered by these MCP tools serves as the default fallback for pages without a dedicated specialist, while domain-specific sub-agents (e.g., Agentic Relevance Tuning for the Search Relevance page) extend the shared MCP tools with additional domain-specific tools. This enables incremental adoption — pages without a dedicated sub-agent get a capable general assistant, while specialized pages benefit from purpose-built agents.
Problem Statement
1. Development friction with Java-based agents
The current OpenSearch Agent Framework requires agent development in Java with REST API integration. While this works for core OpenSearch contributors, it creates significant friction for:
- AI/ML engineers and data scientists who predominantly work in Python and are the primary audience for building intelligent agents.
- Rapid prototyping — Java's compile-deploy cycle is slow compared to Python's dynamic development workflow.
- LLM ecosystem access — The most mature agent frameworks (Strands, LangChain, CrewAI), tool libraries, and LLM SDKs are Python-first.
- Competing resources — The current OpenSearch agent framework runs inside the OpenSearch cluster, so the assistant agent competes for resources with the cluster's primary request workload.
2. Monolithic agent limitations
The current architecture couples all agent capabilities into a single framework. This makes it difficult to:
- Use different LLM models per task (e.g., a fast/cheap model for simple analytics, a powerful model for complex reasoning).
- Isolate failures — a bug in one agent capability affects all others.
- Scale development — multiple teams cannot independently develop and deploy agents for different domains.
3. Underutilized Dashboard page context
OpenSearch Dashboards has distinct functional pages (Index Management, Search Relevance, Anomaly Detection, Observability, etc.), each with domain-specific user intents. The current chat assistant does not leverage this page context to provide specialized help. A user on the Search Relevance page asking "help me improve results for 'laptop'" should get a fundamentally different experience than a user on the Index Management page asking "help me manage my indices."
4. Community growth opportunity
The OpenSearch project aims to attract new contributor groups. A Python agent service lowers the barrier for data scientists, AI engineers, and the broader Python community to contribute agent capabilities to OpenSearch without needing Java expertise or modifications to OpenSearch core.
Motivation
This RFC generalizes the ART architecture into a reusable framework that:
- Allows any Dashboard page to have a dedicated AI assistant.
- Reuses the OpenSearch MCP Server as a shared tool layer across all agents.
- Provides a capable General Assistant as the default fallback, powered by MCP tools.
- Provides a standardized pattern for contributing new Python agents.
- Leverages a single deployment (one service, one endpoint) for all agents.
Goals
- Create a Python agent service that integrates with OpenSearch Dashboards chat plugin via the AG-UI protocol.
- Implement a Strands-based orchestrator that routes requests to the appropriate sub-agent based on page context.
- Reuse the OpenSearch MCP Server as the shared tool layer for all agents to interact with OpenSearch.
- Provide a General Assistant sub-agent (powered by MCP tools) as the default fallback for pages without a dedicated specialist.
- Establish ART as the first specialized sub-agent, serving the Search Relevance Workbench page.
- Define a clear pattern and contribution guide for adding new sub-agents.
- Support multiple LLM providers (AWS Bedrock, OpenAI, Ollama) through Strands SDK's model abstraction.
Proposed Solution
Design Overview
Routing Mechanism
The orchestrator uses a two-tier routing strategy: deterministic page-based routing first, with LLM-based routing as a supplement.
Tier 1 — Deterministic routing (page context):
The Dashboard chat plugin injects the current page context into each request. The orchestrator uses a simple registry to map page identifiers to sub-agents:
```python
# Agent registry: page_context → sub-agent
AGENT_REGISTRY = {
    "search-relevance": art_agent,        # ART handles Search Relevance pages
    # "anomaly-detection": anomaly_agent,   # Future
    # "index-management": index_mgmt_agent, # Future
}

def route_request(page_context: str, message: str) -> Agent:
    """Route to specialized agent or fall back to the General Assistant."""
    if page_context in AGENT_REGISTRY:
        return AGENT_REGISTRY[page_context]
    return general_assistant  # Fallback: MCP-powered general agent
```

Tier 2 — LLM-based routing (supplementary):
When page context alone is insufficient (e.g., a cross-domain question asked from any page), the orchestrator LLM can analyze the user's intent and delegate to the appropriate sub-agent. The orchestrator uses a lightweight model (e.g., Claude Haiku) for routing decisions to minimize latency and cost.
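The two tiers compose into a single routing function. The sketch below is a simplified, self-contained illustration (agents are represented as strings rather than Strands Agent objects, and `classify_intent` stands in for a call to a lightweight routing model such as Claude Haiku; these names are illustrative, not part of any existing API):

```python
from typing import Callable, Optional

# Simplified stand-in for the real registry of Strands sub-agents
AGENT_REGISTRY = {
    "search-relevance": "art_agent",
}

def route(
    page_context: Optional[str],
    message: str,
    classify_intent: Callable[[str], Optional[str]],
) -> str:
    # Tier 1: deterministic page-based routing
    if page_context in AGENT_REGISTRY:
        return AGENT_REGISTRY[page_context]
    # Tier 2: ask a lightweight LLM to classify the user's intent
    intent = classify_intent(message)
    if intent in AGENT_REGISTRY:
        return AGENT_REGISTRY[intent]
    # Default: MCP-powered General Assistant
    return "general_assistant"
```

Because the classifier is injected as a callable, the deterministic tier can be exercised in tests without any LLM call at all.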
OpenSearch MCP Server as Shared Tool Layer
The OpenSearch MCP Server is a key building block of this design. It already provides a comprehensive set of OpenSearch tools via the Model Context Protocol:
| Category | Tools | Description |
|---|---|---|
| Core | ListIndexTool, SearchIndexTool, IndexMappingTool, ClusterHealthTool, CountTool, ExplainTool, MsearchTool, GetShardsTool | Essential OpenSearch operations |
| Generic | GenericOpenSearchApiTool | Flexible tool that can call any OpenSearch API endpoint |
| Skills | DataDistributionTool, LogPatternAnalysisTool | Higher-level analytical capabilities |
| Advanced | GetClusterStateTool, CatNodesTool, GetNodesTool, GetIndexStatsTool, GetQueryInsightsTool, etc. | Deep cluster inspection |
The MCP server supports stdio and streaming (SSE/HTTP) transports, header-based auth (for credential pass-through from Dashboard), multi-cluster mode, and configurable tool filtering.
All sub-agents share the same MCP connection to OpenSearch. This means:
- No duplicated OpenSearch client code across agents.
- New tools added to the MCP server are immediately available to all agents.
- Authentication and connection management are handled in one place.
General Assistant as Default Fallback
Instead of wrapping the existing Java agent via REST API, the fallback is a native Python agent powered directly by the OpenSearch MCP Server tools:
```python
from strands import Agent

# MCP tools are loaded once at startup and shared across all agents
# opensearch_mcp_tools = [ListIndexTool, SearchIndexTool, ClusterHealthTool, ...]

general_assistant = Agent(
    model=haiku_model,  # Lightweight model for general queries
    system_prompt="""You are a general-purpose OpenSearch assistant.
You help users explore their cluster, understand indices, query data,
check cluster health, and answer questions about their OpenSearch
environment. Use the available tools to interact with OpenSearch.""",
    tools=opensearch_mcp_tools,  # All MCP tools available
)
```

This approach has significant advantages over wrapping the Java agent:
- No double LLM invocation — the Java agent would invoke its own LLM; the Python General Assistant uses the MCP tools directly, requiring only one LLM call.
- No dependency on ml-commons agent framework — works with any OpenSearch cluster, even without ml-commons installed.
- Consistent architecture — the fallback is a regular Strands agent like all other sub-agents, not a special wrapper.
- Same MCP tools as specialized agents — the General Assistant and specialized agents share the same tool layer; the difference is their system prompts and any additional domain-specific tools.
- Extensible — as the MCP server gains new tools (e.g., GetQueryInsightsTool, LogPatternAnalysisTool), the General Assistant automatically benefits.
How Specialized Agents Extend the Shared Tools
Specialized sub-agents receive the shared MCP tools plus their own domain-specific tools:
```python
# ART agent = shared MCP tools + domain-specific SRW tools
art_agent = Agent(
    model=sonnet_model,
    system_prompt=ART_SYSTEM_PROMPT,
    tools=[
        *opensearch_mcp_tools,  # Shared: ListIndex, Search, etc.
        *art_specific_tools,    # Domain: SRW configs, judgments,
                                # experiments, UBI analytics
    ],
)

# General Assistant = shared MCP tools only
general_assistant = Agent(
    model=haiku_model,
    system_prompt=GENERAL_ASSISTANT_PROMPT,
    tools=opensearch_mcp_tools,  # Shared tools only
)
```

This layered tool approach means:
- Adding a new sub-agent only requires defining a system prompt and any domain-specific tools.
- Shared tools are never duplicated — they come from the single MCP server connection.
- Tool filtering can be configured per agent via the MCP server's OPENSEARCH_TOOL_FILTER environment variable or YAML config.
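Filtering can also happen on the agent-service side before tools are handed to an agent. The sketch below is a local illustration of that idea (it is not the MCP server's OPENSEARCH_TOOL_FILTER mechanism, and the tool-dict shape is assumed for the example):

```python
# Illustrative client-side per-agent tool filtering: each sub-agent receives
# only an allow-listed subset of the shared MCP tool list.
def filter_tools(all_tools, allowed_names):
    """Return the subset of shared tools whose names are allow-listed."""
    allowed = set(allowed_names)
    return [t for t in all_tools if t["name"] in allowed]

shared_tools = [
    {"name": "ListIndexTool"},
    {"name": "SearchIndexTool"},
    {"name": "ClusterHealthTool"},
]

# A hypothetical read-only diagnostic agent might only need cluster inspection:
diag_tools = filter_tools(shared_tools, ["ClusterHealthTool"])
```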
ART as First Specialized Sub-Agent
The Agentic Relevance Tuning (ART) system serves as the reference implementation for the first specialized sub-agent. ART is activated when the user is on the Search Relevance Workbench page and provides:
- User Behavior Analysis — Analyzes UBI data for CTR, engagement patterns, zero-click rates.
- Hypothesis Generation — Diagnoses search issues and generates improvement hypotheses with automated sanity checks via pairwise experiments.
- Offline Evaluation — Runs pointwise experiments with judgment lists and calculates standard metrics (NDCG@K, MAP, Precision@K).
- Online Testing (planned) — Interleaved A/B testing for production validation.
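As a concrete illustration of the offline metrics, here is a minimal NDCG@K sketch using one common DCG formulation (relevance divided by log2 of rank+1). This is illustrative only, not ART's actual implementation:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@K over graded judgments listed in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@K: DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg
```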
ART already implements the "Agents as Tools" pattern with Strands SDK, connects to OpenSearch via MCP server, and communicates with the Dashboard via AG-UI protocol. The work required is to extract the orchestration and routing layer into the shared framework that other sub-agents can plug into.
AG-UI Protocol Integration
The service communicates with the OpenSearch Dashboards chat plugin via the AG-UI protocol, which provides:
- Streaming responses — Real-time token-by-token output via Server-Sent Events (SSE).
- Tool call visualization — Frontend renders tool calls and results as the agent works.
- State management — Thread-based conversation persistence.
- Generative UI — Agents can emit structured UI components (charts, tables) alongside text.
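On the service side, the orchestrator can read the page identifier out of the AG-UI request. The sketch below assumes the chat plugin sends page context as a {description, value} entry in the AG-UI context array, as proposed in this RFC; `get_page_context` is a hypothetical helper, not an existing API:

```python
from typing import Optional

def get_page_context(request: dict) -> Optional[str]:
    """Extract the current dashboard page from an AG-UI request payload."""
    for item in request.get("context", []):
        if item.get("description") == "Current dashboard page":
            return item.get("value")
    return None

example_request = {
    "thread_id": "thread_abc123",
    "context": [
        {"description": "Current dashboard page", "value": "search-relevance"},
    ],
}
```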
The chat plugin needs a minor modification to include page_context in the AG-UI request payload:
```json
{
  "thread_id": "thread_abc123",
  "run_id": "run_xyz789",
  "messages": [...],
  "context": [
    {
      "description": "Current dashboard page",
      "value": "search-relevance"
    }
  ]
}
```

Authentication Flow
```
User → OSD (authenticates) → Chat Plugin → Python Agent Service → MCP Server → OpenSearch
                                                 │
                                                 └→ All agents: credentials forwarded via MCP header-based auth
```
- The user authenticates with OpenSearch Dashboards as usual.
- The chat plugin forwards the user's credentials (or token) to the Python agent service.
- The agent service passes credentials to the OpenSearch MCP Server via header-based authentication (OPENSEARCH_HEADER_AUTH=true), which the MCP server already supports. This means the MCP server forwards the user's Authorization header to OpenSearch for every operation.
- All agents (General Assistant and specialized agents) share the same authenticated MCP connection, ensuring they have exactly the same permissions as the user.
- LLM provider credentials (e.g., AWS Bedrock) are configured server-side in environment variables and are not user-facing.
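The pass-through step can be sketched as a small helper that copies the incoming Authorization header onto the outgoing MCP connection and nothing else. `build_mcp_headers` is a hypothetical name for illustration, not an existing API:

```python
def build_mcp_headers(incoming_headers: dict) -> dict:
    """Forward only the user's Authorization header to the MCP server.

    The header is passed through verbatim and never persisted, so every
    OpenSearch operation runs with the user's own permissions.
    """
    headers = {}
    auth = incoming_headers.get("Authorization")
    if auth:
        headers["Authorization"] = auth
    return headers
```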
Observability
All agent operations are instrumented with OpenTelemetry:
- Distributed tracing — Each request creates a trace spanning the orchestrator, sub-agent, and tool calls.
- Metrics — Token usage, latency, tool call success/failure rates, routing decisions.
- Logging — Structured logs with request IDs for debugging.
This enables operators to monitor agent performance, debug issues, and optimize costs across all sub-agents from a single observability pipeline.
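The structured-logging piece of this story can be as simple as tagging every log line with a request ID so logs correlate with traces and metrics. The sketch below uses only the standard library and omits the OpenTelemetry tracing/metrics wiring; it is an illustration, not the service's actual logging module:

```python
import json
import logging
import uuid

logger = logging.getLogger("agent-service")

def log_event(request_id: str, event: str, **fields) -> dict:
    """Emit one structured (JSON) log line carrying the request ID."""
    record = {"request_id": request_id, "event": event, **fields}
    logger.info(json.dumps(record))
    return record

# e.g., one record per routing decision, keyed by a fresh request ID
request_id = str(uuid.uuid4())
```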
Data Flow
Sequence Diagram: Specialized Agent (e.g., ART on Search Relevance Page)
Sequence Diagram: General Assistant Fallback (e.g., on Index Management Page)
Key differences between the two flows:
- Routing: Specialized agent matched by page_context vs. fallback when no match.
- Tools: ART uses shared MCP tools plus domain-specific tools (UBI analytics, SRW experiments). General Assistant uses shared MCP tools only.
- LLM model: ART may use a more capable model (e.g., Sonnet) for complex reasoning; General Assistant can use a lighter model (e.g., Haiku) for simple queries.
Proposed Repo Structure
```
opensearch-agent-service/
├── src/
│   ├── orchestrator/              # Core orchestration framework
│   │   ├── router.py              # Page-context routing logic
│   │   ├── agent_registry.py      # Sub-agent registration
│   │   ├── orchestrator_agent.py  # Strands orchestrator agent
│   │   └── general_assistant.py   # MCP-powered general fallback agent
│   │
│   ├── agents/                    # Specialized sub-agents
│   │   ├── art/                   # ART agent (Search Relevance)
│   │   │   ├── agent.py
│   │   │   ├── tools/
│   │   │   └── prompts/
│   │   └── _template/             # Template for new agents
│   │       ├── agent.py
│   │       ├── tools/
│   │       └── README.md
│   │
│   ├── server/                    # AG-UI protocol server
│   │   ├── app.py                 # FastAPI application
│   │   ├── routes.py              # AG-UI endpoints
│   │   ├── auth.py                # Authentication middleware
│   │   └── config.py              # Server configuration
│   │
│   └── common/                    # Shared utilities
│       ├── opensearch_client.py   # OpenSearch connection
│       ├── mcp_connection.py      # MCP server management
│       ├── observability.py       # OpenTelemetry setup
│       └── llm_config.py          # LLM provider configuration
│
├── tests/
│   ├── unit/
│   ├── integration/
│   └── fixtures/
│
├── docs/
│   ├── ARCHITECTURE.md
│   ├── CONTRIBUTING_AGENTS.md     # Guide for adding new sub-agents
│   └── SETUP.md
│
├── deploy/
│   ├── docker-compose.yml
│   ├── Dockerfile
│   └── .env.example
│
├── pyproject.toml
└── README.md
```
Adding a New Sub-Agent
A key design goal is making it straightforward for contributors to add new sub-agents. The process:
1. Create agent directory:
```
src/agents/my_new_agent/
├── agent.py     # Agent definition with system prompt and tools
├── tools/       # Domain-specific tools
└── prompts/     # System prompts
```
2. Define the agent using Strands SDK:
```python
# src/agents/my_new_agent/agent.py
from strands import Agent, tool

@tool
def my_domain_tool(param: str) -> str:
    """Tool description for LLM."""
    # Implementation
    ...

def create_agent(opensearch_client, model) -> Agent:
    return Agent(
        model=model,
        system_prompt="You are an expert in ...",
        tools=[my_domain_tool],
    )
```

3. Register in the agent registry:
```python
# src/orchestrator/agent_registry.py
AGENT_REGISTRY = {
    "search-relevance": art_agent,
    "my-new-page": my_new_agent,  # Add mapping
}
```

No changes to the orchestrator, server, or any other sub-agent are required.
What Changes in Existing Components
| Component | Change Required | Impact |
|---|---|---|
| OpenSearch Core | None | No impact |
| ml-commons Agent Framework | None | Not required; General Assistant uses MCP tools directly |
| Search Relevance Plugin | None | ART uses existing APIs |
| UBI Plugin | None | ART reads existing indices |
| OSD Chat Plugin | Minor: add page_context to request payload | Low impact, backward-compatible |
| OpenSearch MCP Server | None (reused as-is) | Shared tool layer; header-based auth already supported |
Security Considerations
- No new attack surface on OpenSearch — The Python agent service communicates with OpenSearch through existing APIs with existing authentication. No new endpoints or plugins are added to OpenSearch.
- User-scoped permissions — Agents operate with the authenticated user's credentials. A read-only user cannot perform write operations through any agent.
- No dynamic code execution — Agents do not use eval(), exec(), or shell commands. All operations are predefined tool calls.
- LLM credential isolation — LLM provider credentials (AWS Bedrock keys) are server-side only and never exposed to users or included in traces.
- Input validation — All user inputs are validated before being passed to OpenSearch or LLM providers. Prompt injection mitigations are implemented via constrained system prompts and parameter validation.
- Audit logging — All agent actions are traced via OpenTelemetry with user attribution.
Backward Compatibility
- Fully backward-compatible — This is an additive capability. Existing OpenSearch agents, APIs, and workflows are unaffected.
- Opt-in adoption — Dashboards can continue using the existing chat assistant without the Python agent service. The service is deployed alongside (not replacing) the existing infrastructure.
- Graceful degradation — If the Python agent service is unavailable, the chat plugin can fall back to direct communication with the existing OpenSearch agent framework.
- No data migration — The Python agent service is stateless with respect to OpenSearch. It creates no new indices and modifies no existing schemas.
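The graceful-degradation behavior can be sketched as a small backend chooser: if the Python agent service's health check fails, the chat plugin targets the existing assistant instead. The probe is injected as a callable so the decision is testable; the backend names and health endpoint are illustrative, not defined by this RFC:

```python
def choose_backend(probe) -> str:
    """Pick the chat backend based on a health probe of the agent service.

    `probe(path)` should return True if the Python agent service is healthy;
    any exception or falsy result routes traffic to the legacy assistant.
    """
    try:
        if probe("/health"):
            return "python-agent-service"
    except Exception:
        pass
    return "legacy-assistant"
```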
Alternatives Considered
Alternative 1: Extend the Java Agent Framework
Add new agent capabilities directly in ml-commons Java code.
Pros: Single technology stack, no new service to deploy.
Cons: High development friction for AI/ML contributors, limited access to Python AI ecosystem, slower iteration cycles, does not attract new contributor groups.
Decision: Rejected. The Python AI ecosystem is significantly more mature for agent development, and maintaining a Java-only approach limits community growth.
Alternative 2: Wrap Java Agent as Fallback (Instead of MCP-Based General Assistant)
Keep the existing Java agent as the fallback by wrapping its REST API (/_plugins/_ml/agents/{id}/_execute) as a Strands tool.
Pros: Zero reimplementation of existing capabilities, preserves Java agent investment.
Cons: Double LLM invocation (orchestrator LLM + Java agent's own LLM), requires ml-commons to be installed, inconsistent architecture (one agent is a REST wrapper while others are native Strands agents), cannot leverage MCP server tools or customize the fallback's behavior.
Decision: Rejected in favor of MCP-based General Assistant. The OpenSearch MCP Server already provides all the tools needed for general queries (search, list indices, cluster health, mappings, and even a generic API tool). Building the fallback as a native Python agent is simpler, faster (single LLM call), and consistent with the rest of the architecture.
Alternative 3: Replace OpenSearch Agents Entirely with Python
Deprecate the Java agent framework and rebuild all capabilities in Python.
Pros: Single agent framework, no fallback complexity.
Cons: Massive migration effort, discards existing work, breaks current users, high risk.
Decision: Rejected. The MCP-based approach achieves equivalent general capabilities without requiring Java agent deprecation. Existing Java agent users are unaffected.
Alternative 4: MCP-Only Architecture (No Orchestrator)
Expose Python agents as individual MCP servers and let the Dashboard chat plugin call them directly.
Pros: Simpler architecture, no orchestrator overhead.
Cons: No intelligent routing, no cross-agent coordination, each agent needs its own endpoint, no fallback logic, frontend becomes responsible for agent selection.
Decision: Rejected. An orchestrator provides intelligent routing, unified endpoint, and the ability to compose agents for complex workflows.
Open Questions
- Chat plugin page_context: What is the most appropriate mechanism for the chat plugin to include page context? Options include: the AG-UI context field, a custom HTTP header, or a URL parameter.
- Shared memory across sub-agents: Should sub-agents be able to share context within a session (e.g., ART agent's analysis results available to evaluation)? If so, what is the memory model?
- Agent discovery: Should the service expose an API for the frontend to discover available agents and their capabilities (e.g., to show page-specific UI hints)?
- Multi-tenancy: For managed deployments, should the service support tenant isolation beyond OpenSearch's existing RBAC?
- MCP server tool contributions: As specialized agents identify needs for new OpenSearch tools (e.g., plugin-specific APIs), should these be contributed back to opensearch-mcp-server-py as shared tools, or kept as agent-local tools?
- MCP server multi-cluster mode: Should the agent service leverage the MCP server's multi-cluster mode to allow a single agent service to operate across multiple OpenSearch clusters?