This document provides a deep technical dive into the TraceMind-AI architecture, implementation details, and system design.
- System Overview
- Project Structure
- Core Components
- MCP Client Architecture
- Agent Framework Integration
- Data Flow
- Authentication & Authorization
- Screen Navigation
- Job Submission Architecture
- Deployment
- Performance Optimization
TraceMind-AI is a comprehensive Gradio-based web application for evaluating AI agent performance. It serves as the user-facing platform in the TraceMind ecosystem, demonstrating enterprise MCP client usage (Track 2: MCP in Action).
| Component | Technology | Version | Purpose |
|---|---|---|---|
| UI Framework | Gradio | 5.49.1 | Web interface with components |
| MCP Client | MCP Python SDK | Latest | Connect to MCP servers |
| Agent Framework | smolagents | 1.22.0+ | Autonomous agent with MCP tools |
| Data Source | HuggingFace Datasets | Latest | Load evaluation results |
| Authentication | HuggingFace OAuth | - | User authentication |
| Job Platforms | HF Jobs + Modal | - | Evaluation job submission |
| Language | Python | 3.10+ | Core implementation |
```
┌─────────────────────────────────────────────────────────────┐
│ User Browser                                                │
│ - Gradio Interface (React-based)                            │
│ - OAuth Flow (HuggingFace)                                  │
└──────────────┬──────────────────────────────────────────────┘
               │
               │ HTTP/WebSocket
               ↓
┌─────────────────────────────────────────────────────────────┐
│ TraceMind-AI (Gradio App) - Track 2                         │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Screen Layer (screens/)                               │  │
│  │ - Leaderboard                                         │  │
│  │ - Agent Chat                                          │  │
│  │ - New Evaluation                                      │  │
│  │ - Job Monitoring                                      │  │
│  │ - Trace Detail                                        │  │
│  │ - Settings                                            │  │
│  └────────────┬──────────────────────────────────────────┘  │
│               │                                             │
│  ┌────────────┴──────────────────────────────────────────┐  │
│  │ Component Layer (components/)                         │  │
│  │ - Leaderboard Table (Custom HTML)                     │  │
│  │ - Analytics Charts                                    │  │
│  │ - Metric Displays                                     │  │
│  │ - Report Cards                                        │  │
│  └────────────┬──────────────────────────────────────────┘  │
│               │                                             │
│  ┌────────────┴──────────────────────────────────────────┐  │
│  │ Service Layer                                         │  │
│  │ ┌────────────────────┐  ┌────────────────────┐        │  │
│  │ │ MCP Client         │  │ Data Loader        │        │  │
│  │ │ (mcp_client/)      │  │ (data_loader.py)   │        │  │
│  │ └────────────────────┘  └────────────────────┘        │  │
│  │ ┌────────────────────┐  ┌────────────────────┐        │  │
│  │ │ Agent (smolagents) │  │ Job Submission     │        │  │
│  │ │ (screens/chat.py)  │  │ (utils/)           │        │  │
│  │ └────────────────────┘  └────────────────────┘        │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└───────────┬───────────────────────────────────┬─────────────┘
            │                                   │
            ↓                                   ↓
┌───────────────────────┐           ┌───────────────────────┐
│ TraceMind MCP Server  │           │ External Services     │
│ (Track 1)             │           │ - HF Datasets         │
│ - 11 AI Tools         │           │ - HF Jobs             │
│ - 3 Resources         │           │ - Modal               │
│ - 3 Prompts           │           │ - LLM APIs            │
└───────────────────────┘           └───────────────────────┘
```
```
TraceMind-AI/
├── app.py                        # Main entry point, Gradio app
│
├── screens/                      # UI screens (6 tabs)
│   ├── __init__.py
│   ├── leaderboard.py            # Screen 1: Leaderboard with AI insights
│   ├── chat.py                   # Screen 2: Agent Chat (smolagents)
│   ├── dashboard.py              # Screen 3: New Evaluation
│   ├── job_monitoring.py         # Screen 4: Job Status Tracking
│   ├── trace_detail.py           # Screen 5: Trace Visualization
│   ├── settings.py               # Screen 6: API Key Configuration
│   ├── compare.py                # Screen 7: Run Comparison (optional)
│   ├── documentation.py          # Screen 8: API Documentation
│   └── mcp_helpers.py            # Shared MCP client helpers
│
├── components/                   # Reusable UI components
│   ├── __init__.py
│   ├── leaderboard_table.py      # Custom HTML table component
│   ├── analytics_charts.py       # Performance charts (Plotly)
│   ├── metric_displays.py        # Metric cards and badges
│   ├── report_cards.py           # Summary report cards
│   └── thought_graph.py          # Agent reasoning visualization
│
├── mcp_client/                   # MCP client implementation
│   ├── __init__.py
│   ├── client.py                 # Async MCP client
│   └── sync_wrapper.py           # Synchronous wrapper for Gradio
│
├── utils/                        # Utility modules
│   ├── __init__.py
│   ├── auth.py                   # HuggingFace OAuth
│   ├── navigation.py             # Screen navigation state
│   ├── hf_jobs_submission.py     # HuggingFace Jobs integration
│   └── modal_job_submission.py   # Modal integration
│
├── styles/                       # Custom styling
│   ├── __init__.py
│   └── tracemind_theme.py        # Gradio theme customization
│
├── data_loader.py                # Dataset loading and caching
├── requirements.txt              # Python dependencies
├── .env.example                  # Environment variable template
├── .gitignore
├── README.md                     # Project documentation
└── USER_GUIDE.md                 # Complete user guide
```
Total: ~35 files, ~8,000 lines of code
| Directory | Files | Lines | Purpose |
|---|---|---|---|
| screens/ | 9 | ~3,500 | UI screen implementations |
| components/ | 5 | ~1,200 | Reusable UI components |
| mcp_client/ | 3 | ~800 | MCP client integration |
| utils/ | 4 | ~1,500 | Authentication, jobs, navigation |
| styles/ | 2 | ~300 | Custom theme and CSS |
| Root | 3 | ~700 | Main app, data loader, config |
Purpose: Entry point, orchestrates all screens and manages global state.
Architecture:
```python
# app.py structure
import os

import gradio as gr

from data_loader import DataLoader
from mcp_client.sync_wrapper import get_sync_mcp_client
from screens import (
    chat_screen,
    dashboard_screen,
    job_monitoring_screen,
    leaderboard_screen,
    settings_screen,
)
from styles.tracemind_theme import tracemind_theme
from utils.auth import auth_ui

DISABLE_OAUTH = os.getenv("DISABLE_OAUTH", "false").lower() == "true"

# 1. Initialize services (global instances shared by all screens)
mcp_client = get_sync_mcp_client()
mcp_client.initialize()
data_loader = DataLoader()

# 2. Create Gradio app
with gr.Blocks(theme=tracemind_theme) as app:
    # Global per-session state (user session, navigation, API keys)
    session_state = gr.State({})

    # Authentication (unless disabled for local development)
    if not DISABLE_OAUTH:
        auth_ui()

    # Main tabs
    with gr.Tabs() as main_tabs:
        with gr.Tab("📊 Leaderboard"):
            leaderboard_screen()
        with gr.Tab("🤖 Agent Chat"):
            chat_screen()
        with gr.Tab("🚀 New Evaluation"):
            dashboard_screen()
        with gr.Tab("📈 Job Monitoring"):
            job_monitoring_screen()
        with gr.Tab("⚙️ Settings"):
            settings_screen()

# 3. Launch
if __name__ == "__main__":
    app.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False,
    )
```
Key Responsibilities:
- Initialize MCP client and data loader (global instances)
- Create tabbed interface with all screens
- Manage authentication flow
- Handle global state (user session, API keys)
Each screen is a self-contained module that returns a Gradio component tree.
File: screens/leaderboard.py
Purpose: Display evaluation results with AI-powered insights.
Components:
- Load button
- AI insights panel (Markdown) - powered by MCP server
- Leaderboard table (custom HTML component)
- Filter controls (agent type, provider)
MCP Integration:
```python
from datasets import load_dataset

from components.leaderboard_table import render_leaderboard_table


def load_leaderboard(mcp_client):
    # 1. Load dataset (one split, converted to a pandas DataFrame)
    ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
    df = ds.to_pandas()

    # 2. Get AI insights from MCP server
    insights = mcp_client.analyze_leaderboard(
        metric_focus="overall",
        time_range="last_week",
        top_n=5,
    )

    # 3. Render table with custom component
    table_html = render_leaderboard_table(df)
    return insights, table_html
```
File: screens/chat.py
Purpose: Autonomous agent interface with MCP tool access.
Agent Setup:
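```python
import os

from smolagents import InferenceClientModel, LogLevel, MCPClient, ToolCallingAgent

MCP_SERVER_URL = os.getenv(
    "MCP_SERVER_URL",
    "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
)


# Initialize agent with the tools exposed by the MCP server
def create_agent():
    mcp_client = MCPClient({"url": MCP_SERVER_URL, "transport": "sse"})
    model = InferenceClientModel(  # successor of the older HfApiModel class
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        token=os.getenv("HF_TOKEN"),
    )
    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),  # MCP tools loaded from the server
        model=model,
        max_steps=10,
    )
    return agent, mcp_client  # call mcp_client.disconnect() on shutdown


agent, agent_mcp_client = create_agent()


# Chat interaction
def agent_chat(message, history, show_reasoning):
    # The log level only changes how much step-by-step detail the agent logs
    # (DEBUG shows tool execution); the chat itself receives the final answer
    agent.logger.level = LogLevel.DEBUG if show_reasoning else LogLevel.ERROR
    response = agent.run(message)
    history.append((message, response))
    return history, ""
```
MCP Tool Access: Agent automatically discovers and uses all 11 MCP tools from TraceMind MCP Server.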
File: screens/dashboard.py
Purpose: Submit evaluation jobs to HuggingFace Jobs or Modal.
Key Functions:
- Model selection (text input)
- Infrastructure choice (HF Jobs / Modal)
- Hardware selection (auto / manual)
- Cost estimation (MCP-powered)
- Job submission
Cost Estimation Flow:
```python
def estimate_cost_click(model, agent_type, num_tests, hardware, mcp_client):
    # Call MCP server for cost estimate
    estimate = mcp_client.estimate_cost(
        model=model,
        agent_type=agent_type,
        num_tests=num_tests,
        hardware=hardware,
    )
    return estimate  # Display in dialog
```
Job Submission Flow:
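```python
from utils.hf_jobs_submission import submit_hf_job
from utils.modal_job_submission import submit_modal_job


def submit_job(model, agent_type, hardware, infrastructure, api_keys):
    if infrastructure == "HuggingFace Jobs":
        job_id = submit_hf_job(model, agent_type, hardware, api_keys)
    elif infrastructure == "Modal":
        job_id = submit_modal_job(model, agent_type, hardware, api_keys)
    else:
        # Without this branch job_id would be unbound for unexpected values
        return f"❌ Unknown infrastructure: {infrastructure}"
    return f"✅ Job submitted: {job_id}"
```
File: screens/job_monitoring.py
Purpose: Track status of submitted jobs.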
Data Source: HuggingFace Jobs API or Modal API
Refresh Strategy:
- Manual refresh button
- Auto-refresh every 30 seconds (optional)
File: screens/trace_detail.py
Purpose: Visualize OpenTelemetry traces with GPU metrics.
Components:
- Waterfall diagram (spans timeline)
- Span details panel
- GPU metrics overlay (for GPU jobs)
- MCP-powered Q&A
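A waterfall diagram needs three values per span: its horizontal offset and width relative to the whole trace, and its nesting depth. A sketch of that layout step, assuming spans are dicts with `span_id`, `parent_span_id`, `name`, `start_time` and `end_time` fields (hypothetical names modeled on OpenTelemetry span data). The HTML rendering itself is left out:

```python
from typing import Dict, List


def waterfall_rows(spans: List[dict]) -> List[dict]:
    """Turn a flat list of spans into positioned rows for a waterfall chart."""
    if not spans:
        return []
    trace_start = min(s["start_time"] for s in spans)
    trace_end = max(s["end_time"] for s in spans)
    total = max(trace_end - trace_start, 1)  # avoid division by zero
    by_id: Dict[str, dict] = {s["span_id"]: s for s in spans}

    def depth(span: dict) -> int:
        # Count ancestors present in this trace; the bound guards against cycles
        d, parent = 0, span.get("parent_span_id")
        while parent in by_id and d < len(by_id):
            d += 1
            parent = by_id[parent].get("parent_span_id")
        return d

    rows = []
    for s in sorted(spans, key=lambda s: s["start_time"]):
        rows.append({
            "name": s["name"],
            "depth": depth(s),  # indentation level in the chart
            "offset_pct": 100 * (s["start_time"] - trace_start) / total,
            "width_pct": 100 * (s["end_time"] - s["start_time"]) / total,
        })
    return rows
```

Percentages keep the layout independent of the time unit, so nanosecond OpenTelemetry timestamps and millisecond values render the same way.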
Trace Loading:
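```python
from datasets import load_dataset


def load_trace(trace_id, traces_repo):
    # Load trace dataset (one split) and keep the rows for this trace
    ds = load_dataset(traces_repo, split="train")
    matches = ds.filter(lambda x: x["trace_id"] == trace_id)
    if len(matches) == 0:
        raise ValueError(f"Trace {trace_id} not found in {traces_repo}")
    trace_data = matches[0]

    # Render waterfall
    waterfall_html = render_waterfall(trace_data["spans"])
    return waterfall_html
```
MCP Q&A: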
```python
def ask_trace_question(trace_id, traces_repo, question, mcp_client):
    # Call MCP server to debug trace
    answer = mcp_client.debug_trace(
        trace_id=trace_id,
        traces_repo=traces_repo,
        question=question,
    )
    return answer
```
File: screens/settings.py
Purpose: Configure API keys and preferences.
Security:
- Keys held in per-session Gradio State (server memory for that session only, never written to disk)
- All forms use `api_name=False` (not exposed via API)
- HTTPS encryption for all API calls
Configuration Options:
- Gemini API Key
- HuggingFace Token
- Modal Token ID + Secret
- LLM Provider Keys (OpenAI, Anthropic, etc.)
Reusable UI components that can be used across multiple screens.
File: components/leaderboard_table.py
Purpose: Custom HTML table with sorting, filtering, and styling.
Why a custom component?
- Gradio's default Dataframe component lacks advanced styling
- Need clickable rows for navigation
- Custom sorting and filtering logic
- Badge rendering for metrics
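Instead of sorting and filtering in browser JavaScript, the filter controls (agent type, provider) can also be applied server-side with pandas before the table is rendered. A sketch, assuming the leaderboard DataFrame has `agent_type` and `provider` columns (hypothetical names; `model`, `success_rate` and `total_cost_usd` appear in the rendering code):

```python
from typing import Optional

import pandas as pd


def filter_and_sort(
    df: pd.DataFrame,
    agent_type: Optional[str] = None,
    provider: Optional[str] = None,
    sort_by: str = "success_rate",
    descending: bool = True,
) -> pd.DataFrame:
    """Apply leaderboard filters; None or "All" disables a filter."""
    mask = pd.Series(True, index=df.index)
    if agent_type not in (None, "All"):
        mask &= df["agent_type"] == agent_type
    if provider not in (None, "All"):
        mask &= df["provider"] == provider
    return (
        df[mask]
        .sort_values(sort_by, ascending=not descending)
        .reset_index(drop=True)  # row positions match the rendered order
    )
```

The filtered frame can then be passed straight to `render_leaderboard_table`, so the HTML never contains rows the user filtered out.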
Implementation:
```python
from html import escape

import pandas as pd

# <script> tags inside gr.HTML output are not executed by the browser, so these
# helpers must be loaded once for the whole page, e.g. gr.Blocks(head=TABLE_JS)
TABLE_JS = """
<script>
function sortTable(col) { ... }
function selectRun(runId) {
  // Trigger Gradio event to navigate to run detail
  document.dispatchEvent(new CustomEvent('runSelected', {detail: runId}));
}
</script>
"""


def render_leaderboard_table(df: pd.DataFrame) -> str:
    """Render leaderboard as interactive HTML table"""
    parts = ["""
<style>
  .leaderboard-table { ... }
  .metric-badge { ... }
</style>
<table class="leaderboard-table">
  <thead>
    <tr>
      <th onclick="sortTable(0)">Model</th>
      <th onclick="sortTable(1)">Success Rate</th>
      <th onclick="sortTable(2)">Cost</th>
      ...
    </tr>
  </thead>
  <tbody>
"""]
    for _, row in df.iterrows():
        # Escape dataset values; the run id travels in a data attribute, not inline JS
        parts.append(f"""
    <tr data-run-id="{escape(str(row['run_id']))}" onclick="selectRun(this.dataset.runId)">
      <td>{escape(str(row['model']))}</td>
      <td><span class="badge success">{row['success_rate']}%</span></td>
      <td>${row['total_cost_usd']:.4f}</td>
      ...
    </tr>
""")
    parts.append("""
  </tbody>
</table>
""")
    return "".join(parts)
```
Integration with Gradio:
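```python
# In leaderboard screen (load_btn and data_loader are defined elsewhere)
table_html = gr.HTML()
load_btn.click(
    fn=lambda: render_leaderboard_table(data_loader.load_leaderboard()),
    outputs=table_html,
)
```
File: components/analytics_charts.py
Purpose: Performance charts using Plotly.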
Charts Provided:
- Success rate over time (line chart)
- Cost comparison (bar chart)
- Duration distribution (histogram)
- CO2 emissions by model (pie chart)
Example:
```python
import plotly.graph_objects as go


def create_cost_comparison_chart(df):
    fig = go.Figure(data=[
        go.Bar(
            x=df["model"],
            y=df["total_cost_usd"],
            marker_color="indianred",
        )
    ])
    fig.update_layout(
        title="Cost Comparison by Model",
        xaxis_title="Model",
        yaxis_title="Total Cost (USD)",
    )
    return fig
```
File: components/thought_graph.py
Purpose: Visualize agent reasoning steps (for Agent Chat).
Visualization:
- Graph nodes: Reasoning steps, tool calls
- Edges: Flow between steps
- Annotations: Tool results, errors
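One lightweight way to produce such a graph is to emit Mermaid flowchart text from the agent's steps and hand it to a Mermaid renderer. A sketch under a hypothetical step schema (`type` and `label` keys). The actual component may use a different visualization library:

```python
from typing import Dict, List, Tuple

# Mermaid node shapes per step type (opening and closing brackets)
SHAPES: Dict[str, Tuple[str, str]] = {
    "thought": ("[", "]"),       # rectangle
    "tool_call": ("[[", "]]"),   # subroutine box
    "error": ("{{", "}}"),       # hexagon
    "final": ("([", "])"),       # stadium (rounded)
}


def steps_to_mermaid(steps: List[dict]) -> str:
    """Chain agent steps into a top-down Mermaid flowchart, in execution order."""
    lines = ["flowchart TD"]
    for i, step in enumerate(steps):
        left, right = SHAPES.get(step["type"], ("[", "]"))
        label = step["label"].replace('"', "'")  # a double quote would end the label
        lines.append(f'    s{i}{left}"{label}"{right}')
        if i > 0:
            lines.append(f"    s{i - 1} --> s{i}")  # edge from the previous step
    return "\n".join(lines)
```

Distinct shapes let a reader tell tool calls and errors apart from reasoning steps at a glance, which covers the node and annotation roles listed above.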
File: mcp_client/client.py
Purpose: Connect to TraceMind MCP Server via the MCP protocol.
Implementation: (See MCP_INTEGRATION.md for full code)
Key Methods:
- `connect()`: Establish SSE connection to MCP server
- `call_tool(tool_name, arguments)`: Call an MCP tool
- `analyze_leaderboard(**kwargs)`: Wrapper for the `analyze_leaderboard` tool
- `estimate_cost(**kwargs)`: Wrapper for the `estimate_cost` tool
- `debug_trace(**kwargs)`: Wrapper for the `debug_trace` tool
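File: mcp_client/sync_wrapper.py
Purpose: Provide synchronous API for Gradio event handlers.
Why needed: The MCP client is async, while the app's Gradio event handlers are written as plain synchronous functions (Gradio itself also accepts `async def` handlers).
Pattern:
```python
import asyncio
import threading

from mcp_client.client import AsyncMCPClient


class SyncMCPClient:
    def __init__(self, mcp_server_url):
        self.async_client = AsyncMCPClient(mcp_server_url)
        # One long-lived event loop in a background thread: Gradio runs sync
        # handlers in worker threads that have no loop, and the MCP session
        # must stay bound to the single loop it was opened on
        self._loop = asyncio.new_event_loop()
        threading.Thread(target=self._loop.run_forever, daemon=True).start()

    def _run_async(self, coro, timeout=120):
        """Run async coroutine on the background loop and wait for the result"""
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result(timeout=timeout)

    def analyze_leaderboard(self, **kwargs):
        """Synchronous wrapper"""
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))
```
File: data_loader.py
Purpose: Load and cache HuggingFace datasets.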
Features:
- In-memory caching (5-minute TTL)
- Error handling for missing datasets
- Automatic retry logic
- Dataset validation
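The retry logic listed above does not appear in the implementation excerpt that follows. A generic sketch of retrying transient failures with exponential backoff; the exception types, attempt count and delays are assumptions, not values taken from the project:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient errors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    raise ValueError("attempts must be at least 1")
```

A load then becomes `with_retries(lambda: load_dataset(repo, split="train"))`. The injectable `sleep` keeps the helper testable without real delays, and errors outside `retry_on` (for example a missing dataset) still fail immediately.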
Implementation:
```python
import time

import pandas as pd
from datasets import load_dataset


class DataLoader:
    def __init__(self):
        self.cache = {}
        self.cache_ttl = 300  # 5 minutes

    def load_leaderboard(self, repo="kshitijthakkar/smoltrace-leaderboard") -> pd.DataFrame:
        """Load leaderboard with caching"""
        cache_key = f"leaderboard:{repo}"

        # Check cache: reuse the entry while it is younger than the TTL
        if cache_key in self.cache:
            cached_time, cached_data = self.cache[cache_key]
            if time.time() - cached_time < self.cache_ttl:
                return cached_data

        # Load fresh data
        ds = load_dataset(repo, split="train")
        df = ds.to_pandas()

        # Cache
        self.cache[cache_key] = (time.time(), df)
        return df

    def load_results(self, repo) -> pd.DataFrame:
        """Load results dataset for specific run"""
        ds = load_dataset(repo, split="train")
        return ds.to_pandas()

    def load_traces(self, repo):
        """Load traces dataset for specific run"""
        ds = load_dataset(repo, split="train")
        return ds  # Keep as Dataset for filtering
```
Full details in: MCP_INTEGRATION.md
Summary:
- Async Client: `mcp_client/client.py` - async MCP protocol implementation
- Sync Wrapper: `mcp_client/sync_wrapper.py` - synchronous API for Gradio
- Global Instance: Initialized once in `app.py`, shared across all screens
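A single global instance can come from a cached factory. A minimal sketch of how a `get_sync_mcp_client()` factory might work. The stand-in `SyncMCPClient` class and the `DEFAULT_MCP_URL` fallback are illustrative assumptions, not the project's code:

```python
import functools
import os

DEFAULT_MCP_URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"


class SyncMCPClient:
    """Illustrative stand-in for mcp_client.sync_wrapper.SyncMCPClient."""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url


@functools.lru_cache(maxsize=1)
def get_sync_mcp_client() -> SyncMCPClient:
    # The first call builds the client; later calls return the same object
    return SyncMCPClient(os.getenv("MCP_SERVER_URL", DEFAULT_MCP_URL))
```

`functools.lru_cache` may still run the factory twice if two threads make the very first call at the same moment. Creating the instance at import time in `app.py`, as in the usage pattern, avoids that race.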
Usage Pattern:
```python
# In app.py (initialization)
from mcp_client.sync_wrapper import get_sync_mcp_client

mcp_client = get_sync_mcp_client()
mcp_client.initialize()


# In screen (usage)
def some_event_handler(mcp_client):
    result = mcp_client.analyze_leaderboard(metric_focus="cost")
    return result
```
Full details in: MCP_INTEGRATION.md
Framework: smolagents (HuggingFace's agent framework)
Key Features:
- Autonomous tool discovery from MCP server
- Multi-step reasoning with tool chaining
- Context-aware responses
- Reasoning visualization (optional)
Agent Setup:
```python
from smolagents import InferenceClientModel, MCPClient, ToolCallingAgent

MCP_SERVER_URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"

mcp_client = MCPClient({"url": MCP_SERVER_URL, "transport": "sse"})
agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),  # tools loaded from MCP server
    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=10,
)
```
Leaderboard Loading Flow:
```
1. User clicks "Load Leaderboard"
   ↓
2. Gradio Event Handler (leaderboard.py)
   load_leaderboard()
   ↓
3. Data Loader (data_loader.py)
   ├─→ Check cache (5-min TTL)
   │   └─→ If cached: return cached data
   └─→ If not cached: load from HF Datasets
       └─→ load_dataset("kshitijthakkar/smoltrace-leaderboard")
   ↓
4. MCP Client (sync_wrapper.py)
   mcp_client.analyze_leaderboard(metric_focus="overall")
   ↓
5. MCP Server (TraceMind-mcp-server)
   ├─→ Load data
   ├─→ Call Gemini API
   └─→ Return AI analysis
   ↓
6. Render Components
   ├─→ AI Insights (Markdown)
   └─→ Leaderboard Table (Custom HTML)
   ↓
7. Display to User
```
Agent Chat Flow:
```
1. User types message: "What are the top 3 models?"
   ↓
2. Gradio Event Handler (chat.py)
   agent_chat(message, history, show_reasoning)
   ↓
3. smolagents Agent
   agent.run(message)
   ├─→ Step 1: Plan approach
   │   └─→ "Need to get top models from leaderboard"
   ├─→ Step 2: Discover MCP tools
   │   └─→ Found: get_top_performers, analyze_leaderboard
   ├─→ Step 3: Call MCP tool
   │   └─→ get_top_performers(metric="success_rate", top_n=3)
   ├─→ Step 4: Parse result
   │   └─→ Extract model names, success rates, costs
   └─→ Step 5: Format response
       └─→ Generate markdown table with insights
   ↓
4. Return to user with full reasoning trace (if enabled)
```
Evaluation Submission Flow:
```
1. User fills form → Clicks "Submit Evaluation"
   ↓
2. Gradio Event Handler (dashboard.py)
   submit_job(model, agent_type, hardware, infrastructure, api_keys)
   ↓
3. Job Submission Module (utils/)
   if infrastructure == "HuggingFace Jobs":
     ├─→ hf_jobs_submission.py
     ├─→ Build job config
     ├─→ Submit via HF Jobs API
     └─→ Return job_id
   elif infrastructure == "Modal":
     ├─→ modal_job_submission.py
     ├─→ Build Modal app config
     ├─→ Submit via Modal SDK
     └─→ Return job_id
   ↓
4. Store job_id in session state
   ↓
5. Redirect to Job Monitoring screen
   ↓
6. Auto-refresh status every 30s
```
Implementation: utils/auth.py
Flow:
```
1. User visits TraceMind-AI
   ↓
2. Check OAuth token in session
   ├─→ If valid: proceed to app
   └─→ If invalid: show login screen
   ↓
3. User clicks "Sign in with HuggingFace"
   ↓
4. Redirect to HuggingFace OAuth page
   ├─→ User authorizes TraceMind-AI
   └─→ HF redirects back with token
   ↓
5. Store token in Gradio State (session)
   ↓
6. Use token for:
   ├─→ HF Datasets access
   ├─→ HF Jobs submission
   └─→ User identification
```
Code:
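```python
# utils/auth.py
import os

import gradio as gr


def auth_ui():
    """Create OAuth login UI"""
    # On HF Spaces, the login flow also requires `hf_oauth: true` in the README metadata
    gr.LoginButton(value="Sign in with HuggingFace")


# In app.py
DISABLE_OAUTH = os.getenv("DISABLE_OAUTH", "false").lower() == "true"

with gr.Blocks() as app:
    if not DISABLE_OAUTH:
        auth_ui()
```
Strategy: Session-only storage (not server-side persistence)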
Implementation:
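```python
# In settings screen
def save_api_keys(gemini_key, hf_token, api_keys):
    """Store keys in this user's session state (a gr.State passed in and out)"""
    api_keys = dict(api_keys or {})
    if gemini_key:
        api_keys["gemini_key"] = gemini_key
    if hf_token:
        api_keys["hf_token"] = hf_token
    # Don't copy keys into os.environ: it is process-wide and shared by all sessions
    return api_keys, "✅ API keys saved for this session"

# Wiring (component names illustrative):
# save_btn.click(save_api_keys, inputs=[gemini_box, hf_box, api_keys_state],
#                outputs=[api_keys_state, status], api_name=False)
```
Security:
- ✅ Keys held only in per-session `gr.State` (server memory, cleared when the page is reloaded or closed)
- ✅ Not saved to disk or database
- ✅ Forms use `api_name=False` (not exposed via API)
- ✅ HTTPS encryption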
Pattern: Gradio State components for session data
```python
# In app.py
with gr.Blocks() as app:
    # Global state
    session_state = gr.State({
        "user": None,
        "current_run_id": None,
        "current_trace_id": None,
        "api_keys": {},
    })

    # Pass to all screens
    leaderboard_screen(session_state)
    chat_screen(session_state)
```
Pattern: Click event triggers tab switch + state update
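```python
# In leaderboard screen
import gradio as gr


def row_click(run_id, session_state):
    """Navigate to run detail when row clicked"""
    session_state = {**session_state, "current_run_id": run_id}
    # Select the Trace Detail tab by id rather than by index; assumes it was
    # created as gr.Tab("Trace Detail", id="trace_detail") inside main_tabs
    return gr.Tabs(selected="trace_detail"), session_state


# selected_run_id: a hidden gr.Textbox(visible=False) that the table's
# selectRun() JavaScript writes the clicked run_id into (gr.HTML has no row events)
selected_run_id.change(
    fn=row_click,
    inputs=[selected_run_id, session_state],
    outputs=[main_tabs, session_state],
)
```
File: utils/hf_jobs_submission.py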
Key Functions:
```python
import shlex

import requests


def submit_hf_job(model, agent_type, hardware, api_keys):
    """Submit evaluation job to HuggingFace Jobs"""
    # 1. Build job config (a dict sent as the JSON request body)
    job_config = {
        "name": f"SMOLTRACE Eval - {model}",
        "hardware": hardware,  # cpu-basic, t4-small, a10g-small, a100-large, h200
        "environment": {
            "MODEL": model,
            "AGENT_TYPE": agent_type,
            "HF_TOKEN": api_keys["hf_token"],
            # ... other env vars
        },
        # Install and evaluate in one shell; user-supplied values are shell-quoted
        "command": [
            "bash", "-c",
            f"pip install 'smoltrace[otel,gpu]' && smoltrace-eval "
            f"--model {shlex.quote(model)} --agent-type {shlex.quote(agent_type)} ...",
        ],
    }

    # 2. Submit via HF Jobs API
    response = requests.post(
        "https://huggingface.co/api/jobs",
        headers={"Authorization": f"Bearer {api_keys['hf_token']}"},
        json=job_config,
        timeout=30,
    )
    response.raise_for_status()  # fail loudly instead of a KeyError on an error body

    # 3. Return job ID
    return response.json()["id"]
```
File: utils/modal_job_submission.py
Key Functions:
```python
import modal


def submit_modal_job(model, agent_type, hardware, api_keys):
    """Submit evaluation job to Modal"""
    # 1. Create Modal app
    app = modal.App("smoltrace-eval")

    # 2. Define function with GPU; serialized=True is needed because the
    #    function is defined inside another function, not at module scope
    @app.function(
        serialized=True,
        image=modal.Image.debian_slim().pip_install("smoltrace[otel,gpu]"),
        gpu=hardware,  # e.g. "A10G", "A100-80GB", "H200"
        secrets=[
            modal.Secret.from_dict({
                "HF_TOKEN": api_keys["hf_token"],
                # ... other secrets
            })
        ],
    )
    def run_evaluation():
        import smoltrace

        # Run evaluation
        return smoltrace.evaluate(model=model, agent_type=agent_type)

    # 3. Start the app detached and spawn the call without waiting for it
    #    (.remote() would block until the evaluation finished and return its result)
    with app.run(detach=True):
        call = run_evaluation.spawn()
    return call.object_id  # later: modal.FunctionCall.from_id(job_id).get()
```
- Platform: HuggingFace Spaces
- SDK: Gradio 5.49.1
- Hardware: CPU Basic (upgradeable)
- URL: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
Space Metadata (README.md header):
```yaml
---
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
  - mcp-in-action-track-enterprise
  - agent-evaluation
  - mcp-client
  - leaderboard
  - gradio
---
```
Set in HF Spaces Secrets:
```bash
# Required
GEMINI_API_KEY=your_gemini_key
HF_TOKEN=your_hf_token
# Optional
MCP_SERVER_URL=https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard
DISABLE_OAUTH=false  # Set to true for local development
```
Implementation: data_loader.py
- In-memory cache with 5-minute TTL
- Reduces HF Datasets API calls
- Faster page loads
Pattern: Use async for non-blocking I/O
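```python
import asyncio

from datasets import load_dataset

# Possible optimization: run the dataset download and the MCP call in parallel
async def load_data_with_insights(async_mcp_client, repo="kshitijthakkar/smoltrace-leaderboard"):
    # load_dataset blocks, so it runs in a worker thread while the MCP call is awaited
    leaderboard_task = asyncio.to_thread(load_dataset, repo, split="train")
    insights_task = async_mcp_client.analyze_leaderboard(metric_focus="overall")
    leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task)
    return leaderboard.to_pandas(), insights
```
Strategy: Load components only when tabs are activated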
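```python
with gr.Tab("Trace Detail", id="trace_detail", visible=False) as trace_tab:
    # Event handlers can only update existing components, not create new ones;
    # gr.render (re)builds the visualization each time the tab is selected
    @gr.render(triggers=[trace_tab.select])
    def load_trace_components():
        build_trace_visualization()  # creates components; no return value needed
```
Related Documentation:
- README.md - Overview and quick start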
- USER_GUIDE.md - Complete screen-by-screen guide
- MCP_INTEGRATION.md - MCP client implementation
- TraceMind MCP Server - Server-side architecture
Last Updated: November 21, 2025 Version: 1.0.0 Track: MCP in Action (Enterprise)