fix: extract A2A_INTERACTION content in categorical evaluation transcripts #41

Closed
evekhm wants to merge 1 commit into GoogleCloudPlatform:main from evekhm:fix/a2a-transcript-extraction

Contributor

@evekhm evekhm commented Apr 15, 2026

Summary

  • Adds JSON_VALUE(content, '$.artifacts[0].parts[0].text') to the COALESCE chain in all 4 transcript-building SQL queries in categorical_evaluator.py
  • A2A_INTERACTION events (from ADK's RemoteA2aAgent via TransferToAgentTool) store the remote agent's response under artifacts[0].parts[0].text, which was not being extracted
  • Without this fix, AI.GENERATE sees a question with no answer and returns NULL for ~40% of sessions that use A2A agents

Details

The transcript-building SQL uses COALESCE to extract text from event content:

COALESCE(
  JSON_VALUE(content, '$.text_summary'),
  JSON_VALUE(content, '$.response'),
  JSON_VALUE(content, '$.tool'),
  ''
)

However, A2A_INTERACTION events (introduced in google/adk-python#5325) store the remote agent's response under a different JSON structure:

{
  "artifacts": [{"parts": [{"kind": "text", "text": "The actual agent response..."}]}],
  "contextId": "...",
  "history": [...]
}
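
A rough Python emulation makes the mismatch concrete. This is only a sketch: `json_value`, `coalesce`, and `extract_text` are hypothetical stand-ins for the BigQuery functions, handle just the simple `$.key[0].key` paths used here, and the empty `history` list is a placeholder for the elided value.

```python
import json
import re


def json_value(doc, path):
    """Simplified stand-in for BigQuery JSON_VALUE: scalar as str, else None."""
    node = doc
    # Split '$.a[0].b' into object keys and array indexes.
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path):
        if key and isinstance(node, dict) and key in node:
            node = node[key]
        elif index and isinstance(node, list) and int(index) < len(node):
            node = node[int(index)]
        else:
            return None  # path does not exist
    if node is None or isinstance(node, (dict, list)):
        return None  # JSON null and non-scalar values yield NULL
    return node if isinstance(node, str) else json.dumps(node)


def coalesce(*values):
    """First non-None argument, like SQL COALESCE ('' counts as a value)."""
    return next((v for v in values if v is not None), None)


def extract_text(content, paths):
    return coalesce(*(json_value(content, p) for p in paths), "")


OLD_PATHS = ["$.text_summary", "$.response", "$.tool"]

a2a_event = {
    "artifacts": [{"parts": [{"kind": "text", "text": "The actual agent response..."}]}],
    "contextId": "...",
    "history": [],  # placeholder; the real payload carries the message history
}

# The existing chain falls through to the '' default for this payload.
print(repr(extract_text(a2a_event, OLD_PATHS)))
# The response text is only reachable through the artifacts path.
print(json_value(a2a_event, "$.artifacts[0].parts[0].text"))
```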

None of the existing JSON paths match. The fix adds JSON_VALUE(content, '$.artifacts[0].parts[0].text') before the $.tool fallback in all 4 locations:

  1. CATEGORICAL_TRANSCRIPT_QUERY (line 233)
  2. CATEGORICAL_AI_GENERATE_QUERY (line 259)
  3. build_ai_classify_query() (line 380)
  4. build_ai_generate_query() (line 449)
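
The patched chain can be sketched as a Python helper that renders the SQL fragment. `build_text_coalesce` and `TEXT_PATHS` are hypothetical names used for illustration; the PR itself edits the expression inline at each of the four locations, and the order below follows the description above (the new path placed before `$.tool`).

```python
# Fallback order described in this PR; the artifacts path precedes $.tool.
TEXT_PATHS = [
    "$.text_summary",
    "$.response",
    "$.artifacts[0].parts[0].text",  # A2A_INTERACTION response text
    "$.tool",
]


def build_text_coalesce(column="content", paths=TEXT_PATHS):
    """Render a COALESCE over JSON_VALUE lookups, ending in an empty string."""
    args = [f"JSON_VALUE({column}, '{path}')" for path in paths]
    args.append("''")  # final default so the expression is never NULL
    return "COALESCE(\n  " + ",\n  ".join(args) + "\n)"


print(build_text_coalesce())
```

Rendering the fragment from one list is just one way to keep several query strings in step; it is not how `categorical_evaluator.py` is known to be organized.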

Fixes #40

Test plan

  • Run evaluate_categorical() on a dataset containing A2A sessions (using RemoteA2aAgent)
  • Verify A2A session transcripts now include the remote agent's response text
  • Confirm parse error rate drops from ~40% to near 0% for A2A sessions
  • Verify non-A2A sessions are unaffected (COALESCE falls through as before)
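
One possible way to check the error-rate items above is to compare the share of NULL classifications for A2A and non-A2A sessions before and after the change. The sketch below assumes result rows shaped as dicts with hypothetical `is_a2a` and `category` keys; the actual return shape of `evaluate_categorical()` is not shown in this PR, and the rows are synthetic.

```python
from collections import defaultdict


def null_rate_by_group(rows):
    """Return {is_a2a: share of sessions whose category is None}."""
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for row in rows:
        totals[row["is_a2a"]] += 1
        if row["category"] is None:
            nulls[row["is_a2a"]] += 1
    return {group: nulls[group] / totals[group] for group in totals}


# Synthetic rows for illustration only; not measurements from this PR.
rows = [
    {"session_id": "s1", "is_a2a": True, "category": "resolved"},
    {"session_id": "s2", "is_a2a": True, "category": None},
    {"session_id": "s3", "is_a2a": False, "category": "resolved"},
    {"session_id": "s4", "is_a2a": False, "category": "escalated"},
]

rates = null_rate_by_group(rows)
print(rates)
```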

🤖 Generated with Claude Code

…ripts

Add JSON_VALUE(content, '$.artifacts[0].parts[0].text') to the COALESCE
chain in all 4 transcript-building SQL queries. A2A_INTERACTION events
(from ADK's RemoteA2aAgent) store the remote agent's response under
artifacts[0].parts[0].text, which was not being extracted — causing
AI.GENERATE to see a question with no answer and return NULL for ~40%
of sessions.

Fixes GoogleCloudPlatform#40

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

google-cla bot commented Apr 15, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@haiyuan-eng-google
Collaborator

drop the PR

Development

Successfully merging this pull request may close these issues.

Categorical evaluation fails for A2A sessions — transcript builder missing A2A_INTERACTION content extraction
