chore(testing): extended E2E benchmark session runner #106

Merged
saschabuehrle merged 1 commit into main from chore/extended-e2e-benchmarking on Feb 18, 2026

@saschabuehrle
Collaborator

Adds an extended real-API testing/benchmarking session setup:

  • New guide: docs/guides/extended-testing.md
  • Updated benchmark README with correct commands
  • tests/benchmarks/run_all.py: adds --profile presets (smoke/standard/overnight/full), optional provider comparison, and limits BFCL agentic task count
  • New benchmark: tests/benchmarks/agentic_multi_agent.py (router + tool-calling correctness + cost reduction)
  • One-command session runner: scripts/extended-e2e-session.sh (logs + JSON artifacts)
  • TypeScript real-API smoke test: pnpm -C packages/core run real-api:smoke
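
For readers unfamiliar with preset-style flags, here is a minimal sketch of how a --profile option with the four preset names listed above could be wired up with argparse. It is not the code in tests/benchmarks/run_all.py. The preset values, the dictionary keys, and the --bfcl-agentic-tasks and --compare-providers override flags are all hypothetical and chosen only for illustration.

```python
import argparse

# Hypothetical preset values; the real run_all.py may use different keys and numbers.
PROFILES = {
    "smoke": {"bfcl_agentic_tasks": 5, "compare_providers": False},
    "standard": {"bfcl_agentic_tasks": 25, "compare_providers": False},
    "overnight": {"bfcl_agentic_tasks": 100, "compare_providers": True},
    "full": {"bfcl_agentic_tasks": None, "compare_providers": True},  # None = no limit
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --profile flag plus optional per-setting overrides."""
    parser = argparse.ArgumentParser(description="Benchmark runner sketch")
    parser.add_argument("--profile", choices=sorted(PROFILES), default="smoke")
    # Hypothetical override flags: explicit values win over the preset.
    parser.add_argument("--bfcl-agentic-tasks", type=int, default=None)
    parser.add_argument("--compare-providers", action="store_true")
    return parser


def resolve_config(argv=None) -> dict:
    """Start from the chosen preset, then apply any explicit overrides."""
    args = build_parser().parse_args(argv)
    config = dict(PROFILES[args.profile])  # copy so the preset table is not mutated
    config["profile"] = args.profile
    if args.bfcl_agentic_tasks is not None:
        config["bfcl_agentic_tasks"] = args.bfcl_agentic_tasks
    if args.compare_providers:
        config["compare_providers"] = True
    return config
```

Calling resolve_config(["--profile", "overnight"]) returns the overnight preset, while adding an explicit override flag changes only that one setting. Keeping presets in a single table makes it easy to see how the profiles differ at a glance.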

@saschabuehrle saschabuehrle merged commit 059bd3d into main Feb 18, 2026
24 checks passed
@saschabuehrle saschabuehrle deleted the chore/extended-e2e-benchmarking branch February 18, 2026 20:29