chore(testing): extended E2E benchmark session runner by saschabuehrle · Pull Request #106 · lemony-ai/cascadeflow

saschabuehrle · 2026-02-10T19:07:59Z

Adds an extended real-API testing/benchmarking session setup:

- New guide: `docs/guides/extended-testing.md`
- Updated benchmark README with correct commands
- `tests/benchmarks/run_all.py`: adds `--profile` presets (`smoke`/`standard`/`overnight`/`full`), optional provider comparison, and limits BFCL agentic task count
- New benchmark: `tests/benchmarks/agentic_multi_agent.py` (router + tool-calling correctness + cost reduction)
- One-command session runner: `scripts/extended-e2e-session.sh` (logs + JSON artifacts)
- TypeScript real API smoke: `pnpm -C packages/core run real-api:smoke`
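
For reviewers unfamiliar with preset-style flags, here is a minimal sketch of how a `--profile` option with the four preset names could map to benchmark settings. It is not the code in `tests/benchmarks/run_all.py`: the `PROFILES` table, its keys (`max_bfcl_tasks`, `compare_providers`), the numeric limits, the `standard` default, and the `resolve_settings` helper are all assumptions made for illustration.

```python
import argparse

# Hypothetical preset table. Only the four preset names come from this PR;
# the keys and values are illustrative assumptions, not the real settings.
PROFILES = {
    "smoke": {"max_bfcl_tasks": 5, "compare_providers": False},
    "standard": {"max_bfcl_tasks": 50, "compare_providers": False},
    "overnight": {"max_bfcl_tasks": 500, "compare_providers": True},
    "full": {"max_bfcl_tasks": None, "compare_providers": True},  # None = no cap
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only a --profile flag restricted to known presets."""
    parser = argparse.ArgumentParser(description="Benchmark runner sketch")
    parser.add_argument(
        "--profile",
        choices=list(PROFILES),
        default="standard",  # assumed default, not confirmed by the PR
        help="Preset controlling how much of the benchmark suite runs",
    )
    return parser


def resolve_settings(argv: list[str]) -> dict:
    """Parse argv and return a copy of the selected preset plus its name."""
    args = build_parser().parse_args(argv)
    settings = dict(PROFILES[args.profile])  # copy so callers cannot mutate the table
    settings["profile"] = args.profile
    return settings


if __name__ == "__main__":
    print(resolve_settings(["--profile", "smoke"]))
```

Keeping the presets in one table means a new profile is a one-line change, and `choices` makes argparse reject unknown profile names before any API calls are made.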

chore(testing): add extended E2E benchmark session runner

9315c59

github-actions bot added the documentation, lang: typescript, lang: python, tests, and size/l labels on Feb 10, 2026

saschabuehrle merged commit 059bd3d into main Feb 18, 2026
24 checks passed

saschabuehrle deleted the chore/extended-e2e-benchmarking branch February 18, 2026 20:29
