[NA] [E2E] fix: 3 e2e test failures by AndreiCautisanu · Pull Request #6088 · comet-ml/opik

AndreiCautisanu · 2026-04-06T15:33:35Z

Details

Fixes four systematic E2E test failures:

Online scoring tests (8 tests, 100% fail on local): The Flask test helper creates traces without end_time. The backend OnlineScoringSampler filters these out as "partial traces", so scoring never executes. Fixed by adding end_time to all client.trace() calls and .end() to spans.
Flask helper SDK connection for local env: authenticate_if_needed() skips setting OPIK_URL_OVERRIDE for localhost, causing the SDK to fall back to ~/.opik.config, which may point at a remote server. Fixed by setting OPIK_URL_OVERRIDE in the local auth path.
Experiment items pagination locator: The "Showing X of Y" element changed from a button to a div, breaking getByRole('button', { name: 'Showing' }). Fixed with a text regex locator.
Dataset items via UI-created datasets (100% fail rate): The SDK's dataset.get_items() returns 0 items for datasets created without project_name (e.g. via UI). Switched the wait-for-items-count and get-items endpoints to use the REST API client, which correctly queries by dataset ID.
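
For the local SDK connection fix listed above, a minimal Python sketch of the idea follows. The helper name `configure_sdk_for_env` and the `/api` suffix are assumptions made for illustration; only `OPIK_URL_OVERRIDE` and the fallback to `~/.opik.config` come from the description, and the real `authenticate_if_needed()` may look quite different.

```python
import os
from typing import MutableMapping, Optional


def configure_sdk_for_env(
    base_url: str,
    env: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Hypothetical helper: point the SDK at the server the tests target."""
    env = os.environ if env is None else env

    # Assumption: the API is served under "/api" on the same host as the UI.
    api_url = base_url.rstrip("/") + "/api"

    # Set the override for every environment, including localhost. Leaving it
    # unset on local runs is what let the SDK fall back to ~/.opik.config.
    env["OPIK_URL_OVERRIDE"] = api_url
    return api_url
```

Passing a plain dict as `env` keeps the sketch testable without touching the real process environment.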

Also updates Anthropic model names from 4.5 to 4.6 in models_config.yaml.

Change checklist

User facing
Documentation update

Issues

NA — test infrastructure fix

AI-WATERMARK

AI-WATERMARK: yes

Tools: Claude Code
Model(s): Claude Opus 4.6
Scope: Root cause analysis via Allure TestOps + fix implementation
Human verification: Local test execution — all affected tests verified passing

Testing

Online scoring: npx playwright test tests/online-scoring/online-scoring.spec.ts --grep "@sanity" — 7/8 pass (1 transient Sonnet timeout)
Experiment items: npx playwright test tests/experiments/experiment-items.spec.ts — 2/2 pass
Datasets: npx playwright test tests/datasets/datasets.spec.ts — 8/8 pass
Dataset items: npx playwright test tests/datasets/dataset-items.spec.ts — 6/6 pass

All tests run with OPIK_BASE_URL=http://localhost:5173 against local Opik.

Documentation

N/A

github-actions · 2026-04-06T15:33:53Z

📋 PR Linter Failed

❌ Missing Section. The description is missing the ## Details section.

❌ Missing Section. The description is missing the ## Change checklist section.

❌ Missing Section. The description is missing the ## Issues section.

❌ Missing Section. The description is missing the ## Testing section.

❌ Missing Section. The description is missing the ## Documentation section.

tests_end_to_end/test-helper-service/routes/traces.py

… and update Anthropic model names

Online scoring E2E tests fail on local runs because the test helper creates traces without end_time. The OnlineScoringSampler skips these as "partial traces", so scoring never executes and the Moderation column never appears.

Also updates Anthropic model names from 4.5 to 4.6 in models_config.yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
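
To illustrate why a missing end_time blocks scoring, here is a small Python model of the behavior described in this commit message. The `Trace` dataclass and the `is_partial` / `traces_to_score` functions are hypothetical stand-ins; the real OnlineScoringSampler is backend code that is not shown in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Trace:
    """Hypothetical, simplified trace record."""
    name: str
    start_time: datetime
    end_time: Optional[datetime] = None


def is_partial(trace: Trace) -> bool:
    # Assumption: a trace counts as "partial" when it has no end_time.
    return trace.end_time is None


def traces_to_score(traces: List[Trace]) -> List[Trace]:
    # Partial traces are skipped, so they never reach the scoring step.
    return [t for t in traces if not is_partial(t)]


now = datetime.now(timezone.utc)
open_trace = Trace("without-end-time", start_time=now)
closed_trace = Trace("with-end-time", start_time=now, end_time=now)
scored = traces_to_score([open_trace, closed_trace])
```

Under this model only `closed_trace` is eligible for scoring, which matches the symptom that the Moderation column never appeared for helper-created traces.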

…ination locator

- Flask test helper skipped setting OPIK_URL_OVERRIDE for localhost, causing the SDK to fall back to ~/.opik.config (remote server) and time out
- Experiment items pagination locator used getByRole('button') but the "Showing X of Y" element is a div, not a button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
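
A text-based match does not depend on the element's role, which is why it survives a button-to-div change. The Python sketch below shows one possible pattern for the "Showing X of Y" text. The exact regex used in the PR is not shown here, so `SHOWING_PATTERN` and `parse_showing` are illustrative assumptions.

```python
import re
from typing import Optional, Tuple

# Assumed shape of the pagination text, e.g. "Showing 1-10 of 25".
SHOWING_PATTERN = re.compile(r"Showing\s+(.+?)\s+of\s+(\d+)")


def parse_showing(text: str) -> Optional[Tuple[str, int]]:
    """Return (shown range, total) if the text looks like 'Showing X of Y'."""
    match = SHOWING_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

In a Playwright test, a compiled pattern like this could be passed to a text locator instead of a role locator, so the lookup keeps working whichever tag renders the text.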

tests_end_to_end/typescript-tests/page-objects/experiment-items.page.ts

…item visibility

SDK's dataset.get_items() returns 0 items for datasets created without a project_name (e.g. via UI). Switch wait-for-items-count and get-items endpoints to use the REST API client which correctly queries by dataset ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
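
As a rough sketch of what a wait-for-items-count helper does, the Python below polls a count source until it reaches the expected value or a timeout expires. The function signature and the injected `fetch_count` callable are assumptions; in the commit above the count would come from a REST query by dataset ID, which is not reproduced here. Note that this commit was later reverted in the same PR.

```python
import time
from typing import Callable


def wait_for_items_count(
    fetch_count: Callable[[], int],
    expected: int,
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll fetch_count() until it equals expected or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if fetch_count() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Injecting `fetch_count` keeps the polling logic independent of whether the count comes from the SDK or from a REST client.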

tests_end_to_end/test-helper-service/routes/datasets.py

…dataset item visibility" This reverts commit 1d39098.

andrescrz

LGTM.

AndreiCautisanu requested review from a team as code owners April 6, 2026 15:33

github-actions bot assigned AndreiCautisanu Apr 6, 2026

github-actions bot added the python and tests labels Apr 6, 2026

baz-reviewer bot reviewed Apr 6, 2026

tests_end_to_end/test-helper-service/routes/traces.py (resolved)

AndreiCautisanu force-pushed the andreicautisanu/OPIK-NA-fix-online-scoring-e2e-tests branch from 4da281b to 63c1f05 Compare April 6, 2026 15:44

baz-reviewer bot approved these changes Apr 6, 2026

AndreiCautisanu changed the title ~~[NA] [E2E] fix: set end_time on test helper traces for online scoring and update Anthropic model names~~ [NA] [E2E] fix: set end_time on test helper traces for online scoring and update Anthropic model names! Apr 6, 2026

github-actions bot added the typescript label Apr 6, 2026

baz-reviewer bot reviewed Apr 6, 2026

tests_end_to_end/typescript-tests/page-objects/experiment-items.page.ts (resolved)

baz-reviewer bot approved these changes Apr 6, 2026

AndreiCautisanu changed the title ~~[NA] [E2E] fix: set end_time on test helper traces for online scoring and update Anthropic model names!~~ [NA] [E2E] fix: 3 e2e test failures Apr 6, 2026

baz-reviewer bot reviewed Apr 7, 2026

tests_end_to_end/test-helper-service/routes/datasets.py (outdated, resolved)

tests_end_to_end/test-helper-service/routes/datasets.py (outdated, resolved)

Revert "fix: use REST API for dataset item queries to fix UI-created …

567ba07

…dataset item visibility" This reverts commit 1d39098.

baz-reviewer bot approved these changes Apr 7, 2026

andrescrz approved these changes Apr 7, 2026

AndreiCautisanu merged commit 05a9a13 into main Apr 7, 2026
8 of 9 checks passed

AndreiCautisanu deleted the andreicautisanu/OPIK-NA-fix-online-scoring-e2e-tests branch April 7, 2026 09:19

AndreiCautisanu merged 4 commits into main from andreicautisanu/OPIK-NA-fix-online-scoring-e2e-tests
