[NA] [E2E] fix: 3 e2e test failures#6088

Merged
AndreiCautisanu merged 4 commits into main from
andreicautisanu/OPIK-NA-fix-online-scoring-e2e-tests
Apr 7, 2026
Conversation

@AndreiCautisanu
Contributor

@AndreiCautisanu AndreiCautisanu commented Apr 6, 2026

Details

Fixes four systematic E2E test failures:

  1. Online scoring tests (8 tests, 100% fail on local): The Flask test helper creates traces without end_time. The backend OnlineScoringSampler filters these out as "partial traces", so scoring never executes. Fixed by adding end_time to all client.trace() calls and .end() to spans.

  2. Flask helper SDK connection for local env: authenticate_if_needed() skips setting OPIK_URL_OVERRIDE for localhost, causing the SDK to fall back to ~/.opik.config, which may point at a remote server. Fixed by setting OPIK_URL_OVERRIDE in the local auth path.

  3. Experiment items pagination locator: The "Showing X of Y" element changed from a button to a div, breaking getByRole('button', { name: 'Showing' }). Fixed with a text regex locator.

  4. Dataset items via UI-created datasets (100% fail rate): The SDK's dataset.get_items() returns 0 items for datasets created without project_name (e.g. via UI). Switched the wait-for-items-count and get-items endpoints to use the REST API client, which correctly queries by dataset ID.
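
To illustrate item 3, here is a minimal Python sketch of why a role-based lookup stops matching once the element becomes a div, while a text regex still finds it. The pattern, the sample text "Showing 1-10 of 25", and the helper names are assumptions for illustration; the PR does not quote the regex it actually used.

```python
import re

# Hypothetical pattern; the exact regex used in the PR is not shown there.
SHOWING_PATTERN = re.compile(r"Showing\s+\S+\s+of\s+\d+")

# Toy stand-ins for rendered elements, as (role, text) pairs.
elements_before = [("button", "Showing 1-10 of 25")]
elements_after = [("generic", "Showing 1-10 of 25")]  # a div carries no button role


def find_by_role(elements, role, name):
    """Mimic a role-based lookup: the role must match and the text must contain name."""
    return [text for r, text in elements if r == role and name in text]


def find_by_text(elements, pattern):
    """Mimic a text-regex lookup: the role is ignored entirely."""
    return [text for _, text in elements if pattern.search(text)]
```

With these toy inputs, the role-based lookup finds the element only in `elements_before`, while the text lookup matches in both cases. In Playwright itself the same idea is expressed with a text locator that accepts a regular expression.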

Also updates Anthropic model names from 4.5 to 4.6 in models_config.yaml.

Change checklist

  • User facing
  • Documentation update

Issues

  • NA — test infrastructure fix

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.6
  • Scope: Root cause analysis via Allure TestOps + fix implementation
  • Human verification: Local test execution — all affected tests verified passing

Testing

  • Online scoring: npx playwright test tests/online-scoring/online-scoring.spec.ts --grep "@sanity" — 7/8 pass (1 transient Sonnet timeout)
  • Experiment items: npx playwright test tests/experiments/experiment-items.spec.ts — 2/2 pass
  • Datasets: npx playwright test tests/datasets/datasets.spec.ts — 8/8 pass
  • Dataset items: npx playwright test tests/datasets/dataset-items.spec.ts — 6/6 pass

All tests run with OPIK_BASE_URL=http://localhost:5173 against local Opik.

Documentation

N/A

@AndreiCautisanu AndreiCautisanu requested review from a team as code owners April 6, 2026 15:33
@github-actions github-actions bot added the "python" (Pull requests that update Python code) and "tests" (Including test files, or tests related like configuration.) labels Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

📋 PR Linter Failed

Missing Section. The description is missing the ## Details section.


Missing Section. The description is missing the ## Change checklist section.


Missing Section. The description is missing the ## Issues section.


Missing Section. The description is missing the ## Testing section.


Missing Section. The description is missing the ## Documentation section.

… and update Anthropic model names

Online scoring E2E tests fail on local runs because the test helper creates
traces without end_time. The OnlineScoringSampler skips these as "partial
traces", so scoring never executes and the Moderation column never appears.

Also updates Anthropic model names from 4.5 to 4.6 in models_config.yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
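
The behavior described in this commit message can be modeled roughly as follows. This is a simplified sketch, not the backend's OnlineScoringSampler: the `Trace` dataclass, `is_partial`, and `eligible_for_scoring` are hypothetical names, and the only rule modeled is "a trace with no end_time is treated as partial and skipped".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Trace:
    name: str
    start_time: datetime
    end_time: Optional[datetime] = None  # None models a trace that was never closed


def is_partial(trace: Trace) -> bool:
    """Assumed rule: a trace without end_time counts as partial."""
    return trace.end_time is None


def eligible_for_scoring(traces: List[Trace]) -> List[Trace]:
    """Keep only completed traces, mirroring the filtering described above."""
    return [t for t in traces if not is_partial(t)]


now = datetime.now(timezone.utc)
open_trace = Trace("no-end-time", start_time=now)
closed_trace = Trace("with-end-time", start_time=now, end_time=now)
```

Under this model only `closed_trace` reaches the scorer, which is consistent with the fix of passing end_time on every client.trace() call and calling .end() on spans.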
@AndreiCautisanu AndreiCautisanu force-pushed the andreicautisanu/OPIK-NA-fix-online-scoring-e2e-tests branch from 4da281b to 63c1f05 Compare April 6, 2026 15:44
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

📋 PR Linter Failed

Missing Section. The description is missing the ## Details section.


Missing Section. The description is missing the ## Change checklist section.


Missing Section. The description is missing the ## Issues section.


Missing Section. The description is missing the ## Testing section.


Missing Section. The description is missing the ## Documentation section.

@AndreiCautisanu AndreiCautisanu changed the title from "[NA] [E2E] fix: set end_time on test helper traces for online scoring and update Anthropic model names" to "[NA] [E2E] fix: set end_time on test helper traces for online scoring and update Anthropic model names!" Apr 6, 2026
…ination locator

- Flask test helper skipped setting OPIK_URL_OVERRIDE for localhost,
  causing SDK to fall back to ~/.opik.config (remote server) and timeout
- Experiment items pagination locator used getByRole('button') but the
  "Showing X of Y" element is a div, not a button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
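
The first bullet of this commit message can be sketched as a toy model. It assumes a simplified resolution order in which OPIK_URL_OVERRIDE wins over the config file value; `resolve_sdk_url`, the `fixed` flag, and the example URLs are hypothetical, and the real helper may set a different URL value than the one shown.

```python
from typing import Dict


def resolve_sdk_url(env: Dict[str, str], config_file_url: str) -> str:
    """Toy resolution order: the env override wins, else the config file value."""
    return env.get("OPIK_URL_OVERRIDE") or config_file_url


def authenticate_if_needed(base_url: str, env: Dict[str, str], fixed: bool) -> None:
    """Hypothetical stand-in for the helper; `fixed` toggles old vs. new behavior."""
    if "localhost" in base_url:
        if fixed:
            # New behavior: pin the SDK to the local instance.
            env["OPIK_URL_OVERRIDE"] = base_url
        return  # Old behavior: the environment is left untouched for local runs.
    env["OPIK_URL_OVERRIDE"] = base_url
```

With `fixed=False` and an empty environment, the resolved URL is whatever the config file holds, which is the remote-server fallback the commit describes. With `fixed=True` the local URL is used.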
@github-actions github-actions bot added the "typescript" (*.ts *.tsx) label Apr 6, 2026
@AndreiCautisanu AndreiCautisanu changed the title from "[NA] [E2E] fix: set end_time on test helper traces for online scoring and update Anthropic model names!" to "[NA] [E2E] fix: 3 e2e test failures" Apr 6, 2026
…item visibility

SDK's dataset.get_items() returns 0 items for datasets created without a
project_name (e.g. via UI). Switch wait-for-items-count and get-items
endpoints to use the REST API client which correctly queries by dataset ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
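
A wait-for-items-count style helper can be written independently of how items are fetched, so that swapping the SDK call for a REST call only changes the injected function. The sketch below is an assumption about the general shape of such a helper, not the code in this PR; `fetch_items` is a hypothetical callable standing in for whichever client performs the lookup by dataset ID.

```python
import time
from typing import Callable, List


def wait_for_items_count(
    fetch_items: Callable[[], List[dict]],
    expected: int,
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
) -> bool:
    """Poll fetch_items until it returns `expected` items or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if len(fetch_items()) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

If the injected fetcher always returns 0 items, as described for dataset.get_items() on UI-created datasets, this loop can only time out, which matches the reported 100% failure rate.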
Member

@andrescrz andrescrz left a comment

LGTM.

@AndreiCautisanu AndreiCautisanu merged commit 05a9a13 into main Apr 7, 2026
8 of 9 checks passed
@AndreiCautisanu AndreiCautisanu deleted the andreicautisanu/OPIK-NA-fix-online-scoring-e2e-tests branch April 7, 2026 09:19
Labels

python (Pull requests that update Python code), tests (Including test files, or tests related like configuration.), typescript (*.ts *.tsx)

2 participants