[NA] [E2E] fix: 3 e2e test failures#6088
Merged
AndreiCautisanu merged 4 commits into main on Apr 7, 2026
Contributor
📋 PR Linter Failed: ❌ Missing Section. The description is missing the …
… and update Anthropic model names

Online scoring E2E tests fail on local runs because the test helper creates traces without end_time. The OnlineScoringSampler skips these as "partial traces", so scoring never executes and the Moderation column never appears.

Also updates Anthropic model names from 4.5 to 4.6 in models_config.yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
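The partial-trace behaviour described in this commit can be sketched as follows. This is a minimal illustration, not the actual helper or sampler code: `completed_trace_kwargs` and `is_partial` are hypothetical names, and the use of timezone-aware UTC timestamps is an assumption.

```python
from datetime import datetime, timedelta, timezone


def completed_trace_kwargs(name: str, duration_s: float = 1.0) -> dict:
    """Build keyword arguments for a trace that is already finished.

    Hypothetical helper: the point is that end_time is always present.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(seconds=duration_s)
    return {"name": name, "start_time": start, "end_time": end}


def is_partial(trace: dict) -> bool:
    """Stand-in for the sampler's check: a trace without end_time is partial."""
    return trace.get("end_time") is None


kwargs = completed_trace_kwargs("e2e-online-scoring")
print(is_partial(kwargs))              # False: eligible for scoring
print(is_partial({"name": "no-end"}))  # True: would be skipped

# In the Flask helper these values would be passed to the existing call,
# for example: client.trace(**kwargs)
```

With `end_time` set, the trace no longer looks partial, so a sampler that filters on that field would let it through to scoring.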
4da281b to 63c1f05
…ination locator
- Flask test helper skipped setting OPIK_URL_OVERRIDE for localhost,
causing SDK to fall back to ~/.opik.config (remote server) and timeout
- Experiment items pagination locator used getByRole('button') but the
"Showing X of Y" element is a div, not a button
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
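The first bullet of this commit can be illustrated with a short sketch of the local auth path. The function name `authenticate_sketch`, the `/api` suffix, and the set of hostnames treated as local are assumptions for illustration; the real `authenticate_if_needed()` may differ.

```python
import os
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}  # assumed definition of "local"


def authenticate_sketch(base_url: str, env=None) -> str:
    """Pin the SDK URL for every environment, then branch on local vs remote.

    Hypothetical stand-in for the Flask helper's auth function.
    """
    env = os.environ if env is None else env
    # Set the override before branching, so the SDK never falls back to
    # whatever URL is stored in ~/.opik.config.
    env["OPIK_URL_OVERRIDE"] = base_url.rstrip("/") + "/api"  # "/api" is assumed
    if urlparse(base_url).hostname in LOCAL_HOSTS:
        return "local"   # no credentials needed
    return "remote"      # the real helper would log in here


fake_env = {}
mode = authenticate_sketch("http://localhost:5173", env=fake_env)
print(mode, fake_env["OPIK_URL_OVERRIDE"])
```

Passing a plain dict as `env` keeps the example side-effect free; omitting it would write to the process environment instead.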
…item visibility

SDK's dataset.get_items() returns 0 items for datasets created without a project_name (e.g. via UI). Switch wait-for-items-count and get-items endpoints to use the REST API client which correctly queries by dataset ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dataset item visibility" This reverts commit 1d39098.
Details
Fixes four systematic E2E test failures:

- Online scoring tests (8 tests, 100% fail on local): Flask test helper creates traces without `end_time`. The backend `OnlineScoringSampler` filters these out as "partial traces", so scoring never executes. Fixed by adding `end_time` to all `client.trace()` calls and `.end()` to spans.
- Flask helper SDK connection for local env: `authenticate_if_needed()` skips setting `OPIK_URL_OVERRIDE` for localhost, causing SDK to fall back to `~/.opik.config` which may point at a remote server. Fixed by setting `OPIK_URL_OVERRIDE` in the local auth path.
- Experiment items pagination locator: The "Showing X of Y" element changed from a button to a div, breaking `getByRole('button', { name: 'Showing' })`. Fixed with a text regex locator.
- Dataset items via UI-created datasets (100% fail rate): SDK's `dataset.get_items()` returns 0 items for datasets created without `project_name` (e.g. via UI). Switched `wait-for-items-count` and `get-items` endpoints to use the REST API client which correctly queries by dataset ID.

Also updates Anthropic model names from 4.5 to 4.6 in `models_config.yaml`.

Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
Testing
- `npx playwright test tests/online-scoring/online-scoring.spec.ts --grep "@sanity"`: 7/8 pass (1 transient Sonnet timeout)
- `npx playwright test tests/experiments/experiment-items.spec.ts`: 2/2 pass
- `npx playwright test tests/datasets/datasets.spec.ts`: 8/8 pass
- `npx playwright test tests/datasets/dataset-items.spec.ts`: 6/6 pass

All tests run with `OPIK_BASE_URL=http://localhost:5173` against local Opik.

Documentation
N/A