[OPIK-5650] [BE/FE] feat: support evaluation suites in playground by itamargolan · Pull Request #6092 · comet-ml/opik

itamargolan · 2026-04-06T18:38:44Z

Details

Adds BE-orchestrated experiment execution for evaluation suites in the playground. When a user runs a dataset that is an evaluation suite, the frontend delegates to a new backend endpoint (POST /v1/private/experiments/execute) instead of doing client-side streaming. The backend creates experiments per prompt variant, processes dataset items asynchronously (LLM calls, trace/span creation), and triggers assertion evaluation via the existing online scoring pipeline.

Key changes:

- ExperimentExecutionService orchestrates experiment creation, async item processing, and status transitions
- ExperimentItemProcessor handles per-item LLM calls and trace/span/experiment-item creation
- EvalSuiteAssertionSampler listens for completed traces and enqueues evaluators for LLM-as-judge scoring with categoryName = "suite_assertion" so results route to assertion_results
- OnlineScoringLlmAsJudgeScorer applies optional score-name mapping and category (backward compatible: null values preserve existing behavior)
- Frontend shows assertion pass/fail status and pass rate in playground output cells for eval suite runs
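A minimal sketch of the orchestration described above, for illustration only. ExecutionStatus, PromptVariant, DatasetItem, ExecutionHandle, and ExecutionSketch are hypothetical simplified types, not the PR's classes, and CompletableFuture stands in for whatever async machinery the backend actually uses. It shows one experiment per prompt variant, items processed off the request thread, and a RUNNING to COMPLETED transition once all of an experiment's items finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

// Hypothetical, simplified stand-ins for the PR's request and status types.
enum ExecutionStatus { RUNNING, COMPLETED }

record PromptVariant(String name, String template) {}

record DatasetItem(String id, Map<String, String> data) {}

// The caller receives the experiment ids immediately; "done" completes in the background.
record ExecutionHandle(List<UUID> experimentIds, CompletableFuture<Void> done) {}

final class ExecutionSketch {
    final Map<UUID, ExecutionStatus> statuses = new ConcurrentHashMap<>();
    final List<String> processed = new CopyOnWriteArrayList<>();

    ExecutionHandle execute(List<PromptVariant> variants, List<DatasetItem> items, Executor executor) {
        List<UUID> ids = new ArrayList<>();
        List<CompletableFuture<Void>> perExperiment = new ArrayList<>();
        for (PromptVariant variant : variants) {
            // One experiment per prompt variant (randomUUID keeps the sketch dependency-free).
            UUID experimentId = UUID.randomUUID();
            ids.add(experimentId);
            statuses.put(experimentId, ExecutionStatus.RUNNING);
            CompletableFuture<?>[] itemFutures = items.stream()
                    .map(item -> CompletableFuture.runAsync(
                            () -> processItem(experimentId, variant, item), executor))
                    .toArray(CompletableFuture[]::new);
            // Flip the status only after every item of this experiment has been processed.
            perExperiment.add(CompletableFuture.allOf(itemFutures)
                    .thenRun(() -> statuses.put(experimentId, ExecutionStatus.COMPLETED)));
        }
        CompletableFuture<Void> done =
                CompletableFuture.allOf(perExperiment.toArray(new CompletableFuture<?>[0]));
        return new ExecutionHandle(List.copyOf(ids), done);
    }

    private void processItem(UUID experimentId, PromptVariant variant, DatasetItem item) {
        // Placeholder: real code would render the prompt, call the LLM, and persist trace/span/experiment item.
        processed.add(experimentId + ":" + variant.name() + ":" + item.id());
    }
}
```

Returning the handle before the futures complete is what allows the frontend to poll for progress rather than hold a streaming connection open.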

Change checklist

User facing
Documentation update

Issues

OPIK-5650

AI-WATERMARK

AI-WATERMARK: yes

If yes:
- Tools: Claude Code
- Model(s): Claude Opus 4.6
- Scope: PR review comment fixes (code style, naming, log patterns, null safety, code deduplication, Javadoc)
- Human verification: yes — all changes reviewed

Testing

- cd apps/opik-backend && mvn compile -DskipTests (backend compiles cleanly)
- cd apps/opik-backend && mvn spotless:apply (formatting passes)
- Manual testing: run an eval suite experiment in the playground and verify the pass/fail tags and the pass-rate display
- Backward compatibility verified by code inspection: OnlineScoringLlmAsJudgeScorer preserves the original score names and a null categoryName when scoreNameMapping/categoryName are null (regular online scoring path)
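The backward-compatibility property above (a null mapping and a null category leave scores untouched) can be sketched as a pure function. Score, ScoreMapping, and apply are hypothetical names, and keeping names that are absent from a non-null mapping unchanged is an assumption, not something the PR states.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical score shape; the real feedback score types carry more fields.
record Score(String name, BigDecimal value, String categoryName) {}

final class ScoreMapping {
    /**
     * Applies an optional name mapping and category. Passing null for both returns scores
     * equal to the input, i.e. the regular online-scoring path is unchanged.
     */
    static List<Score> apply(List<Score> raw, Map<String, String> scoreNameMapping, String categoryName) {
        List<Score> out = new ArrayList<>(raw.size());
        for (Score score : raw) {
            // Assumption: names absent from the mapping keep their original value.
            String name = scoreNameMapping == null
                    ? score.name()
                    : scoreNameMapping.getOrDefault(score.name(), score.name());
            String category = categoryName == null ? score.categoryName() : categoryName;
            out.add(new Score(name, score.value(), category));
        }
        return out;
    }
}
```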

Documentation

N/A — internal backend changes, no public API documentation updates needed.

Demo video

Screen.Recording.2026-04-06.at.23.07.56.mov

Add backend experiment execution endpoint and frontend eval suite flow so playground can run evaluation suite datasets with server-side assertion processing and poll-based progress tracking. Made-with: Cursor

github-actions · 2026-04-06T18:39:02Z

📋 PR Linter Failed

❌ Missing Section. The description is missing the ## Details section.

❌ Missing Section. The description is missing the ## Change checklist section.

❌ Missing Section. The description is missing the ## Testing section.

❌ Missing Section. The description is missing the ## Documentation section.

github-actions · 2026-04-06T18:39:55Z

📋 PR Linter Failed

❌ Missing Section. The description is missing the ## Details section.

❌ Missing Section. The description is missing the ## Change checklist section.

❌ Missing Section. The description is missing the ## Testing section.

❌ Missing Section. The description is missing the ## Documentation section.

github-actions · 2026-04-06T18:45:27Z

Backend Tests - Integration Group 16

242 tests: 242 passed, 0 skipped, 0 failed (10 suites, 10 files) in 5m 57s

Results for commit eee9991.

♻️ This comment has been updated with latest results.

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentExecutionService.java

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentItemProcessor.java

...-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteAssertionSampler.java

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentItemProcessor.java

apps/opik-frontend/src/api/playground/createLogPlaygroundProcessor.ts

@RequiredArgsConstructor

- Switch to @RequiredArgsConstructor convention in EvalSuiteAssertionSampler and ExperimentItemProcessor
- Remove SDK references from comments, rename methods (fetchDatasetEvaluators, getMetadataString, toLangChain4jMessage, etc.)
- Fix log patterns: pass exception as last param instead of e.getMessage()
- Split catch: UncheckedIOException for deserialization, Exception for other errors
- Replace generateDeterministicId with IdGenerator.generateId() (UUID v7)
- Pre-process evaluators outside trace loop via PreparedEvaluator record
- Add dataset version filtering to DatasetItemStreamRequest
- Add null validation for datasetId with BadRequestException
- Extract buildMessagesInput/buildLlmOutput helpers to deduplicate trace/span creation
- Simplify buildTemplateContext using forEach
- Add backward-compatibility comment on OnlineScoringLlmAsJudgeScorer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
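Background on the UUID v7 switch: a version-7 UUID stores a 48-bit Unix-millisecond timestamp in its most significant bits, so ids sort roughly by creation time. The sketch below assembles one by hand following the RFC 9562 bit layout; UuidV7Sketch is a hypothetical name and this is not how IdGenerator.generateId() is implemented.

```java
import java.util.Random;
import java.util.UUID;

final class UuidV7Sketch {
    /** Builds a version-7 UUID from a millisecond timestamp plus random bits (RFC 9562 layout). */
    static UUID uuidV7(long epochMillis, Random random) {
        // Bits 127..80: 48-bit Unix timestamp in milliseconds.
        long msb = (epochMillis & 0xFFFF_FFFF_FFFFL) << 16;
        // Bits 79..76: version 7.
        msb |= 0x7000L;
        // Bits 75..64: 12 random bits ("rand_a").
        msb |= random.nextInt(1 << 12);
        // Bits 63..62: variant 0b10; the remaining 62 bits are random ("rand_b").
        long lsb = (random.nextLong() & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb);
    }

    /** Recovers the embedded millisecond timestamp. */
    static long timestampMillis(UUID uuid) {
        return uuid.getMostSignificantBits() >>> 16;
    }
}
```

Because the timestamp occupies the leading hex digits of the string form, ids generated in later milliseconds also sort later as strings.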

github-actions · 2026-04-06T21:39:58Z

Backend Tests - Unit Tests

1,638 tests: 1,636 passed, 2 skipped, 0 failed (209 suites, 209 files) in 1m 1s

Results for commit 7142907.

♻️ This comment has been updated with latest results.

…trace Collect unique dataset item IDs upfront, fetch and prepare evaluators once per item, then look up from a map inside the trace loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
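The prefetch pattern from this commit, sketched with hypothetical types: TraceEvent and EvaluatorPrefetch are illustrative, PreparedEvaluator is a one-field placeholder for the PR's record of the same name, and batchFetch stands in for a single DB round trip.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical shapes for illustration only.
record TraceEvent(String traceId, UUID datasetItemId) {}

record PreparedEvaluator(String name) {}

final class EvaluatorPrefetch {
    static Map<String, List<PreparedEvaluator>> evaluatorsPerTrace(
            List<TraceEvent> traces,
            Function<Set<UUID>, Map<UUID, List<PreparedEvaluator>>> batchFetch) {
        // 1. Collect the distinct dataset item ids up front.
        Set<UUID> itemIds = traces.stream()
                .map(TraceEvent::datasetItemId)
                .filter(Objects::nonNull)
                .collect(Collectors.toCollection(LinkedHashSet::new));
        // 2. Fetch evaluators once for all of them (skipped when there is nothing to fetch).
        Map<UUID, List<PreparedEvaluator>> byItem = itemIds.isEmpty() ? Map.of() : batchFetch.apply(itemIds);
        // 3. Resolve each trace with a map lookup instead of a blocking call per trace.
        Map<String, List<PreparedEvaluator>> result = new LinkedHashMap<>();
        for (TraceEvent trace : traces) {
            UUID itemId = trace.datasetItemId();
            List<PreparedEvaluator> evaluators = itemId == null ? List.of() : byItem.getOrDefault(itemId, List.of());
            result.put(trace.traceId(), evaluators);
        }
        return result;
    }
}
```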

…aluators Pass the userName from TracesCreated event through to the reactive context instead of hardcoding "system". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…h and fix unit tests
- Rename metadata key to eval_suite_dataset_version_hash across BE, FE, and tests
- Fix ExperimentExecutionServiceTest: add datasetId to test requests to match the null-safety validation added earlier
- Update test for missing datasetId to assert BadRequestException

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

...-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteAssertionSampler.java

- Add getItemEvaluatorsByDatasetId to DAO/service for single-query batch fetch of all item evaluators in a dataset version
- Refactor EvalSuiteAssertionSampler to use batch fetch instead of per-item reactive calls
- Update RunOnDatasetDialog to reflect dataset/evaluation suite choice with dynamic button text and labels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

...-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteAssertionSampler.java

apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemVersionDAO.java

apps/opik-frontend/src/v2/pages/PlaygroundPage/RunOnDatasetDialog.tsx

The FE-orchestrated path (createLogPlaygroundProcessor) is only used for regular datasets. Eval suites use the BE-orchestrated path exclusively, so evalSuiteDatasetId, evalSuiteVersionHash, and evaluationMethod fields on LogQueueParams were never set and the related trace metadata / experiment blocks were unreachable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The prefetchItemEvaluators method was missing USER_NAME in contextWrite, causing makeFluxContextAware to throw NoSuchElementException (silently caught), which meant item-level assertions were never calculated. Also fixes test config construction to use getJsonNodeFromString instead of readTree to properly parse evaluator config JSON. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add UUID validation for eval_suite_dataset_item_id in EvalSuiteAssertionSampler
- Fix ClickHouse dedup ordering in DatasetItemVersionDAO (filter after LIMIT 1 BY)
- Add EXPERIMENT_STATUS enum and use constants instead of string literals
- Add two-phase polling (running → evaluating) for eval suite experiments
- Extract nested ternary into helper function in RunOnDatasetDialog
- Add progress indicator with phase-aware display (running/evaluating)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
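Why the filter has to come after LIMIT 1 BY: in an append-only, versioned table, filtering first can drop the newest row for an item and let an older version win the dedup. The sketch below reproduces that effect in plain Java. ItemRow, its deleted flag, and DedupOrder are hypothetical; the actual predicate in DatasetItemVersionDAO is not shown on this page, and latestPerItem only approximates ORDER BY version DESC LIMIT 1 BY item_id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical row shape: one physical row per (itemId, version), as in an append-only table.
record ItemRow(String itemId, long version, boolean deleted) {}

final class DedupOrder {
    // Rough equivalent of: ORDER BY version DESC LIMIT 1 BY item_id
    static List<ItemRow> latestPerItem(List<ItemRow> rows) {
        Map<String, ItemRow> latest = new LinkedHashMap<>();
        for (ItemRow row : rows) {
            latest.merge(row.itemId(), row, (a, b) -> a.version() >= b.version() ? a : b);
        }
        return new ArrayList<>(latest.values());
    }

    // Buggy order: filtering first lets a stale row win when the newest row is filtered out.
    static List<ItemRow> filterThenDedup(List<ItemRow> rows, Predicate<ItemRow> keep) {
        return latestPerItem(rows.stream().filter(keep).toList());
    }

    // Fixed order: pick the newest row per item first, then apply the filter.
    static List<ItemRow> dedupThenFilter(List<ItemRow> rows, Predicate<ItemRow> keep) {
        return latestPerItem(rows).stream().filter(keep).toList();
    }
}
```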

@Builder

Address baz reviewer comment: use Lombok @Builder(toBuilder = true) and @NonNull on DatasetEvaluatorsResult and PreparedEvaluator records per project DTO conventions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

...kend/src/test/java/com/comet/opik/api/resources/v1/events/EvalSuiteAssertionSamplerTest.java

.../PlaygroundPage/PlaygroundOutputs/PlaygroundOutputScores/PlaygroundOutputAssertionStatus.tsx

apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemVersionDAO.java

apps/opik-frontend/src/v2/pages/PlaygroundPage/useActionButtonActions.tsx

...k-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteEvaluatorMapper.java

apps/opik-backend/src/main/java/com/comet/opik/api/events/ExperimentItemToProcess.java

apps/opik-frontend/src/v2/pages/PlaygroundPage/useActionButtonActions.tsx

...-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteAssertionSampler.java

…stead of UserMessage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove broad catch(Exception) that silently swallowed runtime errors. Keep only UncheckedIOException for deserialization failures and add evaluator config to the log for debuggability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
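A minimal sketch of the narrowed catch, assuming the deserializer wraps IOException in UncheckedIOException. EvaluatorConfigMapper, tryMap, and the deserialize function are hypothetical names, and java.util.logging is used only to keep the example dependency-free (the backend's logging setup may differ). The exception is passed as the last logging argument, and, in line with the later commit that removed the config from the error log, the raw config is not logged.

```java
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

final class EvaluatorConfigMapper {
    private static final Logger LOG = Logger.getLogger(EvaluatorConfigMapper.class.getName());

    static <T> Optional<T> tryMap(String rawConfig, Function<String, T> deserialize) {
        // Only UncheckedIOException is caught; any other RuntimeException propagates to the caller.
        try {
            return Optional.ofNullable(deserialize.apply(rawConfig));
        } catch (UncheckedIOException e) {
            // Exception passed last so the stack trace is kept; the config itself is not logged.
            LOG.log(Level.WARNING, "Skipping evaluator: config could not be deserialized", e);
            return Optional.empty();
        }
    }
}
```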

...k-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteEvaluatorMapper.java

…torMapper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

thiagohora

Thanks for addressing all my comments. Only a few issues remain, and then I'll approve. Let's mainly address the reactive issues.

...src/main/java/com/comet/opik/api/resources/v1/events/ExperimentItemProcessingSubscriber.java

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentItemProcessor.java

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentTracePersistence.java

thiagohora

Two minor items from the latest pass.

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentExecutionService.java

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentTracePersistence.java

…afe score routing Replace implicit string-based categoryName check with a ScoreDestination enum to make score routing (feedback_scores vs assertion_results) explicit and type-safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…code quality
- Wrap blocking LLM call in Mono.fromCallable with boundedElastic scheduler
- Remove redundant Mono.defer and subscribeOn from subscriber
- Add TTL to Redis failure counter to prevent memory leaks
- Extract PersistenceContext record to reduce parameter count
- Add projectName to ExperimentItem creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…sionHash, parallelize trace+span creation
- Distinguish user errors (invalid versionHash) from transient DB failures in fetchDatasetExecutionPolicy
- Run trace and span creation in parallel via Mono.when()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
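The parallel trace+span write can be illustrated with the JDK's CompletableFuture.allOf as a stand-in for Reactor's Mono.when(); ParallelPersist and the two Runnable writers are hypothetical. One behavioral difference is worth noting: allOf waits for both futures before reporting a failure, whereas Mono.when() emits an error as soon as one source fails and cancels the rest.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class ParallelPersist {
    /** Starts both writes concurrently; the result completes once both are done. */
    static CompletableFuture<Void> persist(Runnable createTrace, Runnable createSpan, Executor executor) {
        CompletableFuture<Void> trace = CompletableFuture.runAsync(createTrace, executor);
        CompletableFuture<Void> span = CompletableFuture.runAsync(createSpan, executor);
        // Completes exceptionally if either write failed, but only after both have finished.
        return CompletableFuture.allOf(trace, span);
    }
}
```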

github-actions · 2026-04-09T12:28:57Z

Backend Tests - Integration Group 1

413 tests: 413 passed, 0 skipped, 0 failed; 340 runs: 340 passed, 0 skipped, 0 failed (23 suites, 23 files) in 3m 0s

Results for commit 3d28528.

♻️ This comment has been updated with latest results.

baz-reviewer · 2026-04-09T12:29:16Z

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentItemProcessor.java

+    private record LlmCallResult(
+            ChatCompletionResponse response,
+            String errorType,
+            String errorMessage,
+            Instant startTime,
+            Instant endTime) {


LlmCallResult lacks @Builder(toBuilder = true) and is instantiated directly; should we add that annotation and switch instantiations to LlmCallResult.builder()...build()?

new LlmCallResult(...) => LlmCallResult.builder()...build()

Finding type: AI Coding Guidelines | Severity: 🟢 Low


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentItemProcessor.java around lines 28-33, the record LlmCallResult is defined without a builder; annotate it with @Builder(toBuilder = true) (and add the Lombok import if missing) so it becomes a proper DTO per the project convention. Then update all instantiations around lines 47-53 (the places returning new LlmCallResult(...)) to use LlmCallResult.builder().response(...).errorType(...).errorMessage(...).startTime(...).endTime(...).build() (set only the fields used in each branch), replacing positional constructors so callers get a toBuilder() hook and comply with the documented pattern.
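For readers less familiar with Lombok, the following approximates what @Builder(toBuilder = true) generates for a record like LlmCallResult, so the suggested builder()...build() calls and toBuilder() can be seen without annotation processing. This is a hand-written sketch, not Lombok's exact output, and response is typed as String instead of ChatCompletionResponse to keep it self-contained.

```java
import java.time.Instant;

// Hand-written approximation of Lombok's @Builder(toBuilder = true) for this record.
record LlmCallResult(String response, String errorType, String errorMessage, Instant startTime, Instant endTime) {

    static Builder builder() {
        return new Builder();
    }

    // Copies every field into a fresh builder so callers can change just a few of them.
    Builder toBuilder() {
        return new Builder().response(response).errorType(errorType).errorMessage(errorMessage)
                .startTime(startTime).endTime(endTime);
    }

    static final class Builder {
        private String response;
        private String errorType;
        private String errorMessage;
        private Instant startTime;
        private Instant endTime;

        Builder response(String value) { this.response = value; return this; }
        Builder errorType(String value) { this.errorType = value; return this; }
        Builder errorMessage(String value) { this.errorMessage = value; return this; }
        Builder startTime(Instant value) { this.startTime = value; return this; }
        Builder endTime(Instant value) { this.endTime = value; return this; }

        // Fields never set stay null, which is how each branch sets only what it needs.
        LlmCallResult build() {
            return new LlmCallResult(response, errorType, errorMessage, startTime, endTime);
        }
    }
}
```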

baz-reviewer · 2026-04-09T12:29:17Z

apps/opik-backend/src/test/java/com/comet/opik/api/ScoreDestinationTest.java

+    @DisplayName("null categoryName resolves to FEEDBACK_SCORES")
+    void nullCategoryResolvesToFeedbackScores() {
+        ScoreDestination destination = SUITE_ASSERTION_CATEGORY.equals(null)
+                ? ScoreDestination.ASSERTION_RESULTS
+                : ScoreDestination.FEEDBACK_SCORES;
+
+        assertThat(destination).isEqualTo(ScoreDestination.FEEDBACK_SCORES);
+    }
+
+    @Test
+    @DisplayName("arbitrary categoryName resolves to FEEDBACK_SCORES")
+    void arbitraryCategoryResolvesToFeedbackScores() {


nullCategoryResolvesToFeedbackScores and arbitraryCategoryResolvesToFeedbackScores duplicate the same assertion — should we collapse them into a single @ParameterizedTest with @NullSource and @ValueSource(strings = "some_other_category")?

@ParameterizedTest
@NullSource
@ValueSource(strings = "some_other_category")
void categoryResolvesToFeedbackScores(String category) { /* ... */ }

Finding type: Use parameterized tests | Severity: 🟢 Low


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In apps/opik-backend/src/test/java/com/comet/opik/api/ScoreDestinationTest.java around lines 26-43, the methods `nullCategoryResolvesToFeedbackScores` and `arbitraryCategoryResolvesToFeedbackScores` duplicate the same logic (calling SUITE_ASSERTION_CATEGORY.equals(...) and asserting FEEDBACK_SCORES) with only the input literal differing. Replace both tests with a single parameterized test: annotate a new method with @ParameterizedTest, add @NullSource and @ValueSource(strings = "some_other_category"), accept a String parameter for the category, compute the destination the same way, and assert it equals ScoreDestination.FEEDBACK_SCORES. Also add the necessary imports for ParameterizedTest, NullSource, and ValueSource and remove the two original test methods.

apps/opik-backend/src/main/java/com/comet/opik/domain/FeedbackScoreService.java

baz-reviewer · 2026-04-09T12:29:17Z

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentTracePersistence.java

+    @lombok.Builder
+    record PersistenceContext(
+            @NonNull UUID traceId,
+            @NonNull String projectName,
+            @NonNull ExperimentExecutionRequest.PromptVariant prompt,
+            @NonNull List<ExperimentExecutionRequest.PromptVariant.Message> renderedMessages,
+            ChatCompletionResponse llmResponse,
+            String errorType,
+            String errorMessage,
+            @NonNull Instant startTime,
+            @NonNull Instant endTime,
+            @NonNull UUID experimentId,


PersistenceContext uses @lombok.Builder without toBuilder = true — should we switch to @lombok.Builder(toBuilder = true) to match the DTO convention in .agents/skills/opik-backend/SKILL.md?

Finding type: AI Coding Guidelines | Severity: 🟢 Low


Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentTracePersistence.java around lines 38 to 53, the PersistenceContext record is annotated with @lombok.Builder but missing toBuilder = true. Update the annotation to @lombok.Builder(toBuilder = true) on the PersistenceContext record declaration so callers can call toBuilder() to clone/modify instances. Make no other changes to the record.

github-actions · 2026-04-09T12:29:20Z

Python SDK Compatibility V1 E2E Tests Results (Python 3.11)

119 tests: 119 passed, 0 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 3m 51s (-1s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:29:53Z

Backend Tests - Integration Group 14

242 tests: 242 passed, 0 skipped, 0 failed (23 suites, 23 files) in 8m 17s

Results for commit 3d28528.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:30:14Z

Python SDK Compatibility V1 E2E Tests Results (Python 3.12)

119 tests: 119 passed, 0 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 4m 5s (+12s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:30:19Z

Python SDK Compatibility V1 E2E Tests Results (Python 3.14)

119 tests: 119 passed, 0 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 3m 57s (+5s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:30:24Z

Python SDK Compatibility V1 E2E Tests Results (Python 3.13)

119 tests: 119 passed, 0 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 3m 38s (±0s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:30:32Z

Python SDK Compatibility V1 E2E Tests Results (Python 3.10)

119 tests: 119 passed, 0 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 3m 52s (-1s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:40:20Z

Python SDK E2E Tests Results (Python 3.12)

365 tests: 363 passed, 2 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 14m 41s (-4s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

This pull request removes 1 test and adds 1 test. Note that renamed tests count towards both.

Removed: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d6d29-12dc-77db-a092-74dd8ab795d8]

Added: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d725c-1e3b-7e5d-835e-0e5d3b608d9a]

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:41:03Z

Python SDK E2E Tests Results (Python 3.13)

365 tests: 363 passed, 2 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 14m 28s (-3s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

This pull request removes 1 test and adds 1 test. Note that renamed tests count towards both.

Removed: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d6d28-c583-7483-a48d-c6b29ad9a99d]

Added: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d725c-56a4-7d6c-8f48-94ff43cf2fa8]

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:41:10Z

Python SDK E2E Tests Results (Python 3.14)

365 tests: 363 passed, 2 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 14m 12s (-30s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

This pull request removes 1 test and adds 1 test. Note that renamed tests count towards both.

Removed: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d6d28-b8d6-7bba-8c33-1f926d8a2612]

Added: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d725c-5c1c-79b6-b7d0-c3de82f345c3]

♻️ This comment has been updated with latest results.

github-actions · 2026-04-09T12:41:22Z

Python SDK E2E Tests Results (Python 3.10)

365 tests: 363 passed, 2 skipped, 0 failed (1 suite, 1 file; all counts ±0 vs. base) in 14m 34s (-10s)

Results for commit 3d28528. ± Comparison against base commit bde0715.

This pull request removes 1 test and adds 1 test. Note that renamed tests count towards both.

Removed: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d6d28-f32a-7009-a6fa-3b49dc246710]

Added: tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d725c-66bc-736e-94ae-87e32580cf53]

♻️ This comment has been updated with latest results.

…instead of storing it Make scoreDestination() a derived method on FeedbackScoreItem that computes routing from categoryName, eliminating the stored field. This ensures correct routing for all entry points (JSON API, internal scorer, builder) with a single source of truth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
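A sketch of routing derived from categoryName rather than stored. The enum mirrors the ScoreDestination excerpt quoted in the review (SUITE_ASSERTION_CATEGORY = "suite_assertion", fromCategoryName); FeedbackScoreItem is shown as a trimmed-down hypothetical record, since its real fields aren't shown here.

```java
import java.math.BigDecimal;

enum ScoreDestination {
    FEEDBACK_SCORES,
    ASSERTION_RESULTS;

    private static final String SUITE_ASSERTION_CATEGORY = "suite_assertion";

    static ScoreDestination fromCategoryName(String categoryName) {
        // null and any other category fall back to regular feedback scores.
        return SUITE_ASSERTION_CATEGORY.equals(categoryName) ? ASSERTION_RESULTS : FEEDBACK_SCORES;
    }
}

// Hypothetical, trimmed-down item; the real FeedbackScoreItem has more fields.
record FeedbackScoreItem(String name, BigDecimal value, String categoryName) {
    // Computed on every call, so every entry point (JSON API, scorer, builder) routes the same way.
    ScoreDestination scoreDestination() {
        return ScoreDestination.fromCategoryName(categoryName);
    }
}
```

With no stored destination field, there is no way for a caller to construct an item whose routing disagrees with its category.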

baz-reviewer · 2026-04-09T13:05:17Z

apps/opik-backend/src/main/java/com/comet/opik/api/ScoreDestination.java

+    private static final String SUITE_ASSERTION_CATEGORY = "suite_assertion";
+
+    public static ScoreDestination fromCategoryName(String categoryName) {
+        return SUITE_ASSERTION_CATEGORY.equals(categoryName)
+                ? ASSERTION_RESULTS


SUITE_ASSERTION_CATEGORY is duplicated in ScoreDestination and EvalSuiteAssertionSampler; should we centralize it in ScoreDestination and have the sampler reuse it?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low


thiagohora

Thanks for the patience, great work!

[OPIK-5650] [BE/FE] feat: support evaluation suites in playground

9095ce8

Add backend experiment execution endpoint and frontend eval suite flow so playground can run evaluation suite datasets with server-side assertion processing and poll-based progress tracking. Made-with: Cursor

github-actions bot assigned itamargolan Apr 6, 2026

github-actions bot added the dependencies, java, Frontend, Backend, tests, and typescript labels Apr 6, 2026

baz-reviewer bot reviewed Apr 6, 2026


itamargolan commented Apr 6, 2026


itamargolan and others added 2 commits April 6, 2026 18:40

fix: batch-prefetch item evaluators to avoid N blocking DB calls per …

1fac607

…trace Collect unique dataset item IDs upfront, fetch and prepare evaluators once per item, then look up from a map inside the trace loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: use actual userName instead of hardcoded "system" in fetchItemEv…

e5f15cc

…aluators Pass the userName from TracesCreated event through to the reactive context instead of hardcoding "system". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baz-reviewer bot approved these changes Apr 6, 2026


baz-reviewer bot reviewed Apr 6, 2026


...-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteAssertionSampler.java (outdated, resolved)

baz-reviewer bot reviewed Apr 7, 2026


itamargolan and others added 3 commits April 6, 2026 21:36

github-actions bot removed the dependencies Pull requests that update a dependency file label Apr 7, 2026

itamargolan marked this pull request as ready for review April 7, 2026 02:59

itamargolan requested a review from a team as a code owner April 7, 2026 02:59

baz-reviewer bot reviewed Apr 7, 2026


baz-reviewer bot reviewed Apr 8, 2026


itamargolan and others added 2 commits April 8, 2026 15:13

[OPIK-5650] [BE] fix: update test assertion to use OpikUserMessage in…

7142907

…stead of UserMessage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baz-reviewer bot reviewed Apr 8, 2026


...k-backend/src/main/java/com/comet/opik/api/resources/v1/events/EvalSuiteEvaluatorMapper.java (outdated, resolved)

[OPIK-5650] [BE] fix: remove config from error log in EvalSuiteEvalua…

00bf6d8

…torMapper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baz-reviewer bot approved these changes Apr 8, 2026


thiagohora requested changes Apr 9, 2026


thiagohora reviewed Apr 9, 2026


apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentExecutionService.java (resolved)

apps/opik-backend/src/main/java/com/comet/opik/domain/ExperimentTracePersistence.java (outdated, resolved)

itamargolan and others added 3 commits April 9, 2026 07:30

itamargolan requested a review from thiagohora April 9, 2026 12:24

baz-reviewer bot reviewed Apr 9, 2026


thiagohora approved these changes Apr 9, 2026


itamargolan merged commit 4dc8015 into main Apr 9, 2026
79 checks passed

itamargolan deleted the itamarg/OPIK-5650/support-eval-suites-in-playground branch April 9, 2026 13:25

CometActions mentioned this pull request Apr 9, 2026

[NA] [SDK] [DOCS] Update automatically OpenAPI spec and Fern code #6157 (merged, 2 tasks)
