Conversation
🌿 Preview your docs: https://opik-preview-ea958085-5ff9-4dc9-a19e-dd326f1af0ac.docs.buildwithfern.com/docs/opik
The following broken links were found:
https://opik-preview-ea958085-5ff9-4dc9-a19e-dd326f1af0ac.docs.buildwithfern.com/docs/opik/integrations/harbor
https://opik-preview-ea958085-5ff9-4dc9-a19e-dd326f1af0ac.docs.buildwithfern.com/docs/opik/integrations/harbor/
📌 Results for commit f8b4aee

Backend Tests - Integration Group 12: 251 tests, 248 ✅, 3m 5s ⏱️. Results for commit cc67a16. ♻️ This comment has been updated with latest results.
Backend Tests - Integration Group 15: 235 tests, 233 ✅, 3m 39s ⏱️. Results for commit c6e0bb1.
Python SDK E2E Tests Results (Python 3.14): 361 tests ±0, 359 ✅ ±0, 14m 4s ⏱️ (-9s). Results for commit 4a10f06, compared against base commit d295a1e. This pull request removes 1 and adds 1 test; renamed tests count towards both.
Python SDK E2E Tests Results (Python 3.12): 361 tests ±0, 359 ✅ ±0, 14m 12s ⏱️ (-35s). Results for commit 4a10f06, compared against base commit d295a1e.
Python SDK E2E Tests Results (Python 3.13): 361 tests ±0, 359 ✅ ±0, 14m 12s ⏱️ (-36s). Results for commit 4a10f06, compared against base commit d295a1e.
Python SDK E2E Tests Results (Python 3.11): 361 tests ±0, 359 ✅ ±0, 14m 20s ⏱️ (-29s). Results for commit 4a10f06, compared against base commit d295a1e.
Python SDK E2E Tests Results (Python 3.10): 361 tests ±0, 359 ✅ ±0, 14m 4s ⏱️ (-18s). Results for commit 4a10f06, compared against base commit d295a1e.
Nimrod007 left a comment

Backend Review
1. N+1 Redis reads in nextBridgeCommands (non-blocking)
In LocalRunnerServiceImpl.nextBridgeCommands (~lines 1397-1415): the status batch correctly updates all commands in one round-trip, but then each command is read individually with readAllMap() — that's N extra Redis calls. Since you already have the command IDs and just wrote the status, you could read all commands in the same batch (or a second batch) to cut this to 2 round-trips total instead of N+2.
Not blocking, but worth considering if bridge usage scales — at maxCommands=20 this is 22 Redis calls where 2-3 would do.
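To make the round-trip arithmetic concrete, here is a standalone Python sketch with a hypothetical counting stub (the PR itself uses Redisson batches in Java); it only illustrates why batching the reads collapses N individual calls into one:

```python
class CountingRedisStub:
    """Hypothetical stand-in for a Redis client that counts round-trips."""

    def __init__(self, data):
        self.data = data  # key -> hash fields
        self.round_trips = 0

    def hgetall(self, key):
        # One network round-trip per individual read.
        self.round_trips += 1
        return self.data.get(key, {})

    def batch_hgetall(self, keys):
        # A pipelined batch costs a single round-trip for all keys.
        self.round_trips += 1
        return [self.data.get(k, {}) for k in keys]


commands = {f"cmd:{i}": {"status": "PENDING"} for i in range(20)}

naive = CountingRedisStub(commands)
individual = [naive.hgetall(k) for k in commands]  # 20 round-trips

batched = CountingRedisStub(commands)
together = batched.batch_hgetall(list(commands))   # 1 round-trip
```

With maxCommands=20 this is the difference between ~22 calls (status batch + 20 reads + bookkeeping) and 2-3.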
2. Missing workspace validation on bridge commands (blocking)
loadValidatedBridgeCommand validates that runnerId matches the command, but doesn't check that the command's workspace_id matches the caller's workspace. The runner ownership check covers it indirectly today (a runner belongs to one workspace), but for defense-in-depth we should validate workspaceId.equals(fields.get(BRIDGE_FIELD_WORKSPACE_ID)) here — same pattern as loadValidatedJob does for jobs.
This applies to all callers: reportBridgeCommandResult, getBridgeCommand, and awaitBridgeCommand. The method signature already receives runnerId but should also take workspaceId and enforce the check.
Nimrod007 left a comment
Update on review comment #2 (workspace validation)
Softening my earlier "request changes" on the workspace validation point. Looking more closely:
The caller's workspace is validated indirectly through the chain: requestContext.getWorkspaceId() → validateRunnerOwnership(runnerId, workspaceId, userName) → loadValidatedBridgeCommand(runnerId, commandId). Since a runner is always scoped to one workspace, this holds today.
Adding workspaceId.equals(fields.get(BRIDGE_FIELD_WORKSPACE_ID)) in loadValidatedBridgeCommand would be a nice defense-in-depth consistency with loadValidatedJob, but it's not an active vulnerability. Consider it a suggestion, not a blocker.
Nimrod007 left a comment
Approving — the workspace validation concern is just a suggestion for defense-in-depth, not a blocker. The N+1 Redis reads point is also non-blocking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
StringRedisClient.getList already applies StringCodec.INSTANCE internally, so passing it as a second argument causes a compilation error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
petrotiurin left a comment
Will continue the review shortly.
)
LOGGER.info("Runner activated")
if os.environ.get("OPIK_SUPERVISED") != "true":
nit: we already have a supervised local variable that indicates whether we're supervised; let's use it.
_DEFAULT_COMMAND_TIMEOUT = 30.0
_MIN_COMMAND_TIMEOUT = 1.0
_MAX_COMMAND_TIMEOUT = 300.0
Let's add units to those timeouts so it's obvious what they represent. e.g. _MAX_COMMAND_TIMEOUT_SECONDS
inflight: Set[Future] = set()
inflight_lock = threading.Lock()
Do we need to keep track of the futures currently in flight? Can we rely on the semaphore alone for backpressure? We're not using them anywhere, and tracking them just adds extra locking/unlocking.
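A minimal sketch of semaphore-only backpressure, with hypothetical names (the real dispatch loop lives in the SDK): acquiring before submit bounds the in-flight count, and releasing in the worker frees the slot, so no inflight set is needed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_MAX_CONCURRENT_COMMANDS = 4  # hypothetical limit


def run_with_backpressure(tasks):
    semaphore = threading.Semaphore(_MAX_CONCURRENT_COMMANDS)
    results = []
    results_lock = threading.Lock()

    def _run(task):
        try:
            with results_lock:
                results.append(task())
        finally:
            semaphore.release()  # free a slot; no inflight set required

    with ThreadPoolExecutor(max_workers=_MAX_CONCURRENT_COMMANDS) as pool:
        for task in tasks:
            semaphore.acquire()  # blocks while too many commands are in flight
            pool.submit(_run, task)
    return results
```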
def _poll(self) -> List[BridgeCommandItem]:
    resp = self._api.runners.next_bridge_commands(
        self._runner_id,
        max_commands=10,
Let's make max_commands a constant as well.
on_command_start: Optional[Any] = None,
on_command_end: Optional[Any] = None,
This should be at least Optional[Callable], and ideally typed with the exact signature it expects, since we rely on a specific interface in L143.
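A sketch of the tighter typing, with hypothetical callback signatures (the actual interface is whatever the code at L143 relies on):

```python
from typing import Callable, Optional

# Hypothetical callback signatures; the real interface lives in the SDK.
OnCommandStart = Callable[[str], None]      # called with the command id
OnCommandEnd = Callable[[str, bool], None]  # command id, success flag


def make_handler(
    on_command_start: Optional[OnCommandStart] = None,
    on_command_end: Optional[OnCommandEnd] = None,
):
    def handle(command_id: str) -> None:
        if on_command_start is not None:
            on_command_start(command_id)
        # ... execute the command ...
        if on_command_end is not None:
            on_command_end(command_id, True)

    return handle
```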
if os.environ.get("OPIK_SUPERVISED") != "true":
    heartbeat_thread = threading.Thread(
        target=self._heartbeat_loop,
        daemon=True,
    )
    heartbeat_thread.start()
Why do we need this env var? Would we ever run the runner unsupervised?
_ENTRYPOINT_PATTERNS = [
    re.compile(r"entrypoint\s*=\s*True"),
]
For TypeScript, boolean true is lowercase (true) but follows the same pattern. We should add entrypoint:\s*true here.
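The suggested addition, sketched (the pattern list name comes from the diff above; the helper function is hypothetical):

```python
import re

# Python assignment plus the suggested TypeScript object-literal form.
_ENTRYPOINT_PATTERNS = [
    re.compile(r"entrypoint\s*=\s*True"),  # Python: entrypoint = True
    re.compile(r"entrypoint:\s*true"),     # TypeScript: entrypoint: true
]


def has_entrypoint(source: str) -> bool:
    return any(p.search(source) for p in _ENTRYPOINT_PATTERNS)
```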
_CONFIGURATION_PATTERNS = [
    re.compile(r"AgentConfig"),
]
This won't work for TypeScript since we don't use the same inheritance pattern there. Perhaps it's more straightforward to check for the presence of getAgentConfigVersion/get_agent_config_version calls here?
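A sketch of that alternative check (the accessor names come from the comment above; the helper name is hypothetical): instead of matching the AgentConfig base class, match the accessor call in either language's naming convention.

```python
import re

# Matches the config-version accessor call in Python or TypeScript naming.
_CONFIG_CALL_PATTERN = re.compile(
    r"\b(getAgentConfigVersion|get_agent_config_version)\s*\("
)


def uses_agent_config(source: str) -> bool:
    return _CONFIG_CALL_PATTERN.search(source) is not None
```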
_HEARTBEAT_INTERVAL = 5.0
_GRACEFUL_TIMEOUT = 10
Let's add _SECONDS suffix if this is seconds.
Can we add some e2e tests here? Just the happy paths so we know all the components work as expected.
BorisTkachenko left a comment

@collincunn Left some BE-related comments
}

return BridgeCommandBatchResponse.builder().commands(items).build();
}).subscribeOn(reactor.core.scheduler.Schedulers.boundedElastic()))
Don't use fully qualified class names; add an import.
if (depth > DEEP_MERGE_MAX_DEPTH || !base.isObject() || !override.isObject()) {
    return override;
}
com.fasterxml.jackson.databind.node.ObjectNode result = base.deepCopy();
Don't use fully qualified class names; please add an import for it.
String workspaceId = requestContext.get().getWorkspaceId();
String userName = requestContext.get().getUserName();
LocalRunnerHeartbeatResponse response = runnerService.heartbeat(runnerId, workspaceId, userName);
List<String> capabilities = body != null ? body.capabilities() : null;
The null check isn't needed here; you already check for null in the service layer.
@ApiResponse(responseCode = "404", description = "Runner not found or not connected", content = @Content(schema = @Schema(implementation = ErrorMessage.class))),
@ApiResponse(responseCode = "409", description = "Runner does not support bridge", content = @Content(schema = @Schema(implementation = ErrorMessage.class))),
@ApiResponse(responseCode = "429", description = "Too many requests", content = @Content(schema = @Schema(implementation = ErrorMessage.class)))})
public Response submitBridgeCommand(@PathParam("runnerId") UUID runnerId,
To match the existing codebase, it would be better to rename this to createBridgeCommand.
RBatch readBatch = redisClient.createBatch();
List<RFuture<Map<String, String>>> readFutures = new ArrayList<>(commandIds.size());
for (String cmdIdStr : commandIds) {
    UUID commandId = UUID.fromString(cmdIdStr);
    readFutures.add(readBatch.<String, String>getMap(
            bridgeCommandKey(commandId), StringCodec.INSTANCE).readAllMapAsync());
}
readBatch.execute();

List<String> liveCommandIds = new ArrayList<>();
List<Map<String, String>> liveFields = new ArrayList<>();
RList<String> activeList = redisClient.getList(activeKey);
for (int i = 0; i < commandIds.size(); i++) {
    Map<String, String> fields = readFutures.get(i).toCompletableFuture().join();
RBatch.execute() returns a BatchResult whose getResponses() gives the results in the same order the commands were queued. So you can replace the futures pattern with:
BatchResult<?> batchResult = readBatch.execute();
List<?> responses = batchResult.getResponses();
for (int i = 0; i < commandIds.size(); i++) {
@SuppressWarnings("unchecked")
Map<String, String> fields = (Map<String, String>) responses.get(i);
// ... rest of loop
}
This eliminates the RFuture list and the toCompletableFuture().join() calls entirely. The batch is synchronous here anyway (you call execute() and block), so the futures aren't buying you anything.
JsonNode result,
JsonNode error,
Probably we want to add a validation here that at least one of these is non-null?
def backoff_wait(
    shutdown_event: threading.Event, backoff: float, cap: float = 30.0
) -> None:
    """Sleep with jitter, interruptible by the shutdown event.

    Waits between 50-100% of the backoff value, capped at ``cap`` seconds.
    """
    wait = min(backoff, cap) * (0.5 + random.random() * 0.5)
    shutdown_event.wait(wait)
This does not seem to be "common" and is used only in the bridge loop. Was it meant to be reused in in_process_loop?
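For reference, the typical reuse pattern for a helper like this is an exponential retry loop (the loop below is hypothetical, not code from the PR); in_process_loop could presumably share it the same way:

```python
import random
import threading


def backoff_wait(shutdown_event, backoff, cap=30.0):
    """Sleep with jitter, interruptible by the shutdown event (as in the diff)."""
    wait = min(backoff, cap) * (0.5 + random.random() * 0.5)
    shutdown_event.wait(wait)


shutdown = threading.Event()
backoff = 0.01  # small starting value so this sketch runs fast
for _attempt in range(3):
    # ... poll the server; on failure, wait and grow the backoff ...
    backoff_wait(shutdown, backoff)
    backoff = min(backoff * 2, 30.0)  # exponential growth, capped
```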
class FileMutationQueue:
    """Per-file lock keyed by realpath. Serializes writes to the same file."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._meta_lock = threading.Lock()
Would FileMutationRegistry or FileLockRegistry be a better fit here? Having a queue in the name implies we keep track of mutations in this class, which is not the case.
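A sketch of the suggested rename with the same per-realpath behavior (class name per the suggestion; the lock_for accessor is hypothetical):

```python
import os
import threading
from typing import Dict


class FileLockRegistry:
    """Per-file lock keyed by realpath. Serializes writes to the same file."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._meta_lock = threading.Lock()

    def lock_for(self, path: str) -> threading.Lock:
        key = os.path.realpath(path)
        with self._meta_lock:  # guards the registry dict itself
            return self._locks.setdefault(key, threading.Lock())
```

Two paths resolving to the same realpath share one lock, which is the serialization guarantee the docstring promises.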
@patch.dict("os.environ", {"OPIK_SUPERVISED": "true"})
def test_supervised__skips_heartbeat_thread(self) -> None:
Not sure I follow: why do we want to skip the heartbeat when supervised?