Skip to content

fix: run integration tests in Docker for clean process isolation#703

Closed
marvin-bot-coasys wants to merge 14 commits intodevfrom
fix/docker-ci-isolation
Closed

fix: run integration tests in Docker for clean process isolation#703
marvin-bot-coasys wants to merge 14 commits intodevfrom
fix/docker-ci-isolation

Conversation

@marvin-bot-coasys
Copy link
Contributor

@marvin-bot-coasys marvin-bot-coasys commented Mar 1, 2026

Summary

Runs build and integration tests inside the existing coasys/ad4m-ci-linux Docker container for clean process isolation.

Root cause of all previous test flakiness: self-hosted machine runner reuses the workdir, leaving orphaned executor processes that conflict with the next run.

Solution: wrap every build and test step in docker run --rm. Container exit kills all child processes — no orphaned executors, no port conflicts, no complex teardown.

Changes

CI (.circleci/config.yml)

  • build-and-test: entire build now runs inside docker run coasys/ad4m-ci-linux (as root with HOME=/home/circleci to use circleci user tools). Binaries stashed to host after container exits.
  • integration-tests-js, integration-tests-multi-user-simple, integration-tests-email-verification: test command wrapped in docker run --rm --network host coasys/ad4m-ci-linux
  • No new Docker image needed — uses the existing image throughout

Rust executor fixes

  • agent/mod.rs, db.rs, runtime_service/mod.rs: recover from mutex PoisonError instead of panicking (cascade fix)
  • agent/mod.rs load(): profile errors fall through to default agent instead of panicking

Test code

Summary by CodeRabbit

  • Chores
    • Reworked CI to run build, test, and integration steps inside a dedicated container image for more consistent, reproducible pipelines.
    • Consolidated environment setup and artifact handling, added explicit cleanup of container-owned artifacts, and moved most test execution into containerized jobs with host networking where needed.

…nicking

A single panic while holding the AgentService, Ad4mDb, or RuntimeService
mutex poisoned it, causing every subsequent GraphQL request to panic with
'Couldn't get lock on Ad4mDb: PoisonError'. This cascaded into killing
the entire conductor mid-test.

Fixes:
- agent/mod.rs: with_global_instance + with_mutable_global_instance now
  recover from PoisonError via unwrap_or_else(|e| e.into_inner()) instead
  of calling .expect() which panics on poisoned locks.
- agent/mod.rs: load() no longer panics on file read/parse errors — logs
  and returns early instead, preventing the initial mutex poison.
- db.rs: Ad4mDb::with_global_instance same PoisonError recovery.
- runtime_service/mod.rs: RuntimeService::with_global_instance same fix.

These were the root cause of integration test failures where 'conductors
were killed while they shouldn't be' — the conductor didn't receive a
kill signal, it died from a panic cascade triggered by one bad request.
CodeRabbit caught that returning early on profile read/parse error left
self.agent = None, which causes downstream store_agent_profile() callers
that call .expect("agent") to panic — exactly what we were trying to
prevent. Fix: restructure so errors log and fall through to the existing
default-agent creation path, ensuring self.agent is always initialized.
- Add Dockerfile.test (Ubuntu 25.04 + Node 18 + pnpm + Deno)
  Ubuntu 25.04 required: binaries built on host use glibc 2.39
  (Ubuntu 22.04 only has glibc 2.35 — incompatible)
  Image: coasys/ad4m-ci-test:latest@sha256:de8f691544f7c2f3b0f919fd870e0161c7f025d8169e45ed68b51f60c2b58a96

- integration-tests-js: replace direct execution with 'docker run --rm'
- integration-tests-multi-user-simple: same
- integration-tests-email-verification: same

When the container exits (pass or fail) ALL processes inside die.
No orphaned executors. No port conflicts between re-runs.
No complex teardown logic needed in test code.

The workspace (pnpm node_modules, built languages, executor binaries)
is bind-mounted read-write so no re-install inside the container.
--network host so executor localhost ports are reachable from tests.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Refactors CircleCI to run build, test, and integration workflows inside the coasys/ad4m-ci-linux Docker image; replaces host-side commands and port/executor cleanup with containerized docker run executions and consolidates environment/setup descriptions.

Changes

Cohort / File(s) Summary
CI Configuration
.circleci/config.yml
Reworked CircleCI to run build, test, and integration steps inside the coasys/ad4m-ci-linux Docker container. Consolidated setup descriptions, replaced host cargo/pnpm/build/clippy steps with single Docker runs, added cleanup of root-owned artifacts, and containerized integration test jobs (host networking, workspace mounts, user/group mirroring).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested reviewers

  • lucksus

Poem

🐰 I hopped into builds with a curious stare,
Found Docker nests cozy and tidy and fair,
Tests tucked in containers, all snug in a line,
No orphaned ports dancing — CI hums fine! 🐳✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective: running integration tests in Docker for process isolation. This directly addresses the primary change in the PR—moving tests into Docker containers for isolation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/docker-ci-isolation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
.circleci/config.yml (1)

4-7: Consider using the YAML anchor to avoid hardcoding the image digest.

The ad4m_test_image anchor is defined at line 7 but the full coasys/ad4m-ci-test:latest@sha256:... string is repeated in each docker run command. If the image digest changes, all three places need updating.

Unfortunately, YAML anchors can't be interpolated inside shell strings, but you could use an environment variable or a reusable command:

Option: Define image as env var in a command
commands:
  run_in_test_container:
    parameters:
      script:
        type: string
    steps:
      - run:
          command: |
            docker run --rm \
              --network host \
              -v "$(pwd):/workspace" \
              -w /workspace \
              coasys/ad4m-ci-test:latest@sha256:de8f691544f7c2f3b0f919fd870e0161c7f025d8169e45ed68b51f60c2b58a96 \
              bash -c "<< parameters.script >>"

Then each job step becomes:

- run_in_test_container:
    script: "cd tests/js && pnpm run test-main"

Also applies to: 121-126, 137-142, 153-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/config.yml around lines 4 - 7, The ad4m_test_image anchor is
defined but its full image digest is repeatedly hardcoded in multiple docker run
invocations; replace those repeated strings by adding a reusable CircleCI
command (e.g., run_in_test_container with a script parameter) or an environment
variable that references the ad4m_test_image value, then update each place that
currently runs the docker command to call run_in_test_container (passing the
bash script via the script parameter) or use the env var; use the anchor name
ad4m_test_image when defining the command/env and the command name
run_in_test_container (and its script parameter) to locate and replace the
duplicated docker run usage across the three occurrences.
.circleci/Dockerfile.test (2)

8-18: Add --no-install-recommends to reduce image size.

The apt-get install is missing --no-install-recommends, which pulls in unnecessary suggested packages, increasing image size and attack surface.

Proposed fix
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
     git \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/Dockerfile.test around lines 8 - 18, Update the RUN line that
performs apt-get install so it uses --no-install-recommends to avoid pulling
suggested packages: modify the RUN command containing "apt-get update && apt-get
install -y \ curl \ git \ bash \ fuse \ libfuse2t64 \ ca-certificates \ lsof \
unzip \ xz-utils" to include --no-install-recommends (e.g. "apt-get install -y
--no-install-recommends") and keep the existing "rm -rf /var/lib/apt/lists/*"
cleanup.

24-28: Hardcoded Node version in PATH may drift from installed version.

nvm install 18 installs the latest Node 18.x, but line 28 hardcodes v18.20.8. If the latest 18.x changes, the PATH entry becomes stale and node won't be found.

Consider pinning the exact version in the install command or dynamically resolving the path:

Option 1: Pin exact version during install
-    && nvm install 18 \
-    && nvm alias default 18 \
+    && nvm install 18.20.8 \
+    && nvm alias default 18.20.8 \
Option 2: Dynamically resolve path
-ENV PATH="$NVM_DIR/versions/node/v18.20.8/bin:$PATH"
+# Resolve the default node version's bin directory dynamically
+RUN ln -s "$(. "$NVM_DIR/nvm.sh" && dirname "$(which node)")" /usr/local/node-bin
+ENV PATH="/usr/local/node-bin:$PATH"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/Dockerfile.test around lines 24 - 28, The Dockerfile hardcodes ENV
PATH to v18.20.8 while using `nvm install 18`, which can diverge; either pin the
exact Node version in the install/alias steps or make the PATH reference the
same resolved version. Update the `nvm install`/`nvm alias default` lines to the
exact version string used in PATH (e.g., `nvm install v18.20.8` and `nvm alias
default v18.20.8`) so `ENV PATH="$NVM_DIR/versions/node/v18.20.8/bin:$PATH"`
stays correct, or change ENV PATH to derive the version consistently (use the
same variable or command output that `nvm install`/`nvm alias` sets) so the PATH
and installed Node version cannot drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.circleci/config.yml:
- Around line 4-7: The ad4m_test_image anchor is defined but its full image
digest is repeatedly hardcoded in multiple docker run invocations; replace those
repeated strings by adding a reusable CircleCI command (e.g.,
run_in_test_container with a script parameter) or an environment variable that
references the ad4m_test_image value, then update each place that currently runs
the docker command to call run_in_test_container (passing the bash script via
the script parameter) or use the env var; use the anchor name ad4m_test_image
when defining the command/env and the command name run_in_test_container (and
its script parameter) to locate and replace the duplicated docker run usage
across the three occurrences.

In @.circleci/Dockerfile.test:
- Around line 8-18: Update the RUN line that performs apt-get install so it uses
--no-install-recommends to avoid pulling suggested packages: modify the RUN
command containing "apt-get update && apt-get install -y \ curl \ git \ bash \
fuse \ libfuse2t64 \ ca-certificates \ lsof \ unzip \ xz-utils" to include
--no-install-recommends (e.g. "apt-get install -y --no-install-recommends") and
keep the existing "rm -rf /var/lib/apt/lists/*" cleanup.
- Around line 24-28: The Dockerfile hardcodes ENV PATH to v18.20.8 while using
`nvm install 18`, which can diverge; either pin the exact Node version in the
install/alias steps or make the PATH reference the same resolved version. Update
the `nvm install`/`nvm alias default` lines to the exact version string used in
PATH (e.g., `nvm install v18.20.8` and `nvm alias default v18.20.8`) so `ENV
PATH="$NVM_DIR/versions/node/v18.20.8/bin:$PATH"` stays correct, or change ENV
PATH to derive the version consistently (use the same variable or command output
that `nvm install`/`nvm alias` sets) so the PATH and installed Node version
cannot drift.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e94f4b9 and 186c83c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • .circleci/Dockerfile.test
  • .circleci/config.yml
  • rust-executor/src/agent/mod.rs
  • rust-executor/src/db.rs
  • rust-executor/src/runtime_service/mod.rs

Per Nico's direction: use the existing CI image for everything, not a new
Ubuntu 25.04 test image. Running builds inside the same Ubuntu 22.04 container
ensures binary compatibility between build and test environments.

- build-and-test: entire build runs inside 'docker run coasys/ad4m-ci-linux'
  (runs as root with HOME=/home/circleci to access circleci user's pre-installed
  tools: cargo, pnpm, deno, node). Binaries stashed after container exits.
- integration-tests-js/multi-user/email-verification: tests run inside
  'docker run --rm coasys/ad4m-ci-linux --network host'
- Remove Dockerfile.test (Ubuntu 25.04 test image no longer needed)

Key benefit: container exit kills all child processes. No orphaned executors
between test re-runs. No complex teardown logic needed.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.circleci/config.yml (1)

28-48: ⚠️ Potential issue | 🟠 Major

Add stale ad4m-executor cleanup before host-network integration tests.

--rm only cleans processes started by that container. With --network host, any already-running host ad4m-executor can still hold ports and reintroduce flakiness. Add a shared cleanup step in setup_integration_test_environment.

Suggested change
   setup_integration_test_environment:
     description: "Checkout + restore binaries + install deps + build languages (on host)"
     steps:
       - checkout
       - run:
           name: Restore binaries from local stash
           command: |
             mkdir -p target/release
             cp /tmp/circleci-artifacts/${CIRCLE_WORKFLOW_ID}/ad4m target/release/ad4m
             cp /tmp/circleci-artifacts/${CIRCLE_WORKFLOW_ID}/ad4m-executor target/release/ad4m-executor
+      - run:
+          name: Kill lingering ad4m-executor on host
+          command: pkill -f ad4m-executor || true
       - setup_env

Based on learnings: "Kill any lingering ad4m-executor processes before running integration tests to avoid port conflicts".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/config.yml around lines 28 - 48, In
setup_integration_test_environment add a dedicated run step that force-stops any
lingering host-side ad4m-executor processes before restoring binaries and
running host-network integration tests; locate the job named
setup_integration_test_environment and insert a step that gracefully kills or
pkills processes matching ad4m-executor (or stops any host containers/processes
that hold those ports) so stale ad4m-executor instances started outside the test
container with --network host cannot block ports.
🧹 Nitpick comments (1)
.circleci/config.yml (1)

3-3: Use the existing *ad4m_image alias instead of repeating the image literal.

This avoids digest drift across jobs and reduces update mistakes.

Suggested change
-              coasys/ad4m-ci-linux:latest@sha256:27f8c304877b03c74551175fc7727cfdd32b13fa6bb229d3fa18cee47889df4c \
+              *ad4m_image \

Also applies to: 71-71, 110-110, 130-130, 150-150

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/config.yml at line 3, Several jobs repeat the full image literal
instead of using the existing YAML anchor alias ad4m_image; replace each
repeated image literal with the YAML alias *ad4m_image (use the anchor name
ad4m_image and the alias *ad4m_image) so all jobs reference the single anchored
image value and avoid digest drift across jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.circleci/config.yml:
- Around line 64-71: The Docker invocations currently pass the flag "--user
root" which causes bind-mounted files to be created as root on the host; replace
the "--user root" token in each docker run command with "--user \"$(id -u):$(id
-g)\"" so the container runs as the calling user's UID:GID; update all four
occurrences of the docker run command (the lines containing 'docker run --rm \
--user root \ -e HOME=/home/circleci \ -e CARGO_HOME=...' etc.) to use the
id-based user string to avoid root-owned files on the host.

---

Outside diff comments:
In @.circleci/config.yml:
- Around line 28-48: In setup_integration_test_environment add a dedicated run
step that force-stops any lingering host-side ad4m-executor processes before
restoring binaries and running host-network integration tests; locate the job
named setup_integration_test_environment and insert a step that gracefully kills
or pkills processes matching ad4m-executor (or stops any host
containers/processes that hold those ports) so stale ad4m-executor instances
started outside the test container with --network host cannot block ports.

---

Nitpick comments:
In @.circleci/config.yml:
- Line 3: Several jobs repeat the full image literal instead of using the
existing YAML anchor alias ad4m_image; replace each repeated image literal with
the YAML alias *ad4m_image (use the anchor name ad4m_image and the alias
*ad4m_image) so all jobs reference the single anchored image value and avoid
digest drift across jobs.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 186c83c and 5745cc3.

📒 Files selected for processing (1)
  • .circleci/config.yml

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.circleci/config.yml (1)

64-66: ⚠️ Potential issue | 🟠 Major

Use host UID:GID instead of --user root for bind-mounted workspace writes.

With -v "$(pwd):/workspace", running as root creates root-owned files on the host workdir, which can break later checkout/cleanup on a reused self-hosted runner.

Suggested change
-              --user root \
+              --user "$(id -u):$(id -g)" \

Apply to all four Docker invocations.

#!/bin/bash
set -euo pipefail
echo "Expect 0 matches after fix:"
rg -n -- '--user root' .circleci/config.yml || true
echo
echo "Docker bind-mount invocations:"
rg -n -- 'docker run --rm|-v "\$\(pwd\):/workspace"' .circleci/config.yml

Also applies to: 103-106, 124-127, 145-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/config.yml around lines 64 - 66, Replace the hardcoded "--user
root" on each Docker invocation that bind-mounts the workspace (the lines
invoking "docker run --rm" together with '-v "$(pwd):/workspace"') with the host
caller's UID:GID so files created in /workspace are owned by the host user; e.g.
use --user "$(id -u):$(id -g)" (ensure it's evaluated in the shell where Docker
is run) and apply this change to all four docker run invocations that currently
use "--user root".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.circleci/config.yml:
- Around line 64-66: Replace the hardcoded "--user root" on each Docker
invocation that bind-mounts the workspace (the lines invoking "docker run --rm"
together with '-v "$(pwd):/workspace"') with the host caller's UID:GID so files
created in /workspace are owned by the host user; e.g. use --user "$(id -u):$(id
-g)" (ensure it's evaluated in the shell where Docker is run) and apply this
change to all four docker run invocations that currently use "--user root".

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5745cc3 and a1c391a.

📒 Files selected for processing (3)
  • .circleci/config.yml
  • rust-executor/src/agent/mod.rs
  • rust-executor/src/runtime_service/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • rust-executor/src/agent/mod.rs

…urity issue

- Revert rust-executor/src/agent/mod.rs, db.rs, runtime_service/mod.rs, Cargo.lock
  to dev state — Rust PoisonError fixes are a separate concern and do not belong
  in this PR which is CI-only

- Replace '--user root' with '--user $(id -u):$(id -g) --group-add 1002':
  - Runs container as the host's actual user (marvin, UID 1000)
  - Adds circleci group (GID 1002) so /home/circleci tools remain accessible
    (/home/circleci is drwxr-x--- owned by circleci:circleci 1001:1002)
  - Files written to the bind-mounted workspace are owned by the host user,
    not root — resolves the CodeRabbit security concern
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.circleci/config.yml (1)

160-180: ⚠️ Potential issue | 🟡 Minor

Add explicit cleanup before integration tests to kill lingering processes from previous jobs.

The integration tests spawn background ad4m processes (which spawn ad4m-executor separately), but the CircleCI config has no pre-test cleanup step. Although teardown_file() runs killall ad4m, this may not fully clean up ad4m-executor. On the self-hosted runner, orphaned processes from previous jobs can cause port conflicts and test failures.

Add a cleanup step before running the bats tests:

- run:
    name: Kill lingering processes from previous jobs
    command: pkill -9 -f ad4m-executor || true; pkill -9 -f "^./target/release/ad4m" || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/config.yml around lines 160 - 180, Add a pre-test cleanup step in
the integration-tests-cli job to kill lingering ad4m and ad4m-executor processes
before running the bats tests: insert a run step (before the "Run binaries test
script" and "Run integration test script" steps) that executes pkill for
ad4m-executor and the ad4m binary to avoid port conflicts from orphaned
processes; reference the integration-tests-cli job and the bats test steps
(commands invoking ./tests/bats/bin/bats tests/binaries.bats and
tests/integration.bats) when placing this new cleanup run step.
🧹 Nitpick comments (1)
.circleci/config.yml (1)

72-72: Consider avoiding image reference duplication.

The full image reference coasys/ad4m-ci-linux:latest@sha256:27f8c304877b03c74551175fc7727cfdd32b13fa6bb229d3fa18cee47889df4c is repeated in all four docker run commands (lines 72, 113, 135, 157), while line 3 defines a YAML anchor &ad4m_image that isn't used.

Since YAML anchors don't work inside shell strings, consider defining the image as an environment variable at the job level or in a preceding step, then referencing it in the docker commands.

Example using environment variable
  build-and-test:
    machine: true
    resource_class: coasys/marvin
+   environment:
+     AD4M_CI_IMAGE: coasys/ad4m-ci-linux:latest@sha256:27f8c304877b03c74551175fc7727cfdd32b13fa6bb229d3fa18cee47889df4c
    steps:
      - checkout
      ...
          command: |
            docker run --rm \
              ...
-             coasys/ad4m-ci-linux:latest@sha256:27f8c304877b03c74551175fc7727cfdd32b13fa6bb229d3fa18cee47889df4c \
+             $AD4M_CI_IMAGE \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.circleci/config.yml at line 72, The repeated hardcoded image string
(coasys/ad4m-ci-linux:latest@sha256:27f8c304877b03c74551175fc7727cfdd32b13fa6bb229d3fa18cee47889df4c)
should be centralized: stop relying on the YAML anchor &ad4m_image inside shell
strings and instead define a single environment variable (e.g., AD4M_IMAGE) at
the job level or export it in an earlier step, then update each docker run
invocation to reference that variable (e.g., $AD4M_IMAGE) so all four docker run
uses (the lines currently repeating the full image ref) use the single source of
truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @.circleci/config.yml:
- Around line 160-180: Add a pre-test cleanup step in the integration-tests-cli
job to kill lingering ad4m and ad4m-executor processes before running the bats
tests: insert a run step (before the "Run binaries test script" and "Run
integration test script" steps) that executes pkill for ad4m-executor and the
ad4m binary to avoid port conflicts from orphaned processes; reference the
integration-tests-cli job and the bats test steps (commands invoking
./tests/bats/bin/bats tests/binaries.bats and tests/integration.bats) when
placing this new cleanup run step.

---

Nitpick comments:
In @.circleci/config.yml:
- Line 72: The repeated hardcoded image string
(coasys/ad4m-ci-linux:latest@sha256:27f8c304877b03c74551175fc7727cfdd32b13fa6bb229d3fa18cee47889df4c)
should be centralized: stop relying on the YAML anchor &ad4m_image inside shell
strings and instead define a single environment variable (e.g., AD4M_IMAGE) at
the job level or export it in an earlier step, then update each docker run
invocation to reference that variable (e.g., $AD4M_IMAGE) so all four docker run
uses (the lines currently repeating the full image ref) use the single source of
truth.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1c391a and 49c5a2d.

📒 Files selected for processing (1)
  • .circleci/config.yml

…revious --user root runs

Self-hosted runners reuse workdirs. Previous CI runs used --user root which
left node_modules and target/ owned by root. Running as host UID now, pnpm
gets EACCES trying to open/modify those root-owned files.

Fix: add pre-build step that spins up a root container to rm -rf the stale
directories. One-liner, runs in seconds, ensures the build starts clean.
Running Docker as host UID (1000/marvin) with HOME=/home/circleci (owned by
circleci UID 1001, mode 750). Electron postinstall tries to mkdir
/home/circleci/.cache which host UID can't write to.

Fix:
- ELECTRON_SKIP_BINARY_DOWNLOAD=1: electron binary not needed for CI builds
- npm_config_cache=/workspace/.npm-cache: keep npm cache in bind-mounted
  workspace (writable by host UID) rather than $HOME/.cache
…i-home

Running as host UID (1000/marvin) with HOME=/home/circleci (owned by UID 1001)
causes EACCES for any tool that writes to $HOME:
- Deno: $HOME/.cache/deno (module cache)
- electron: $HOME/.cache/electron (binary download)
- node-gyp: $HOME/.cache/node-gyp (headers)

Fix: set HOME=/workspace/.ci-home (bind-mounted, writable by host UID).
CARGO_HOME and RUSTUP_HOME still point to /home/circleci/{.cargo,.rustup}
(read-only pre-installed toolchain — no writes needed).
DENO_DIR explicitly set so Deno doesn't fall back to $HOME/.cache/deno.
The Docker cleanup step was launching a second container (--user root)
to rm -rf node_modules/target. This raced with the main build container
for kernel veth interfaces, causing:
  'failed to add interface veth... resource temporarily unavailable'

Since all workdir files are now marvin-owned (not root-owned), we can
rm directly on the host — no Docker container needed for cleanup.
@lucksus lucksus closed this Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants