fix(tests): kill executor by port, not by name — safe for concurrent CI runners#698
fix(tests): kill executor by port, not by name — safe for concurrent CI runners#698
Conversation
…port The Linux release build failed because the CI Docker image was missing required dependencies for the newer Tauri/WebKit stack: - libsoup-3.0-dev - libjavascriptcoregtk-4.1-dev - libwebkit2gtk-4.1-dev - libayatana-appindicator3-dev Ubuntu 22.04 only has webkit2gtk-4.0, but the current build requires 4.1. Updated to Ubuntu 24.04 base image which includes these packages. Also: - Fixed Go version (1.24.6 doesn't exist, using 1.22.6) - Updated fuse to fuse3 for Ubuntu 24.04 compatibility
fix(ci): Update Linux CI image to Ubuntu 24.04 for webkit2gtk-4.1 support
- Changed base from nvidia/cuda Ubuntu 24.04 to 22.04 - Matches non-CUDA image for consistent glibc version in release binaries - All tooling versions kept in sync (Rust 1.92, Go 1.24.6, Node 18, etc.)
- Install CUDA 12.5 toolkit in existing Dockerfile (no separate image needed) - Add build-launcher-binary-linux-cuda job to publish_staging.yml - Uses same coasys/ad4m-ci-linux:latest image (now with CUDA) - Runs pnpm package-ad4m:cuda for CUDA-enabled binaries - Uploads CUDA AppImage and CLI executor as release assets - Remove separate Dockerfile.cuda
…ge name - Remove CUDA from existing Dockerfile (keep it clean, non-CUDA) - Add Dockerfile.cuda: extends coasys/ad4m-ci-linux:latest (Ubuntu 24.04) with CUDA 12.5 toolkit — avoids full rebuild, just layers CUDA on top - Fix publish_staging.yml: use coasys/ad4m-ci-linux-cuda:latest (not the non-CUDA image) for the CUDA build job
…4.04 base) Nico wants CUDA in the main image, not a separate Dockerfile.cuda. The existing base image is Ubuntu 24.04 (cimg/base:edge-24.04), so ubuntu2404 is the correct CUDA repo variant. - Adds CUDA 12.5 toolkit at end of main Dockerfile - Removes Dockerfile.cuda (no longer needed) - Reverts workflow CUDA job to use coasys/ad4m-ci-linux:latest (the main image, which will include CUDA after rebuild)
…bility - FROM cimg/base:edge-22.04 (22.04 packages: webkit2gtk-4.0, fuse, etc.) - Go 1.24.6 (from dev branch, not the older 1.22.6) - CUDA 12.5 at end with ubuntu2204 repo URL (correct for 22.04 base) - Fixed all ENV syntax to key=value format
New image: coasys/ad4m-ci-linux:latest@sha256:bffabe1dfc4ef18ca64105318d342ab61c2db946ae6ba16718dcd8d1d57c2cff Also tagged: coasys/ad4m-ci-linux:cuda-20260225 Changes: Ubuntu 22.04 base (was 24.04), CUDA 12.5 toolkit added, Go 1.24.6 Updated in: agent-language-tests, p-diff-sync-tests, file-storage-language-tests, direct-message-language-tests, publish_staging (all jobs incl. CUDA), publish, .circleci/config.yml
ci: CUDA Docker image + staging build workflow
…st; Windows build re-enabled - Dockerfile: add libwebkit2gtk-4.1-dev, libjavascriptcoregtk-4.1-dev, squashfs-tools (Tauri v2 requires webkit2gtk-4.1; squashfs-tools needed for AppImage mksquashfs) - publish_staging.yml: update all 4 jobs to new Docker image digest (0eddf04e4380) - macOS: bump rust-cache shared-key to 'staging-v2' + cargo update step for fresh V8 resolution - Windows: re-enable build job (was disabled due to V8 source build failures) Now uses pre-built coasys/rusty_v8@v137.1.0 binaries — no source compilation needed - Also: coasys/deno new-v8-dylib-hickory-update Cargo.lock updated separately (commit 4e5a748)
fix(ci): webkit2gtk-4.1 in Docker + macOS V8 cache bust + Windows build re-enabled
…CI runners
On a shared host machine with multiple concurrent CircleCI runner instances,
killing ad4m-executor by process name (pkill / kill-process-by-name) would
terminate executors belonging to OTHER concurrent CI jobs.
Each test file already uses a unique port range, so port-based kills are safe.
Changes:
- scripts/cleanup.js: replace killProcess('ad4m') with lsof-based port kills
for all ports used in the test-all sequence (15000-16602)
- email-verification-test-with-setup.sh: remove pkill by name, kill by port only
(15700-15702 for setup, 15920-15922 for email-verification test)
- test-multi-user-with-setup.sh: remove pkill by name, kill by port only
(15700-15702 for setup, 15900-15902 for multi-user-simple test)
Port map (for reference):
app.test.ts 15000-15002
authentication.test.ts 15100-15102, 15200-15203
integration.test.ts 15300-15302
simple.test.ts 15600-15602
publishTestLangs.ts 15700-15702
multi-user-connect.test 15800-15802
multi-user-simple.test 15900-15902
email-verification.test 15920-15922
prolog-and-literals.test 16600-16602
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (6)
📝 WalkthroughWalkthroughThe PR updates CI infrastructure: adds CUDA 12.5 and many system deps to the CI Dockerfile, pins/updates CI image digests across CircleCI and GitHub Actions, adds new CUDA-enabled Linux and expanded Windows publish jobs, and replaces name-based test cleanup with port-based cleanup and environment-configurable ports. Changes
Sequence Diagram(s)sequenceDiagram
participant GH as GitHub Actions
participant Runner as CI Runner (container image)
participant Build as Build toolchain (cargo/go/clang)
participant Release as GitHub Releases
GH->>Runner: start job (uses pinned CUDA-enabled image)
Runner->>Build: checkout code, install deps, setup CUDA env
Build->>Build: build Linux CUDA binary & AppImage, build Windows artifacts (on respective jobs)
Build->>Release: upload artifacts (AppImage, CUDA binary, Windows MSI/NSIS)
Release-->>GH: artifacts published (release assets)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
publishTestLangs.ts now reads ports from AD4M_SETUP_GQL_PORT / AD4M_SETUP_HC_ADMIN_PORT / AD4M_SETUP_HC_APP_PORT env vars (defaults 15700/15701/15702 for integration-tests-js / test-main). Port assignments per CI job: integration-tests-js (test-main): 15700/15701/15702 (default) integration-tests-multi-user-simple: 15703/15704/15705 integration-tests-email-verification: 15706/15707/15708 This prevents concurrent jobs on the same self-hosted runner from binding to the same port during the setup (publish-test-languages) phase and killing each other's executors. Also remove setup ports 15700-15702 from cleanup.js — those are prepare- phase ports, not test-phase ports; including them caused cleanup.js (called between test-all runs) to kill OTHER jobs' setup executors.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
.github/workflows/publish_staging.yml (2)
391-434: Consider updating GitHub Actions to their latest major versions.The static analysis flagged several actions as outdated. While they may still work, newer versions often include security fixes and performance improvements:
actions/checkout@v3→@v4actions/setup-go@v4→@v5actions/setup-python@v4→@v5actions/setup-node@v3→@v4♻️ Proposed fixes for Windows job
steps: - name: Fetch source code - uses: actions/checkout@v3 + uses: actions/checkout@v4 # Rust toolchain configured via rust-toolchain.toml - name: Install Rust toolchain- name: Install GO - uses: actions/setup-go@v4 + uses: actions/setup-go@v5 with: go-version: '1.24.6' cache: true - name: Install Deno uses: denoland/setup-deno@v1 with: deno-version: v2.x - name: Use Python 3.11 - uses: actions/setup-python@v4 + uses: actions/setup-python@v5 with: python-version: '3.11' - uses: pnpm/action-setup@v4 - name: Use Node.js 18.x - uses: actions/setup-node@v3 + uses: actions/setup-node@v4 with: node-version: 18.x cache: 'pnpm'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/publish_staging.yml around lines 391 - 434, Update the outdated GitHub Actions uses in the workflow: replace actions/checkout@v3 with actions/checkout@v4, actions/setup-go@v4 with actions/setup-go@v5, actions/setup-python@v4 with actions/setup-python@v5, and actions/setup-node@v3 with actions/setup-node@v4; edit the corresponding steps named "Fetch source code", "Install GO", "Use Python 3.11", and "Use Node.js 18.x" to point to the new major versions and verify their inputs (like node-version, go-version, python-version, and cache) remain compatible after the bump.
451-454: Cargo update with error suppression.The
|| truepattern silently ignores failures. If the V8 pre-built binary alignment is critical for Windows builds, consider logging when the update fails rather than silently continuing.♻️ Optional: Log on failure
- name: Update cargo git deps (ensure pre-built V8 from coasys/rusty_v8 tag) working-directory: rust-executor shell: bash - run: cargo update -p v8 2>/dev/null || cargo update 2>/dev/null || true + run: cargo update -p v8 2>/dev/null || cargo update 2>/dev/null || echo "Warning: cargo update skipped"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/publish_staging.yml around lines 451 - 454, The CI step named "Update cargo git deps (ensure pre-built V8 from coasys/rusty_v8 tag)" currently suppresses errors with "|| true"; change the step so failures are not silently ignored—capture the exit status of `cargo update -p v8` (and fallback `cargo update`) and when either command fails, print a clear diagnostic (e.g., echo a message including stderr) and either fail the job or emit a warning so the log shows the failure; update the shell/run command in that step to detect non-zero exit codes and log the error instead of swallowing it with "|| true"..circleci/Dockerfile (2)
7-28: Consider adding--no-install-recommendsto reduce image size.The apt-get install commands pull in recommended packages by default, which can significantly increase image size. Adding
--no-install-recommendsis a common Docker best practice for CI images.♻️ Proposed fix
-RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository universe && apt-get update && apt-get install -y \ +RUN apt-get update && apt-get install -y --no-install-recommends software-properties-common && add-apt-repository universe && apt-get update && apt-get install -y --no-install-recommends \ libgtk-3-dev \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.circleci/Dockerfile around lines 7 - 28, Update the long RUN layer that performs apt-get install (the line starting with "RUN apt-get update && apt-get install -y ...") to add --no-install-recommends to the apt-get install invocations so apt doesn't pull recommended packages and the CI Docker image size is reduced; ensure both apt-get install commands in that RUN use --no-install-recommends and keep the rest of the package list and chaining unchanged.
89-99: CUDA 12.5 installation looks correct.The CUDA toolkit installation follows the standard NVIDIA installation procedure for Ubuntu 22.04. PATH and LD_LIBRARY_PATH are correctly configured.
Consider applying
--no-install-recommendshere as well for consistency:♻️ Proposed fix for consistency
-RUN apt-get install -y gnupg2 wget && \ +RUN apt-get install -y --no-install-recommends gnupg2 wget && \ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \ dpkg -i cuda-keyring_1.1-1_all.deb && \ rm cuda-keyring_1.1-1_all.deb && \ apt-get update && \ - apt-get install -y cuda-toolkit-12-5 && \ + apt-get install -y --no-install-recommends cuda-toolkit-12-5 && \ apt-get clean && rm -rf /var/lib/apt/lists/*🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.circleci/Dockerfile around lines 89 - 99, The RUN command that installs CUDA toolkit should use --no-install-recommends for consistency and smaller image size; update the apt-get install invocations inside the RUN block that downloads and installs cuda-keyring and cuda-toolkit-12-5 to include --no-install-recommends (i.e., change "apt-get install -y gnupg2 wget" and "apt-get install -y cuda-toolkit-12-5" to use "--no-install-recommends -y"), leaving the ENV PATH and ENV LD_LIBRARY_PATH lines unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/publish_staging.yml:
- Line 49: The workflow uses a different image digest for the Docker image
(coasys/ad4m-ci-linux:latest@sha256:0eddf04e4380f892968297a248da05c734f64d089e9e01cf7b89e3ff4328db4c)
than all other workflows and CircleCI (which use
sha256:bffabe1dfc4ef18ca64105318d342ab61c2db946ae6ba16718dcd8d1d57c2cff); either
update this digest to the canonical sha256:bffabe1d... value to match the other
workflows or add a concise inline comment next to the image reference explaining
why this different digest is intentional (e.g., specific staging/CUDA
requirements) so the divergence is documented.
---
Nitpick comments:
In @.circleci/Dockerfile:
- Around line 7-28: Update the long RUN layer that performs apt-get install (the
line starting with "RUN apt-get update && apt-get install -y ...") to add
--no-install-recommends to the apt-get install invocations so apt doesn't pull
recommended packages and the CI Docker image size is reduced; ensure both
apt-get install commands in that RUN use --no-install-recommends and keep the
rest of the package list and chaining unchanged.
- Around line 89-99: The RUN command that installs CUDA toolkit should use
--no-install-recommends for consistency and smaller image size; update the
apt-get install invocations inside the RUN block that downloads and installs
cuda-keyring and cuda-toolkit-12-5 to include --no-install-recommends (i.e.,
change "apt-get install -y gnupg2 wget" and "apt-get install -y
cuda-toolkit-12-5" to use "--no-install-recommends -y"), leaving the ENV PATH
and ENV LD_LIBRARY_PATH lines unchanged.
In @.github/workflows/publish_staging.yml:
- Around line 391-434: Update the outdated GitHub Actions uses in the workflow:
replace actions/checkout@v3 with actions/checkout@v4, actions/setup-go@v4 with
actions/setup-go@v5, actions/setup-python@v4 with actions/setup-python@v5, and
actions/setup-node@v3 with actions/setup-node@v4; edit the corresponding steps
named "Fetch source code", "Install GO", "Use Python 3.11", and "Use Node.js
18.x" to point to the new major versions and verify their inputs (like
node-version, go-version, python-version, and cache) remain compatible after the
bump.
- Around line 451-454: The CI step named "Update cargo git deps (ensure
pre-built V8 from coasys/rusty_v8 tag)" currently suppresses errors with "||
true"; change the step so failures are not silently ignored—capture the exit
status of `cargo update -p v8` (and fallback `cargo update`) and when either
command fails, print a clear diagnostic (e.g., echo a message including stderr)
and either fail the job or emit a warning so the log shows the failure; update
the shell/run command in that step to detect non-zero exit codes and log the
error instead of swallowing it with "|| true".
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
.circleci/Dockerfile.circleci/config.yml.github/workflows/agent-language-tests.yml.github/workflows/direct-message-language-tests.yml.github/workflows/file-storage-language-tests.yml.github/workflows/p-diff-sync-tests.yml.github/workflows/publish.yml.github/workflows/publish_staging.ymltests/js/email-verification-test-with-setup.shtests/js/scripts/cleanup.jstests/js/test-multi-user-with-setup.shtests/js/utils/publishTestLangs.ts
| runs-on: ubuntu-latest | ||
| container: | ||
| image: coasys/ad4m-ci-linux:latest@sha256:3d6e8b6357224d689345eebd5f9da49ee5fd494b3fd976273d0cf5528f6903ab | ||
| image: coasys/ad4m-ci-linux:latest@sha256:0eddf04e4380f892968297a248da05c734f64d089e9e01cf7b89e3ff4328db4c |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the image digests used across workflow files
echo "=== Digests in GitHub workflows ==="
rg -n 'coasys/ad4m-ci-linux:latest@sha256:' .github/workflows/ --no-heading
echo ""
echo "=== Digest in CircleCI config ==="
rg -n 'coasys/ad4m-ci-linux:latest@sha256:' .circleci/config.yml --no-headingRepository: coasys/ad4m
Length of output: 1668
Image digest differs from other workflow files and CircleCI config.
This file uses digest sha256:0eddf04e4380f892968297a248da05c734f64d089e9e01cf7b89e3ff4328db4c (appearing on lines 49, 153, 513, 620), while all other GitHub workflows (publish.yml, agent-language-tests.yml, direct-message-language-tests.yml, file-storage-language-tests.yml, p-diff-sync-tests.yml) and the CircleCI config use sha256:bffabe1dfc4ef18ca64105318d342ab61c2db946ae6ba16718dcd8d1d57c2cff.
If this intentional for staging-specific requirements (e.g., CUDA support), add a comment to document why. Otherwise, align the digests for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/publish_staging.yml at line 49, The workflow uses a
different image digest for the Docker image
(coasys/ad4m-ci-linux:latest@sha256:0eddf04e4380f892968297a248da05c734f64d089e9e01cf7b89e3ff4328db4c)
than all other workflows and CircleCI (which use
sha256:bffabe1dfc4ef18ca64105318d342ab61c2db946ae6ba16718dcd8d1d57c2cff); either
update this digest to the canonical sha256:bffabe1d... value to match the other
workflows or add a concise inline comment next to the image reference explaining
why this different digest is intentional (e.g., specific staging/CUDA
requirements) so the divergence is documented.
After dev was merged into this branch, non-staging workflow files still referenced the older bffabe1d digest. publish_staging.yml was already on 0eddf04e4380 (webkit2gtk-4.1 + squashfs-tools + CUDA 12.5). Update all remaining CI files to the same newest digest so CodeRabbit's inconsistency warning is resolved and all jobs use the same image.
Problem
With multiple concurrent CircleCI runner instances on the same host machine,
pkill -f "ad4m"andkillProcess("ad4m")would terminate executor processes belonging to other concurrent CI jobs, causing random test failures.Root cause
Three places killed by process name:
scripts/cleanup.js—killProcess("ad4m")called between every test file intest-allemail-verification-test-with-setup.sh—pkill -f "rust-executor"+pkill -f "ad4m"test-multi-user-with-setup.sh— sameFix
Each test file uses a unique port range, so killing by port is precise and safe:
app.test.tsauthentication.test.tsintegration.test.tssimple.test.tspublishTestLangs.ts(setup)multi-user-connect.test.tsmulti-user-simple.test.tsemail-verification.test.tsprolog-and-literals.test.tscleanup.js: replacedkillProcess("ad4m")withlsof -ti:PORT | xargs -r kill -9for each port in the sequencepkilllines, kept/expanded port-based kills onlySummary by CodeRabbit