Make use of self hosted runners by mcbarton · Pull Request #848 · compiler-research/CppInterOp

mcbarton · 2026-03-10T17:38:38Z

closes #847

vgvassilev · 2026-03-10T17:55:48Z

Note that the self hosted bots are not ephemeral and anything installed stays until next regeneration

mcbarton · 2026-03-10T17:58:28Z

> Note that the self hosted bots are not ephemeral and anything installed stays until next regeneration

Is it possible for someone to SSH into the runner and manually install CUDA 12 and 13 using the commands in this PR? I can then change the PR to use the correct CUDA version depending on the CUDA matrix option.

mcbarton · 2026-03-10T18:05:07Z

> Note that the self hosted bots are not ephemeral and anything installed stays until next regeneration

Alternatively (this will probably take some thought to implement correctly), what about using Docker on the self-hosted runner to provide a clean environment each time someone runs a CI job? We could then run the CI inside the container and delete the image at the end of the job to stop us from filling up the runner's disk space. That way no one's PR risks influencing anyone else's that runs on the self-hosted runner.
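To make the idea concrete, here is a minimal sketch of what such a job step could look like. The image tag, the `ci/Dockerfile` path and the `./ci/run.sh` entry point are hypothetical names chosen for illustration; none of them exist in this PR. `GITHUB_RUN_ID` is the run identifier that GitHub Actions sets by default.

```shell
# Sketch only: build a throwaway image, run the CI inside it, then delete the
# image so nothing accumulates on the stateful self-hosted runner.
# Hypothetical: ci/Dockerfile, ./ci/run.sh and the "cppinterop-ci" tag.
run_ci_in_container() {
  local image="cppinterop-ci:${GITHUB_RUN_ID:-local}"

  # --rm removes the container itself as soon as the CI script exits.
  docker build -t "$image" -f ci/Dockerfile . &&
    docker run --rm -v "$PWD:/src" -w /src "$image" ./ci/run.sh
  local status=$?

  # Always remove the image, even when the build or the CI run failed.
  docker image rm -f "$image" >/dev/null 2>&1

  return "$status"
}

# Usage inside a workflow step (needs Docker on the runner):
#   run_ci_in_container
```

The cost of this approach is that everything baked into the image has to be rebuilt or pulled for each job unless the layers are cached on the runner.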

vgvassilev · 2026-03-10T18:44:22Z

> > Note that the self hosted bots are not ephemeral and anything installed stays until next regeneration

> Alternatively (this will probably take some thought to implement correctly), what about using Docker on the self-hosted runner to provide a clean environment each time someone runs a CI job? We could then run the CI inside the container and delete the image at the end of the job to stop us from filling up the runner's disk space. That way no one's PR risks influencing anyone else's that runs on the self-hosted runner.

It is stateful on purpose because it takes much longer to install cuda, etc.

vgvassilev · 2026-03-10T18:46:32Z

Note that these systems are cuda ready -- take a look at how they are used in clad.

codecov · 2026-03-10T18:51:10Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.56%. Comparing base (852f6d4) to head (57b5d13).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #848   +/-   ##
=======================================
  Coverage   79.56%   79.56%           
=======================================
  Files          11       11           
  Lines        4013     4013           
=======================================
  Hits         3193     3193           
  Misses        820      820

vgvassilev · 2026-03-15T06:55:33Z

Any news on that?

mcbarton · 2026-03-15T07:18:43Z

> Any news on that?

I will finish this PR off either later today or tomorrow morning

mcbarton · 2026-03-15T18:34:22Z

.github/workflows/main.yml

+  prepare-dell:
+    name: Activate self-host infrastructure
+    runs-on: self-hosted
+    steps:
+      - name: Send Magic Packet
+        env:
+          TARGET_IP: 192.168.100.30
+          MAC_ADDR: a4:bb:6d:51:d5:d2
+          # The container has no ping, emulate it.
+        run: |
+          # Mask the IP and potential broadcast to keep logs clean
+          echo "::add-mask::$MAC_ADDR"
+          echo "::add-mask::$BROADCAST"
+          echo "::add-mask::$TARGET_IP"
+          BROADCAST=$(echo $TARGET_IP | sed 's/\.[0-9]*$/ .255/' | tr -d ' ')
+          PING="timeout 1 bash -c 'cat < /dev/null > /dev/tcp/$TARGET_IP/22' 2>/dev/null"
+
+          # Install tool silently
+          sudo apt-get update -qq && sudo apt-get install -y -qq wakeonlan > /dev/null
+
+          # Check if already awake (using the Bash TCP PING variable)
+          if eval "$PING"; then
+            echo "Target machine is already awake. Exiting."
+            exit 0
+          fi
+
+          # If offline, send WoL
+          echo "Machine is offline. Sending WoL..."
+          wakeonlan -i $BROADCAST $MAC_ADDR > /dev/null
+
+          # Wait & Verify Loop (checks every 10s for 4 minutes)
+          echo "Waiting for response (checking Port 22)..."
+          for i in {1..24}; do
+            if eval "$PING"; then
+              echo "Machine is online and SSH is ready."
+              exit 0
+            fi
+            sleep 10
+          done
+
+          echo "Error: Target hardware did not respond within the timeout period."
+          exit 1


Not sure what this is actually doing, or how it activates the self-hosted infrastructure. I took it from Clad's workflows.

The self-hosted runner will now try to run this section, but it fails. Since I don't know exactly what it does (I'm guessing from the comments that it allows you to SSH into the runner for debug builds if needed), I don't know how to fix it.
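Reading the step as posted, it does not seem to open an SSH session. It checks whether the target machine accepts TCP connections on port 22, sends a Wake-on-LAN packet to the subnet broadcast address if it does not, and then polls port 22 every 10 seconds, up to 24 times. The sketch below isolates those three pieces as small functions. The function names are hypothetical, and the broadcast helper assumes a /24 network, as the `sed` expression in the step does.

```shell
# Replace the last octet of an IPv4 address with 255 (assumes a /24 network).
broadcast_for() {
  ip=$1
  printf '%s.255\n' "${ip%.*}"
}

# Succeed if a TCP connection to host:port can be opened within one second.
# Relies on bash's /dev/tcp redirection, so no ping binary is needed.
port_open() {
  timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Run a command up to <tries> times, sleeping <delay> seconds after each
# failure; succeed as soon as the command does.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

broadcast_for 192.168.100.30   # prints 192.168.100.255

# The wait loop in the step corresponds roughly to:
#   wait_until 24 10 port_open "$TARGET_IP" 22
```

Separately, the step runs `echo "::add-mask::$BROADCAST"` before `BROADCAST` is assigned, so that mask is registered with an empty value. Whether that relates to the failure is not clear from this thread.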

mcbarton · 2026-03-15T18:38:30Z

.github/workflows/main.yml

  cancel-in-progress: true

 jobs:
+  prepare-dell:


This is not running at the moment; it is just displaying the message below. I was able to use the self-hosted runners the other day, so hopefully this is just some GitHub issue and it will run soon.

vgvassilev

LGTM!

mcbarton · 2026-03-21T16:00:57Z

@vgvassilev @aaronj0 this got merged, but this PR was broken. It repeated what Clad had, but the first stage (prepare-dell) doesn't pass the CI. The CI on main will not pass now that this has been merged. I am away from a computer at the moment, so one of you will need to make the revert PR.

mcbarton force-pushed the cuda-runners branch 3 times, most recently from 69d8790 to 3a1b4a5 on March 15, 2026 18:27

mcbarton changed the title ~~Make use of self hosted runners to have cuda 12.6 and cuda 13.2 jobs~~ Make use of self hosted runners Mar 15, 2026

Use self hosted runners

92cd65a

mcbarton force-pushed the cuda-runners branch from 3a1b4a5 to 92cd65a on March 16, 2026 11:43

vgvassilev approved these changes Mar 21, 2026

vgvassilev merged commit 4c6e2c2 into compiler-research:main Mar 21, 2026
11 of 15 checks passed
