chore(ci): optimize workflow performance with caching and path filters #773

Merged
DBosley merged 12 commits into main from chore/ci-optimization on Dec 19, 2025

@DBosley DBosley commented Dec 16, 2025

Summary

Comprehensive CI optimization to reduce build times and eliminate redundant work across all workflows.

Key Optimizations

| Optimization | Impact |
|---|---|
| Runner Migration | Eliminates 10-20 min queue times (Solana jobs → ubuntu-latest) |
| Anchor Base Image | ~6 min savings per CLI build (pre-compiled Anchor 0.29.0) |
| CLI Build Deduplication | Build once, test in parallel (was building twice) |
| Docker Layer Caching | ~10-15 min savings on cached runs |
| Tilt Images | Pre-built images with registry-based caching |
| Parallel Solana Builds | v1.0.0 and v2.0.0 build in parallel with GHA caching |
| Path Filtering | Skip irrelevant CI entirely |
| Job Parallelization | Lint jobs run parallel with builds |
| Concurrency Controls | Cancel in-progress runs on new commits |

Results

| Workflow | Before | After |
|---|---|---|
| CLI (cached) | ~20-30 min | ~5 min (build once, test parallel) |
| CLI test-evm | ~6 min | ~1-2 min (cached layers) |
| CLI test-solana | ~12 min (serial builds) | ~5s cached / ~6 min parallel |
| Solana SBF | 10-20 min queue + 15 min | ~12 min (no queue) |
| Solana Lint | Blocked by SBF | ~40s (parallel on ubuntu) |
| EVM | ~6 min | ~4 min (parallel lint) |
| Unrelated PRs | All CI runs | Skipped via path filters |

Architecture

CLI Workflow:
  anchor-base (reusable workflow)
    │ builds/caches anchor 0.29.0 image
    v
  build (ubuntu-latest)
    │ builds cli-local, pushes to ghcr.io
    ├── test-evm (ubuntu-latest, parallel, cached)
    │     pulls cli-local (cached), adds test layer, runs
    ├── build-solana-v1 (ubuntu-latest, parallel, GHA cached)
    │     restores cache OR builds v1.0.0 with fixed program ID
    ├── build-solana-v2 (ubuntu-latest, parallel, GHA cached)
    │     restores cache OR builds v2.0.0 with fixed program ID
    └── test-solana (ubuntu-latest, needs build-solana-v1/v2)
          downloads artifacts, uses --binary flag to skip rebuilding

Tilt Images Workflow:
  build-solana (ubuntu-latest) ─────────────────┐
    │ target: builder (full image, ~2GB)        │
    │ cache: registry :buildcache tag           ├── parallel
  build-evm (ubuntu-latest) ────────────────────┘
    │ target: foundry-export (FROM scratch, ~20MB)
    │ cache: registry :buildcache tag

Solana Workflow:
  lint (ubuntu-latest) ──────────────────┐
  solana-sbf (ubuntu-latest) ────────────┼── all parallel
  anchor-test (ubuntu-latest) ───────────┤
  check-version (ubuntu-latest) ─────────┘

EVM Workflow:
  lint (ubuntu-latest) ──────────────────┐
  test (ubuntu-latest) ──────────────────┼── all parallel
  echidna (ubuntu-latest) ───────────────┘

Files Changed

New Files:

  • Dockerfile.anchor-base - Pre-compiles Anchor 0.29.0
  • Dockerfile.cli-test-evm - Thin test layer for EVM
  • Dockerfile.cli-test-solana - Thin test layer for Solana
  • .github/workflows/anchor-base.yml - Reusable workflow for anchor-base image
  • .github/workflows/tilt-images.yml - Pre-builds Solana/EVM contract images
  • cli/test/build-solana-v1.sh - Builds v1.0.0 with fixed test program ID
  • cli/test/build-solana-v2.sh - Builds v2.0.0 with fixed test program ID

Modified:

  • Dockerfile.cli - Uses pre-built anchor binaries, copies avm wrapper
  • .github/workflows/cli.yml - Registry-based build sharing, anchor-base integration, parallel Solana builds with GHA caching
  • .github/workflows/tilt.yml - Path filters, ghcr.io login for cache
  • .github/workflows/evm.yml - Path filters, parallel lint, concurrency
  • .github/workflows/solana.yml - ubuntu-latest, parallel tests, concurrency, SBF cache validation
  • .github/workflows/sdk.yml - Path filters, concurrency
  • .github/workflows/prettier.yml - Path filters, concurrency
  • Tiltfile - Target builder stage, cache_from for registry caching
  • cli/test/solana.sh - Uses pre-built artifacts via --binary flag when available
  • sdk/__tests__/utils.ts - Fixed infinite loop in waitForRelay(), serialized peer registration

Bug Fixes

  1. SDK flaky tests (waitForRelay): Fixed infinite loop in waitForRelay() that caused tests to hang. Added maxRetries parameter (default 60 attempts / 2 minutes).

  2. SDK nonce collisions (Tilt CI): Fixed NONCE_EXPIRED errors in Tilt CI tests. The link() function was registering peers in parallel using Promise.all. When EVM chains share the same signer, concurrent transactions collide on the same nonce. Changed to a sequential for...of loop so each registration completes before the next one starts.

  3. Anchor version mismatch: Fixed avm wrapper not being copied from anchor-base. The backpackapp base image ships anchor 0.30.1, and we need to overwrite it with the avm wrapper that delegates to 0.29.0.

  4. Program ID mismatch: Pre-built Solana binaries have program IDs baked in at compile time. Build scripts now patch Anchor.toml and lib.rs with a fixed test program ID before building, and solana.sh uses the corresponding keypair from artifacts.

  5. Tilt image target: Changed Solana Tilt image from export (scratch) back to builder target. The Dockerfile.test-validator uses COPY --from=ntt-solana-contract and requires the full builder filesystem, not a minimal scratch image.

  6. SBF cache validation: Fixed empty cache causing test failures. actions/cache@v4 was caching an empty directory (200 bytes) because it saves/restores in the same step. Split into actions/cache/restore@v4 + verification step + actions/cache/save@v4 to ensure we only skip builds when cache contains actual .so files.
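
To illustrate the nonce collision in item 2, here is a minimal sketch, not the actual link() code. FakeSigner, registerInParallel, and registerSequentially are hypothetical names, and the signer only simulates the read-then-send race on a shared nonce; no real EVM library is involved.

```typescript
// Hypothetical stand-in for a signer shared by several EVM chains.
class FakeSigner {
  private nonce = 0;

  // Reads the current nonce, yields once (as a network call would),
  // then "sends" and bumps the nonce. Returns the nonce that was used.
  async send(peer: string): Promise<number> {
    const used = this.nonce;
    await Promise.resolve(); // simulated round trip for registering `peer`
    this.nonce = used + 1;
    return used;
  }
}

// Parallel registration: every call reads the nonce before any call bumps it.
async function registerInParallel(signer: FakeSigner, peers: string[]): Promise<number[]> {
  return Promise.all(peers.map((peer) => signer.send(peer)));
}

// Sequential registration: each send finishes before the next one reads the nonce.
async function registerSequentially(signer: FakeSigner, peers: string[]): Promise<number[]> {
  const nonces: number[] = [];
  for (const peer of peers) {
    nonces.push(await signer.send(peer));
  }
  return nonces;
}
```

With three peers, the parallel variant reuses one nonce for every transaction, while the sequential variant hands out distinct, increasing nonces. That matches the reason given above for switching to a for...of loop.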

@DBosley DBosley force-pushed the chore/ci-optimization branch 7 times, most recently from 33197ab to 35328b2 Compare December 17, 2025 20:53
dvgui previously approved these changes Dec 19, 2025

@dvgui dvgui left a comment

LGTM. One test (tilt) seems to still be failing.

Comment thread cli/test/build-solana-v2.sh Outdated
DBosley and others added 12 commits December 19, 2025 12:27
- Add path filters to all workflows to skip unnecessary CI runs
- Add concurrency groups to cancel in-progress runs on new commits
- Parallelize EVM and Solana test jobs where possible
- Move Solana jobs from tilt-kube-public to ubuntu-latest
- Add Dockerfile.anchor-base with Solana toolchain and Anchor
- Add anchor-base.yml reusable workflow to build/cache the image
- Image is pushed to ghcr.io and reused across CLI workflow jobs
- Split Solana v1/v2 contract builds into parallel jobs
- Cache pre-built .so artifacts between runs using GHA cache
- Use fixed test program keypair for reproducible builds
- Add Dockerfile.cli-test-evm and Dockerfile.cli-test-solana
- Run test-solana directly in container with proper environment
- Cache key includes all build dependencies (scripts, Dockerfiles, Anchor.toml)
- Add tilt-images.yml to pre-build Solana/EVM contract images
- Make Tilt CI depend on tilt-images for warm Docker cache
- Add Docker layer caching for Tilt CI builds
- Remove dead devnet/** path filter (directory doesn't exist)
- Add maxRetries parameter (default 60 = 2 minutes)
- Throw error after max retries instead of hanging forever
- Improve logging with attempt counter
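
For illustration, the bounded-retry behaviour described in this commit could look roughly like the sketch below. The name pollWithRetries, the check callback, and the 2-second delay are assumptions (the delay is inferred from 60 attempts taking about 2 minutes); the real waitForRelay() in sdk/__tests__/utils.ts may be structured differently.

```typescript
// Hypothetical helper: polls `check` until it yields a value or retries run out.
async function pollWithRetries<T>(
  check: () => Promise<T | undefined>,
  maxRetries = 60, // default from the commit message
  delayMs = 2000, // assumed: 60 attempts * 2 s = 2 minutes
): Promise<T> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = await check();
    if (result !== undefined) {
      return result;
    }
    // Attempt counter in the log, so a stuck run is easy to spot.
    console.log(`not ready yet (attempt ${attempt}/${maxRetries})`);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  // Fail loudly instead of looping forever.
  throw new Error(`gave up after ${maxRetries} attempts`);
}
```

The key design point is that the loop has a hard upper bound and ends in a thrown error, so a relay that never arrives fails the test rather than hanging the job.
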
…uild

- Upload SBF program artifacts from solana-sbf job
- Download pre-built .so files in anchor-test instead of rebuilding
- Remove 9+ minute anchor build that duplicated solana-sbf work
- Remove unnecessary Cargo toolchain/cache steps from anchor-test
- Build only TypeScript SDK (fast) instead of full make sdk
- Add SBF artifact cache keyed on Cargo.lock + source files hash
- Skip build step entirely when cache hits (saves ~3.5 min)
- Build only runs when Solana source code actually changes
- Tests still run every time to verify correctness
Reverts the misguided attempt to share artifacts between solana-sbf
and anchor-test jobs. These jobs are designed to run in parallel:

- solana-sbf: uses cargo build-sbf, runs cargo test-sbf
- anchor-test: uses anchor build (via make sdk), runs anchor test

Each job needs to build its own artifacts because:
- anchor-test needs IDL files that only anchor build generates
- The jobs test different things and should run in parallel

Keeps the SBF artifact caching optimization for solana-sbf job
which skips the build when Solana source code hasn't changed.
The 'export' stage (FROM scratch) doesn't work for Tilt because
Dockerfile.test-validator does COPY --from=ntt-solana-contract and
requires the full builder filesystem. Reverting to 'builder' target
while keeping cache_from for layer caching benefits.
… build

The actions/cache@v4 was caching an empty directory (200 bytes) because
it saves/restores in the same step. When the cache "hit", it restored
nothing useful, causing tests to fail with "Program file data not available".

Fix:
- Split into actions/cache/restore@v4 (restore-only)
- Add verification step to check for actual .so files
- Use verification output for build condition instead of cache-hit
- Add separate actions/cache/save@v4 step after successful build
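
The verification step's decision can be sketched as a small pure function. This only illustrates the logic and is not the workflow itself: CachedFile, shouldBuild, and the non-zero size check are assumptions, and the real check runs as a shell step in the workflow.

```typescript
// Hypothetical description of one entry restored from the cache.
interface CachedFile {
  path: string;
  sizeBytes: number;
}

// Decide whether to build based on what was actually restored,
// not on the cache-hit flag: an "empty" hit must still trigger a build.
function shouldBuild(restored: CachedFile[]): boolean {
  const programs = restored.filter(
    (file) => file.path.endsWith(".so") && file.sizeBytes > 0, // size check is an assumption
  );
  return programs.length === 0;
}
```

For example, a restored directory entry with no .so files yields true (build), while a listing that contains a non-empty .so file yields false (skip the build).
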
…parameterized script

- Merge build-solana-v1.sh and build-solana-v2.sh into build-solana.sh
- Script now takes version as argument (e.g., build-solana.sh 1.0.0)
- Update CLI workflow to use matrix strategy for parallel builds
- Fix artifact naming to match upload/download (solana-v1.0.0-artifacts)
@DBosley DBosley force-pushed the chore/ci-optimization branch from 321eeb5 to 092321a Compare December 19, 2025 19:28
@DBosley DBosley merged commit f338ffa into main Dec 19, 2025
21 checks passed
@DBosley DBosley deleted the chore/ci-optimization branch December 19, 2025 21:18
4 participants