
chore: bench phase breakdown + thread sweep for MSM reduction#21885

Merged
johnathan79717 merged 3 commits intoAztecProtocol:merge-train/barretenbergfrom
peter941221:codex/1656-batch-msm-bench
Mar 30, 2026



@peter941221 peter941221 commented Mar 23, 2026

Context

This PR addresses AztecProtocol/barretenberg#1656 by making the MSM::batch_multi_scalar_mul(...) phase breakdown measurable and by adding a benchmark case that targets the exact regime called out in the issue (2^16 points with 256 threads).

The goal is to answer: is the single-threaded final reduction (accumulate_results) actually a bottleneck relative to the full MSM?

What Changed

1) Phase Breakdown (BB_BENCH) inside batch_multi_scalar_mul

Added BB_BENCH scopes for:

  • MSM::batch_multi_scalar_mul/evaluate_work_units
  • MSM::batch_multi_scalar_mul/accumulate_results
  • MSM::batch_multi_scalar_mul/batch_normalize
  • MSM::batch_multi_scalar_mul/scalars_to_montgomery

These are surfaced in google-benchmark output via GOOGLE_BB_BENCH_REPORTER(state).
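For readers unfamiliar with BB_BENCH, the per-phase counters behave like named, scoped timers. The sketch below is a hypothetical stand-in: ScopedPhaseTimer, phase_totals_ns, phase_ns and instrumented_batch_msm are illustrative names, not barretenberg code, and the real macro's internals and the actual phase order inside batch_multi_scalar_mul may differ. It only shows how wrapping each phase in its own scope yields one accumulated time counter per phase name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a BB_BENCH-style scope (NOT barretenberg's actual macro):
// an RAII timer that adds elapsed wall-clock nanoseconds to a per-phase total.
std::map<std::string, std::uint64_t>& phase_totals_ns()
{
    static std::map<std::string, std::uint64_t> totals;
    return totals;
}

class ScopedPhaseTimer {
  public:
    explicit ScopedPhaseTimer(std::string name)
        : name_(std::move(name))
        , start_(std::chrono::steady_clock::now())
    {}
    ~ScopedPhaseTimer()
    {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        phase_totals_ns()[name_] += static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }
    ScopedPhaseTimer(const ScopedPhaseTimer&) = delete;
    ScopedPhaseTimer& operator=(const ScopedPhaseTimer&) = delete;

  private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Total recorded for one phase; the phase must have been timed at least once.
std::uint64_t phase_ns(const std::string& name)
{
    const auto it = phase_totals_ns().find(name);
    assert(it != phase_totals_ns().end());
    return it->second;
}

// Illustrative instrumented MSM: one scope per phase (the order here is illustrative only).
void instrumented_batch_msm()
{
    {
        ScopedPhaseTimer t("MSM::batch_multi_scalar_mul/evaluate_work_units");
        // ... multithreaded bucket accumulation over work units ...
    }
    {
        ScopedPhaseTimer t("MSM::batch_multi_scalar_mul/accumulate_results");
        // ... single-threaded reduction of the per-work-unit results ...
    }
    {
        ScopedPhaseTimer t("MSM::batch_multi_scalar_mul/batch_normalize");
        // ... batch-normalize the result points ...
    }
    {
        ScopedPhaseTimer t("MSM::batch_multi_scalar_mul/scalars_to_montgomery");
        // ... convert scalars to Montgomery form ...
    }
}
```

Because totals accumulate across calls, repeated benchmark iterations add up per phase; a reporter would then divide by the iteration count.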

2) New Benchmark Case: BatchMSM_1656

Added a dedicated benchmark:

  • PippengerBench/BatchMSM_1656/256/{msm_size}
  • Single MSM (num_polys = 1)
  • Sizes:
    • msm_size ∈ {2^16, 2^20}

This uses bb::set_parallel_for_concurrency(256) to force the intended partitioning and restores the original value after the benchmark finishes.
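A minimal sketch of the force-then-restore pattern, assuming an RAII guard. The demo:: getter and setter are hypothetical stand-ins for bb::set_parallel_for_concurrency and whatever reads the current value, and batch_msm_1656_body is an illustrative name; the PR text does not say whether the benchmark uses a guard or an explicit restore at the end of the body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a global parallel_for concurrency knob and its getter;
// these are NOT barretenberg's API, only enough to show the save/restore pattern.
namespace demo {
inline std::size_t& concurrency_setting()
{
    static std::size_t value = 4; // e.g. the default on a 4 vCPU machine
    return value;
}
inline std::size_t get_concurrency() { return concurrency_setting(); }
inline void set_concurrency(std::size_t n) { concurrency_setting() = n; }
} // namespace demo

// RAII guard: force a thread count for the lifetime of the guard and restore the
// previous value on scope exit, including early returns and exceptions.
class ScopedConcurrency {
  public:
    explicit ScopedConcurrency(std::size_t n)
        : previous_(demo::get_concurrency())
    {
        assert(n > 0);
        demo::set_concurrency(n);
    }
    ~ScopedConcurrency() { demo::set_concurrency(previous_); }
    ScopedConcurrency(const ScopedConcurrency&) = delete;
    ScopedConcurrency& operator=(const ScopedConcurrency&) = delete;

  private:
    std::size_t previous_;
};

// Hypothetical benchmark body for the #1656 regime: 256 threads, one MSM of msm_size points.
std::size_t batch_msm_1656_body(std::size_t msm_size)
{
    ScopedConcurrency guard(256);
    (void)msm_size; // ... build msm_size points/scalars and run the single MSM (num_polys = 1) ...
    return demo::get_concurrency(); // concurrency observed inside the body: 256
}
```

A guard keeps later benchmarks in the same binary from silently inheriting the forced 256-thread setting if the body exits early.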

Results (Local)

Machine: 4 vCPU laptop (so 256 threads is intentionally oversubscribed; the key point is the absolute reduction overhead).

Key takeaways (from BB_BENCH counters; time counters are in nanoseconds):

  • 2^16, 256 threads:
    • accumulate_results ≈ 139k ns (~0.139 ms)
    • total ≈ 521 ms
    • reduction fraction ~0.027%
  • 2^20, 256 threads:
    • accumulate_results ≈ 141k ns (~0.141 ms)
    • total ≈ 3457 ms
    • reduction fraction ~0.004%

So the final reduction appears negligible in these regimes on my setup, and the benchmark now makes it easy to validate on a 64+ core machine where the original concern is most relevant.
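The reduction fractions above are accumulate_results divided by the total MSM time, with both in the same unit. A small helper reproducing that arithmetic (reduction_fraction_percent and print_local_fractions are illustrative names):

```cpp
#include <cassert>
#include <cstdio>

// Share of total MSM time spent in the final reduction, as a percentage.
// Both arguments must be in the same unit (BB_BENCH time counters are nanoseconds).
double reduction_fraction_percent(double accumulate_results_ns, double total_ns)
{
    assert(total_ns > 0.0);
    return 100.0 * accumulate_results_ns / total_ns;
}

// Recompute the local (4 vCPU, 256 threads) figures reported above.
void print_local_fractions()
{
    constexpr double kNsPerMs = 1e6;
    std::printf("2^16: %.3f%%\n", reduction_fraction_percent(139e3, 521 * kNsPerMs));  // about 0.027%
    std::printf("2^20: %.4f%%\n", reduction_fraction_percent(141e3, 3457 * kNsPerMs)); // about 0.004%
}
```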

Notes / Background

  • MSM here is a Pippenger-style MSM. For background reading on multiexponentiation / multi-product methods:
    • Nicholas Pippenger, On the evaluation of powers and related problems (1976).
    • Jurjen Bos and Matthijs Coster, Addition Chain Heuristics (CRYPTO 1989).
    • Ryan Henry, Pippenger's Multiproduct and Multiexponentiation Algorithms (2010).
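To make the question in #1656 concrete, here is a toy sketch of the structure being measured: each work unit runs a Pippenger bucket method over its own slice of points (the parallel part), and a single thread then folds the per-unit partial results (the serial reduction). Assumptions: the "curve" is the additive group of integers mod 2^31 - 1, work units are contiguous point ranges, and all names are hypothetical; barretenberg's actual work-unit partitioning and accumulate_results may differ. In this sketch the serial fold costs one group addition per work unit regardless of the number of points, which is consistent with accumulate_results staying near 140k ns at both 2^16 and 2^20 in the local numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy group: integers mod a prime under addition. "Point" addition is modular addition,
// so s * P is modular multiplication. This stands in for curve points only to show structure.
constexpr std::uint64_t kP = 2147483647ULL; // 2^31 - 1

std::uint64_t add(std::uint64_t a, std::uint64_t b) { return (a + b) % kP; }

// Pippenger bucket method over points[begin, end) for scalars of scalar_bits (<= 32) bits,
// using windows of c bits, processed from the most significant window down.
std::uint64_t pippenger_range(const std::vector<std::uint64_t>& points, const std::vector<std::uint32_t>& scalars,
                              std::size_t begin, std::size_t end, unsigned scalar_bits, unsigned c)
{
    const unsigned num_windows = (scalar_bits + c - 1) / c;
    const std::uint32_t mask = (1u << c) - 1;
    std::uint64_t result = 0;
    for (unsigned w = num_windows; w-- > 0;) {
        for (unsigned k = 0; k < c; ++k) { // shift by one window: c doublings
            result = add(result, result);
        }
        std::vector<std::uint64_t> buckets(std::size_t{ 1 } << c, 0);
        for (std::size_t i = begin; i < end; ++i) {
            const std::uint32_t digit = (scalars[i] >> (w * c)) & mask;
            if (digit != 0) {
                buckets[digit] = add(buckets[digit], points[i]);
            }
        }
        // Running-sum trick: computes sum_d d * bucket[d] with additions only.
        std::uint64_t running = 0, window_sum = 0;
        for (std::size_t d = buckets.size() - 1; d >= 1; --d) {
            running = add(running, buckets[d]);
            window_sum = add(window_sum, running);
        }
        result = add(result, window_sum);
    }
    return result;
}

// Hypothetical split: each work unit handles a contiguous chunk of points (parallel in a
// real implementation), then one thread folds the partial results.
std::uint64_t msm_with_work_units(const std::vector<std::uint64_t>& points, const std::vector<std::uint32_t>& scalars,
                                  std::size_t num_units, unsigned c)
{
    assert(points.size() == scalars.size() && num_units > 0);
    const std::size_t n = points.size();
    std::vector<std::uint64_t> partials(num_units, 0);
    for (std::size_t u = 0; u < num_units; ++u) {
        partials[u] = pippenger_range(points, scalars, n * u / num_units, n * (u + 1) / num_units, 32, c);
    }
    std::uint64_t total = 0; // serial reduction: num_units additions, independent of n
    for (const std::uint64_t p : partials) {
        total = add(total, p);
    }
    return total;
}

// Naive reference: sum of s_i * P_i mod p (points are assumed to be reduced mod p).
std::uint64_t msm_naive(const std::vector<std::uint64_t>& points, const std::vector<std::uint32_t>& scalars)
{
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        total = add(total, (scalars[i] % kP) * points[i] % kP);
    }
    return total;
}
```

msm_naive is included only as a reference to check the bucket version against.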

How To Run

cmake --build <build-dir> --target pippenger_bench
CRS_PATH=<path-to-bn254_g1.dat> <build-dir>/bin/pippenger_bench \
  --benchmark_filter='BatchMSM_1656' \
  --benchmark_min_time=0.1s \
  --benchmark_counters_tabular=true

Fixes AztecProtocol/barretenberg#1656

@peter941221 (Contributor, Author)

CI3 (External) is currently blocked because this is an external PR from a fork: the workflow says external PRs need the ci-external or ci-external-once label to run.

Would someone with label permission mind adding one of those labels so CI can execute? (This PR only touches the MSM benchmark + BB_BENCH instrumentation; no CI/workflow files.)

@peter941221 (Contributor, Author)

Follow-up: I narrowed the BatchMSM_1656 registration to target only the original issue scenario (threads=256, msm_size ∈ {2^16, 2^20}), so the benchmark name is now:

  • PippengerBench/BatchMSM_1656/256/{msm_size}

Latest local run (4 vCPU laptop; BB_BENCH time counters are ns):

  • 2^16: accumulate_results ≈ 139k ns (~0.139 ms) vs total ≈ 521 ms
  • 2^20: accumulate_results ≈ 141k ns (~0.141 ms) vs total ≈ 3457 ms

@johnathan79717 johnathan79717 added ci-external-once and removed ci-external-once labels Mar 24, 2026
@johnathan79717 johnathan79717 changed the title bench: phase breakdown + thread sweep for MSM reduction (barretenberg#1656) chore: bench phase breakdown + thread sweep for MSM reduction Mar 24, 2026
@johnathan79717 johnathan79717 added ci-external and removed ci-external-once labels Mar 24, 2026
github-merge-queue Bot pushed a commit that referenced this pull request Mar 24, 2026
## Summary
- The `ci-external` workflow fails with `Resource not accessible by
integration (removeLabelsFromLabelable)` because `github.token` defaults
to read-only when no `permissions` block is set.
- Adds `contents: read` + `pull-requests: write` to the `ci-external`
job so `gh pr edit --remove-label "ci-external-once"` can succeed.

## Context
- Safe because the workflow uses `pull_request_target` (code always from
base branch, not fork) and is gated by a maintainer adding the
`ci-external` / `ci-external-once` label.
- Matches the pattern used by other label-modifying workflows in the
repo (e.g., `merge-train-create-pr`, `auto-close-stale-drafts`).
- Fixes the CI failure seen on external PRs like #21885.
@AztecBot (Collaborator)

This issue was automatically closed because it was referenced in PR #21965 which has been merged to the default branch.


@AztecBot AztecBot closed this Mar 24, 2026
@johnathan79717 johnathan79717 added ci-external-once and removed ci-external labels Mar 25, 2026
@github-actions github-actions Bot removed the ci-external-once label Mar 25, 2026
@johnathan79717 johnathan79717 added the ci-external-once label Mar 25, 2026
@github-actions github-actions Bot removed the ci-external-once label Mar 25, 2026
@johnathan79717 johnathan79717 changed the base branch from next to merge-train/barretenberg March 25, 2026 13:19
@johnathan79717 johnathan79717 requested review from a team, charlielye and nventuro as code owners March 25, 2026 13:19
@johnathan79717 johnathan79717 removed request for a team, charlielye and nventuro March 25, 2026 13:20
@johnathan79717 johnathan79717 self-assigned this Mar 25, 2026
@johnathan79717 johnathan79717 force-pushed the codex/1656-batch-msm-bench branch from 69ef274 to 07ca389 Compare March 25, 2026 13:23
@johnathan79717 johnathan79717 enabled auto-merge (squash) March 25, 2026 13:23
@johnathan79717 johnathan79717 added the ci-external-once label Mar 25, 2026
@github-actions github-actions Bot removed the ci-external-once label Mar 25, 2026
@johnathan79717 johnathan79717 added the ci-external-once label Mar 25, 2026
@github-actions github-actions Bot removed the ci-external-once label Mar 25, 2026
@johnathan79717 johnathan79717 added the ci-external-once label Mar 25, 2026
@github-actions github-actions Bot removed the ci-external-once label Mar 25, 2026
@AztecBot AztecBot deleted the branch AztecProtocol:merge-train/barretenberg March 26, 2026 15:45
@AztecBot AztecBot closed this Mar 26, 2026
auto-merge was automatically disabled March 26, 2026 15:45

Pull request was closed

peter and others added 3 commits March 27, 2026 11:14
Replaces speculative TODO with actual measurements from a 192-core machine:
~512us for 2^16 MSM (1.2% of total), ~207us for 2^20 (<0.1%).
@johnathan79717 johnathan79717 force-pushed the codex/1656-batch-msm-bench branch from 6ebf784 to 88f04ea Compare March 27, 2026 11:14
@johnathan79717 johnathan79717 added the ci-external-once label Mar 27, 2026
@github-actions github-actions Bot removed the ci-external-once label Mar 27, 2026
@johnathan79717 johnathan79717 merged commit c251a92 into AztecProtocol:merge-train/barretenberg Mar 30, 2026
21 of 22 checks passed
github-merge-queue Bot pushed a commit that referenced this pull request Mar 31, 2026
BEGIN_COMMIT_OVERRIDE
chore: bench phase breakdown + thread sweep for MSM reduction (#21885)
chore: minor fixes pt. 2 (#22138)
chore: minor fixes pt. 3 (#22181)
fix: satisfy Chonk num_circuits >= 4 assertion in mock IVC creation (#22188)
END_COMMIT_OVERRIDE


Linked issue: Investigate: is single-threaded bucket accumulation in Pippenger a bottleneck?
