chore: bench phase breakdown + thread sweep for MSM reduction by peter941221 · Pull Request #21885 · AztecProtocol/aztec-packages

peter941221 · 2026-03-23T08:28:57Z

Context

This PR addresses AztecProtocol/barretenberg#1656 by making the MSM::batch_multi_scalar_mul(...) phase breakdown measurable and by adding a benchmark case that targets the exact regime called out in the issue (2^16 points with 256 threads).

The goal is to answer: is the single-threaded final reduction (accumulate_results) actually a bottleneck relative to the full MSM?

What Changed

1) Phase Breakdown (`BB_BENCH`) inside `batch_multi_scalar_mul`

Added BB_BENCH scopes for:

- MSM::batch_multi_scalar_mul/evaluate_work_units
- MSM::batch_multi_scalar_mul/accumulate_results
- MSM::batch_multi_scalar_mul/batch_normalize
- MSM::batch_multi_scalar_mul/scalars_to_montgomery

These are surfaced in google-benchmark output via GOOGLE_BB_BENCH_REPORTER(state).
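To illustrate the idea of per-phase scopes, here is a minimal sketch of an RAII timer that adds elapsed nanoseconds to a named counter when the scope exits. It is not the actual BB_BENCH macro: `PhaseTimers`, `ScopedPhase`, and `toy_batch_msm` are hypothetical names, and the "phases" only do placeholder arithmetic.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of per-phase totals (a stand-in, not the BB_BENCH machinery).
class PhaseTimers {
  public:
    void add(const std::string& name, std::uint64_t ns)
    {
        totals_ns_[name] += ns;
        counts_[name] += 1;
    }
    std::uint64_t total_ns(const std::string& name) const { return lookup(totals_ns_, name); }
    std::uint64_t count(const std::string& name) const { return lookup(counts_, name); }

  private:
    static std::uint64_t lookup(const std::map<std::string, std::uint64_t>& m, const std::string& name)
    {
        auto it = m.find(name);
        return it == m.end() ? 0 : it->second;
    }
    std::map<std::string, std::uint64_t> totals_ns_;
    std::map<std::string, std::uint64_t> counts_;
};

// Records the lifetime of the enclosing scope under `name`, in nanoseconds.
class ScopedPhase {
  public:
    ScopedPhase(PhaseTimers& timers, std::string name)
        : timers_(timers)
        , name_(std::move(name))
        , start_(std::chrono::steady_clock::now())
    {}
    ~ScopedPhase()
    {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        timers_.add(name_, static_cast<std::uint64_t>(ns));
    }
    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;

  private:
    PhaseTimers& timers_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical function with two timed phases; the work itself is a placeholder.
std::uint64_t toy_batch_msm(PhaseTimers& timers, std::uint64_t n)
{
    std::uint64_t acc = 0;
    {
        ScopedPhase phase(timers, "evaluate_work_units");
        for (std::uint64_t i = 0; i < n; ++i) {
            acc += i;
        }
    }
    {
        ScopedPhase phase(timers, "accumulate_results");
        acc *= 2;
    }
    return acc;
}
```

Because each timer is tied to a block scope, a phase is recorded even on early return, and nesting scopes gives a breakdown without touching the surrounding control flow.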

2) New Benchmark Case: `BatchMSM_1656`

Added a dedicated benchmark:

- Name: PippengerBench/BatchMSM_1656/256/{msm_size}
- Single MSM (num_polys = 1)
- Sizes: msm_size ∈ {2^16, 2^20}

This uses bb::set_parallel_for_concurrency(256) to force the intended partitioning and restores the original value after the benchmark finishes.
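The override-then-restore pattern can be sketched with a small RAII guard. This is only an illustration under stated assumptions: the getter and setter in the `sketch` namespace are hypothetical stand-ins for a process-wide concurrency setting, and the real benchmark may save and restore the value differently.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical process-wide setting; the default of 4 is arbitrary.
inline std::size_t& concurrency_setting()
{
    static std::size_t value = 4;
    return value;
}
inline std::size_t get_parallel_for_concurrency() { return concurrency_setting(); }
inline void set_parallel_for_concurrency(std::size_t n) { concurrency_setting() = n; }

// Sets the concurrency for the lifetime of the guard and restores the
// previous value on destruction, including when an exception unwinds the scope.
class ConcurrencyOverride {
  public:
    explicit ConcurrencyOverride(std::size_t n)
        : saved_(get_parallel_for_concurrency())
    {
        set_parallel_for_concurrency(n);
    }
    ~ConcurrencyOverride() { set_parallel_for_concurrency(saved_); }
    ConcurrencyOverride(const ConcurrencyOverride&) = delete;
    ConcurrencyOverride& operator=(const ConcurrencyOverride&) = delete;

  private:
    std::size_t saved_;
};

// Example body of a benchmark: the override is active only inside this function.
inline std::size_t run_with_256_threads()
{
    ConcurrencyOverride guard(256);
    return get_parallel_for_concurrency(); // value seen by the benchmarked code
}

} // namespace sketch
```

Restoring in a destructor keeps later benchmark cases in the same binary from inheriting the forced thread count.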

Results (Local)

Machine: 4 vCPU laptop, so running 256 threads is intentionally oversubscribed; the number to focus on is the absolute reduction overhead.

Key takeaways (from BB_BENCH counters; time counters are nanoseconds):

2^16, 256 threads:
- accumulate_results ≈ 139k ns (~0.139 ms)
- total ≈ 521 ms
- reduction fraction ~0.027%
2^20, 256 threads:
- accumulate_results ≈ 141k ns (~0.141 ms)
- total ≈ 3457 ms
- reduction fraction ~0.004%

So the final reduction appears negligible in these regimes on my setup, and the benchmark now makes it easy to validate on a 64+ core machine where the original concern is most relevant.
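The reduction fractions quoted above follow from dividing the accumulate_results time by the total after converting both to the same unit. A small sketch of that arithmetic, using the local numbers from this description (`reduction_percent` is a helper name introduced here for illustration):

```cpp
// Share of total time spent in the reduction, as a percentage.
// reduction_ns is in nanoseconds, total_ms in milliseconds (1 ms = 1e6 ns).
constexpr double reduction_percent(double reduction_ns, double total_ms)
{
    return reduction_ns / (total_ms * 1e6) * 100.0;
}

// Local measurements quoted in this description.
constexpr double kPercent2Pow16 = reduction_percent(139000.0, 521.0);  // about 0.027 %
constexpr double kPercent2Pow20 = reduction_percent(141000.0, 3457.0); // about 0.004 %
```

The absolute reduction time is nearly the same for both sizes (about 0.14 ms), so its share shrinks as the total grows.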

Notes / Background

MSM here is a Pippenger-style MSM. For background reading on multiexponentiation / multi-product methods:
- Nicholas Pippenger, On the evaluation of powers and related problems (1976).
- Jurjen Bos and Matthijs Coster, Addition Chain Heuristics (CRYPTO 1989).
- Ryan Henry, Pippenger's Multiproduct and Multiexponentiation Algorithms (2010).
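For readers new to the bucket method, here is a toy sketch of a Pippenger-style MSM. It is not barretenberg's implementation: the "group" is the integers modulo a prime under addition instead of an elliptic curve, everything is single-threaded, and all names (`kModulus`, `group_add`, `naive_msm`, `bucket_msm`) are hypothetical. It does not model the threading or the accumulate_results step measured in this PR; it only shows how buckets are filled per window and then reduced with a running sum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kModulus = 1'000'000'007ULL;

// Group operation of the toy group (addition modulo kModulus).
inline std::uint64_t group_add(std::uint64_t a, std::uint64_t b)
{
    return (a % kModulus + b % kModulus) % kModulus;
}

// Reference result: sum of scalar_i * point_i, computed directly.
std::uint64_t naive_msm(const std::vector<std::uint32_t>& scalars, const std::vector<std::uint64_t>& points)
{
    assert(scalars.size() == points.size());
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < scalars.size(); ++i) {
        std::uint64_t term = (static_cast<std::uint64_t>(scalars[i]) % kModulus) * (points[i] % kModulus) % kModulus;
        acc = group_add(acc, term);
    }
    return acc;
}

// Bucket method using only group additions, processing windows from the top down.
std::uint64_t bucket_msm(const std::vector<std::uint32_t>& scalars,
                         const std::vector<std::uint64_t>& points,
                         unsigned window_bits)
{
    assert(scalars.size() == points.size());
    assert(window_bits >= 1 && window_bits <= 16);
    const unsigned scalar_bits = 32;
    const unsigned num_windows = (scalar_bits + window_bits - 1) / window_bits;
    const std::uint32_t mask = (1u << window_bits) - 1;

    std::uint64_t result = 0;
    for (unsigned w = num_windows; w-- > 0;) {
        // Shift the accumulated result up by one window: window_bits doublings.
        for (unsigned d = 0; d < window_bits; ++d) {
            result = group_add(result, result);
        }
        // Scatter: add each point into the bucket indexed by its digit in this window.
        std::vector<std::uint64_t> buckets(static_cast<std::size_t>(mask) + 1, 0);
        for (std::size_t i = 0; i < scalars.size(); ++i) {
            std::uint32_t digit = (scalars[i] >> (w * window_bits)) & mask;
            if (digit != 0) {
                buckets[digit] = group_add(buckets[digit], points[i]);
            }
        }
        // Reduce: a running suffix sum yields sum_j j * buckets[j] with additions only.
        std::uint64_t running = 0;
        std::uint64_t window_sum = 0;
        for (std::uint32_t j = mask; j >= 1; --j) {
            running = group_add(running, buckets[j]);
            window_sum = group_add(window_sum, running);
        }
        result = group_add(result, window_sum);
    }
    return result;
}
```

For any valid `window_bits`, `bucket_msm` should agree with `naive_msm` on the same inputs, which makes the pair easy to cross-check.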

How To Run

cmake --build <build-dir> --target pippenger_bench
CRS_PATH=<path-to-bn254_g1.dat> <build-dir>/bin/pippenger_bench \
  --benchmark_filter='BatchMSM_1656' \
  --benchmark_min_time=0.1s \
  --benchmark_counters_tabular=true

Fixes AztecProtocol/barretenberg#1656

peter941221 · 2026-03-23T09:00:50Z

CI3 (External) is currently blocked because this is an external PR from a fork: the workflow says external PRs need the ci-external or ci-external-once label to run.

Would someone with label permission mind adding one of those labels so CI can execute? (This PR only touches the MSM benchmark + BB_BENCH instrumentation; no CI/workflow files.)

peter941221 · 2026-03-23T10:03:02Z

Follow-up: I narrowed the BatchMSM_1656 registration to target only the original issue scenario (threads=256, msm_size ∈ {2^16, 2^20}), so the benchmark name is now:

PippengerBench/BatchMSM_1656/256/{msm_size}

Latest local run (4 vCPU laptop; BB_BENCH time counters are ns):

2^16: accumulate_results ≈ 139k ns (~0.139 ms) vs total ≈ 521 ms
2^20: accumulate_results ≈ 141k ns (~0.141 ms) vs total ≈ 3457 ms

## Summary
- The `ci-external` workflow fails with `Resource not accessible by integration (removeLabelsFromLabelable)` because `github.token` defaults to read-only when no `permissions` block is set.
- Adds `contents: read` + `pull-requests: write` to the `ci-external` job so `gh pr edit --remove-label "ci-external-once"` can succeed.

## Context
- Safe because the workflow uses `pull_request_target` (code always from base branch, not fork) and is gated by a maintainer adding the `ci-external` / `ci-external-once` label.
- Matches the pattern used by other label-modifying workflows in the repo (e.g., `merge-train-create-pr`, `auto-close-stale-drafts`).
- Fixes the CI failure seen on external PRs like #21885.

AztecBot · 2026-03-24T17:52:58Z

This issue was automatically closed because it was referenced in PR #21965 which has been merged to the default branch.

BEGIN_COMMIT_OVERRIDE
chore: bench phase breakdown + thread sweep for MSM reduction (#21885)
chore: minor fixes pt. 2 (#22138)
chore: minor fixes pt. 3 (#22181)
fix: satisfy Chonk num_circuits >= 4 assertion in mock IVC creation (#22188)
END_COMMIT_OVERRIDE

peter941221 mentioned this pull request Mar 23, 2026

Investigate: is single-threaded bucket accumulation in Pippenger a bottleneck? AztecProtocol/barretenberg#1656

Closed

johnathan79717 added ci-external-once Run CI on an external PR, but only once. and removed ci-external-once Run CI on an external PR, but only once. labels Mar 24, 2026

johnathan79717 changed the title ~~bench: phase breakdown + thread sweep for MSM reduction (barretenberg#1656)~~ chore: bench phase breakdown + thread sweep for MSM reduction Mar 24, 2026

johnathan79717 added ci-external Allow CI to run on this external pull request. and removed ci-external-once Run CI on an external PR, but only once. labels Mar 24, 2026

johnathan79717 mentioned this pull request Mar 24, 2026

fix: add pull-requests write permission to ci-external workflow #21965

Merged

AztecBot closed this Mar 24, 2026

johnathan79717 reopened this Mar 25, 2026

johnathan79717 added ci-external-once Run CI on an external PR, but only once. and removed ci-external Allow CI to run on this external pull request. labels Mar 25, 2026

github-actions Bot removed the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

johnathan79717 added the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

github-actions Bot removed the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

johnathan79717 approved these changes Mar 25, 2026

johnathan79717 added ci-squash-and-merge and removed ci-squash-and-merge labels Mar 25, 2026

johnathan79717 changed the base branch from next to merge-train/barretenberg March 25, 2026 13:19

johnathan79717 requested review from a team, charlielye and nventuro as code owners March 25, 2026 13:19

johnathan79717 removed request for a team, charlielye and nventuro March 25, 2026 13:20

johnathan79717 self-assigned this Mar 25, 2026

johnathan79717 force-pushed the codex/1656-batch-msm-bench branch from 69ef274 to 07ca389 Compare March 25, 2026 13:23

johnathan79717 enabled auto-merge (squash) March 25, 2026 13:23

johnathan79717 added the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

github-actions Bot removed the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

johnathan79717 added the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

github-actions Bot removed the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

johnathan79717 added the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

github-actions Bot removed the ci-external-once Run CI on an external PR, but only once. label Mar 25, 2026

AztecBot deleted the branch AztecProtocol:merge-train/barretenberg March 26, 2026 15:45

AztecBot closed this Mar 26, 2026

auto-merge was automatically disabled March 26, 2026 15:45
Pull request was closed

johnathan79717 reopened this Mar 27, 2026

peter and others added 3 commits March 27, 2026 11:14

bench: add phase breakdown and thread sweep for AztecProtocol#1656

b868aee

bench: focus BatchMSM_1656 on 256 threads

8ae239b

chore: update MSM accumulate_results comment with benchmark data

88f04ea

Replaces speculative TODO with actual measurements from a 192-core machine: ~512us for 2^16 MSM (1.2% of total), ~207us for 2^20 (<0.1%).

johnathan79717 force-pushed the codex/1656-batch-msm-bench branch from 6ebf784 to 88f04ea Compare March 27, 2026 11:14

johnathan79717 added the ci-external-once Run CI on an external PR, but only once. label Mar 27, 2026

github-actions Bot removed the ci-external-once Run CI on an external PR, but only once. label Mar 27, 2026

johnathan79717 merged commit c251a92 into AztecProtocol:merge-train/barretenberg Mar 30, 2026
21 of 22 checks passed

AztecBot mentioned this pull request Mar 30, 2026

feat: merge-train/barretenberg #22147

Merged

johnathan79717 merged 3 commits into AztecProtocol:merge-train/barretenberg from peter941221:codex/1656-batch-msm-bench
