refactor(bench): baseline-vs-feature comparison with structured output #14313
Draft
Replace version-comparison benchmarking with baseline-vs-feature comparison. Add structured JSON bundle output with automated regression detection. Add new benchmark types: invariant test, fork test. Make benchmark failures fatal instead of silently skipping. Pass fork URL via environment variable to avoid shell interpolation. Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
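The environment-variable approach mentioned in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `fork_bench_command` is hypothetical, and the `forge test --match-contract ForkTests` invocation is an assumed way of selecting the fork tests. Only the `FORK_URL` variable name comes from the commit messages.

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a `forge test` invocation
/// for the fork benchmarks.
///
/// The RPC URL is attached to the child's environment as `FORK_URL` instead of
/// being formatted into a shell string, so characters such as `$`, backticks,
/// or `;` in the URL are never interpreted by a shell.
fn fork_bench_command(fork_url: &str) -> Command {
    let mut cmd = Command::new("forge");
    // Assumed selection flags; the real tool may target the tests differently.
    cmd.args(["test", "--match-contract", "ForkTests"]);
    // The URL travels as data in the environment, not as part of the command line.
    cmd.env("FORK_URL", fork_url);
    cmd
}
```

Because `Command` passes arguments and environment entries directly to the child process, no quoting or escaping of the URL is needed on the caller's side.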
Delete combine-benchmarks.sh, format-pr-comment.sh, commit-and-read-benchmarks.sh, benchmark.sh, and LATEST.md. Simplify workflow to run benchmarks in a read-only job and post results via artifact-based publish step. Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
…ures Add self-contained Solidity benchmark suite at benches/fixtures/bench-suite/ that replaces external repo dependencies as the default benchmark target. The suite is designed to be backwards compatible (pragma >=0.8.0), has no external dependencies (no forge-std, no git submodules), and targets specific Foundry subsystems:
- ERC20: baseline EVM execution, storage reads/writes
- Vault: AMM constant-product pool (math-heavy, multi-contract)
- Registry: mapping-heavy key-value store (storage-bound, batch ops)
- FuzzERC20/FuzzVault: fuzzer input generation, property checking
- InvariantVault/InvariantRegistry: handler-based invariant testing
- UnitTests: test runner startup / TTFB

External repos can still be used via --repos flag.

Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
Add three new test files to the built-in bench suite:
- CheatcodeTests.t.sol: exercises the cheatcode inspector across deal, prank, warp/roll, store/load, etch, snapshot/revertTo, mockCall, expectRevert, label, record/accesses, getNonce/setNonce, and a combined cheatcode storm
- ForkTests.t.sol: exercises vm.createFork, forked state reads/writes, WETH/USDC/DAI reads, deposit on fork, vm.rollFork
- MultiForkTests.t.sol: exercises multi-fork switching, vm.makePersistent, cross-fork state reads, fork switch stress test

New benchmark types: forge_cheatcode_test, forge_multifork_test. Existing forge_fork_test now runs targeted ForkTests instead of global --fork-url mode. Fork/multifork tests read FORK_URL env var via vm.envString. Shared Vm interface extracted to test/Vm.sol.

Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
- Fix Vm.sol: bytes -> bytes calldata for etch()
- Use targetContracts() getter pattern instead of vm.targetContract() cheatcode (works on both stable and nightly)
- Tighten FuzzVault swap bounds to avoid liquidity edge cases
- Exclude Fork/Invariant tests from forge_test to avoid failures when FORK_URL is unset or vm version differs
- Clear fuzz failure cache before each benchmark run

Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
Replaces the version-comparison benchmarking tool with a baseline-vs-feature model. Adds structured JSON output, automated regression detection, and new benchmark types focused on test performance.
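One plausible shape for the regression check is sketched below. The description only states that a noise threshold exists and that the process exits non-zero on regression. The relative-change rule, the fractional threshold, and the names `Verdict`, `relative_change`, `classify`, and `exit_code` are all assumptions made for illustration.

```rust
/// Hypothetical per-benchmark outcome of a baseline-vs-feature comparison.
#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    Improvement,
    Unchanged,
}

/// Relative change of the feature timing against the baseline.
/// Positive means the feature build is slower.
fn relative_change(baseline_ms: f64, feature_ms: f64) -> f64 {
    (feature_ms - baseline_ms) / baseline_ms
}

/// Classifies one benchmark. `noise_threshold` is assumed to be a fraction
/// (0.05 = 5%); changes within +/- the threshold are treated as noise.
fn classify(baseline_ms: f64, feature_ms: f64, noise_threshold: f64) -> Verdict {
    let change = relative_change(baseline_ms, feature_ms);
    if change > noise_threshold {
        Verdict::Regression
    } else if change < -noise_threshold {
        Verdict::Improvement
    } else {
        Verdict::Unchanged
    }
}

/// Overall process exit code: non-zero if any benchmark regressed.
fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Regression) {
        1
    } else {
        0
    }
}
```

Under these assumptions, a benchmark that goes from 100 ms to 120 ms with a 5% threshold is classified as a regression, while 100 ms to 102 ms falls inside the noise band and is reported as unchanged.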
Changes
- … (`--json`): machine-readable bundle with per-benchmark comparisons and overall verdict
- … (`--noise-threshold`), process exits non-zero on regression
- … `forge_invariant_test` and `forge_fork_test` (fork requires explicit `--fork-url`)

Usage
Co-Authored-By: zerosnacks <95942363+zerosnacks@users.noreply.github.com>
Prompted by: zerosnacks