Fast & Accurate Microbenchmarking for Zig
Let's benchmark fib:
```zig
const std = @import("std");
const bench = @import("bench");

fn fibNaive(n: u64) u64 {
    if (n <= 1) return n;
    return fibNaive(n - 1) + fibNaive(n - 2);
}

fn fibIterative(n: u64) u64 {
    if (n == 0) return 0;
    var a: u64 = 0;
    var b: u64 = 1;
    for (2..n + 1) |_| {
        const c = a + b;
        a = b;
        b = c;
    }
    return b;
}

pub fn main() !void {
    const allocator = std.heap.smp_allocator;

    const opts = bench.Options{
        .sample_size = 100,
        .warmup_iters = 3,
    };

    const m_naive = try bench.run(allocator, "fibNaive/30", fibNaive, .{30}, opts);
    const m_iter = try bench.run(allocator, "fibIterative/30", fibIterative, .{30}, opts);

    try bench.report(.{
        .metrics = &.{ m_naive, m_iter },
        .baseline_index = 0, // naive as baseline
    });
}
```

Run it, and you will get the following output in your terminal:
```
| Benchmark | Time | Speedup | Iterations | Ops/s | Cycles | Instructions | IPC | Cache Misses |
| :---------------- | ------: | ---------: | ---------: | -------: | -----: | -----------: | ---: | -----------: |
| `fibNaive/30` | 1.78 ms | 1.00x | 1 | 563.2/s | 8.1M | 27.8M | 3.41 | 0.3 |
| `fibIterative/30` | 3.44 ns | 516055.19x | 300006 | 290.6M/s | 15.9 | 82.0 | 5.15 | 0.0 |
```

The benchmark report generates valid Markdown, so you can copy-paste it directly into a markdown file:
| Benchmark | Time | Speedup | Iterations | Ops/s | Cycles | Instructions | IPC | Cache Misses |
|---|---|---|---|---|---|---|---|---|
| `fibNaive/30` | 1.78 ms | 1.00x | 1 | 563.2/s | 8.1M | 27.8M | 3.41 | 0.3 |
| `fibIterative/30` | 3.44 ns | 516055.19x | 300006 | 290.6M/s | 15.9 | 82.0 | 5.15 | 0.0 |
- CPU Counters: Measures CPU cycles, instructions, IPC, and cache misses directly from the kernel (Linux only).
- Argument Support: Pass pre-calculated data to your functions to separate setup overhead from the benchmark loop.
- Baseline Comparison: Easily compare multiple implementations against a reference function to see relative speedups or regressions.
- Flexible Reporting: Access raw metric data programmatically to generate custom reports (JSON, CSV) or assert performance limits in CI.
- Easy Throughput Metrics: Automatically calculates operations per second and data throughput (MB/s, GB/s) when payload size is provided.
- Robust Statistics: Uses median and standard deviation to provide reliable metrics despite system noise.
Fetch the latest version:
```sh
zig fetch --save=bench https://github.com/pyk/bench/archive/main.tar.gz
```

Then add this to your build.zig:
```zig
const bench = b.dependency("bench", .{
    .target = target,
    .optimize = optimize,
});

// Use it on a module
const mod = b.createModule(.{
    .target = target,
    .optimize = optimize,
    .imports = &.{
        .{ .name = "bench", .module = bench.module("bench") },
    },
});

// Or executable
const my_bench = b.addExecutable(.{
    .name = "my-bench",
    .root_module = b.createModule(.{
        .root_source_file = b.path("bench/my-bench.zig"),
        .target = target,
        .optimize = .ReleaseFast,
        .imports = &.{
            .{ .name = "bench", .module = bench.module("bench") },
        },
    }),
});
```
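To run the benchmark through the build system, you can wire the executable into its own step. This is a minimal sketch using standard `std.Build` calls; the step name `bench` is just a choice, not something the library requires:

```zig
// Expose `zig build bench` to build and run the benchmark executable.
const run_bench = b.addRunArtifact(my_bench);
const bench_step = b.step("bench", "Run benchmarks");
bench_step.dependOn(&run_bench.step);
```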
If you are using it only for tests/benchmarks, it is recommended to mark it as lazy:

```zig
.dependencies = .{
    .bench = .{
        .url = "...",
        .hash = "...",
        .lazy = true, // here
    },
}
```
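Note that build.zig then has to resolve a lazy dependency through `b.lazyDependency`, which returns `null` until the package has actually been fetched. A sketch, reusing the names from the example above:

```zig
// Only resolve the dependency when the benchmark executable is actually built.
if (b.lazyDependency("bench", .{
    .target = target,
    .optimize = optimize,
})) |dep| {
    my_bench.root_module.addImport("bench", dep.module("bench"));
}
```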
To benchmark a single function, pass the allocator, a name, the function pointer, a tuple of arguments, and an options struct to `run`:
const res = try bench.run(allocator, "My Function", myFn, .{});
try bench.report(.{ .metrics = &.{res} });You can generate test data before the benchmark starts and pass it via a tuple. This ensures the setup cost doesn't pollute your measurements.
```zig
// Setup data outside the benchmark
const input = try generateLargeString(allocator, 10_000);

// Pass input as a tuple
const res = try bench.run(allocator, "Parser", parseFn, .{input}, .{});
```

You can run multiple benchmarks and compare them against a baseline. The
baseline_index determines which result is used as the reference (1.00x).
```zig
const a = try bench.run(allocator, "Implementation A", implA, .{}, .{});
const b = try bench.run(allocator, "Implementation B", implB, .{}, .{});

try bench.report(.{
    .metrics = &.{ a, b },
    // Use the first metric (Implementation A) as the baseline
    .baseline_index = 0,
});
```

If your function processes data (like copying memory or parsing strings),
provide bytes_per_op to get throughput metrics (MB/s or GB/s).
```zig
const size = 1024 * 1024;
const res = try bench.run(allocator, "Memcpy 1MB", copyFn, .{}, .{
    .bytes_per_op = size,
});

// Report will now show GB/s instead of just Ops/s
try bench.report(.{ .metrics = &.{res} });
```

You can tune the benchmark behavior by modifying the Options struct.
const res = try bench.run(allocator, "Heavy Task", heavyFn, .{
.warmup_iters = 10, // Default: 100
.sample_size = 50, // Default: 1000
});The default bench.report prints a clean, Markdown-compatible table to stdout. It
automatically handles unit scaling (ns, us, ms, s) and formatting.
| Benchmark | Time | Speedup | Iterations | Ops/s | Cycles | Instructions | IPC | Cache Misses |
| :---------------- | ------: | ---------: | ---------: | -------: | -----: | -----------: | ---: | -----------: |
| `fibNaive/30` | 1.78 ms | 1.00x | 1 | 563.2/s | 8.1M | 27.8M | 3.41 | 0.3 |
| `fibIterative/30` | 3.44 ns | 516055.19x | 300006 | 290.6M/s | 15.9 | 82.0 | 5.15 | 0.0 |

The run function returns a Metrics struct containing all raw data (min, max,
median, variance, cycles, etc.). You can use this to generate JSON, CSV, or
assert performance limits in CI.
```zig
const metrics = try bench.run(allocator, "MyFn", myFn, .{}, .{});

// Access raw fields directly
std.debug.print("Median: {d}ns, Max: {d}ns\n", .{
    metrics.median_ns,
    metrics.max_ns,
});
```
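Continuing from the snippet above, here is a minimal sketch of a CI-style guard; the budget value and error name are illustrative, not part of bench:

```zig
// Fail the benchmark run if the median time exceeds a chosen budget.
const budget_ns = 1_000; // illustrative threshold
if (metrics.median_ns > budget_ns) {
    std.debug.print("regression: median {d}ns exceeds {d}ns budget\n", .{
        metrics.median_ns, budget_ns,
    });
    return error.PerformanceRegression;
}
```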
The run function returns a Metrics struct containing the following data points:
| Category | Metric | Description |
|---|---|---|
| Meta | `name` | The identifier string for the benchmark. |
| Time | `min_ns` | Minimum execution time per operation (nanoseconds). |
| Time | `max_ns` | Maximum execution time per operation (nanoseconds). |
| Time | `mean_ns` | Arithmetic mean execution time (nanoseconds). |
| Time | `median_ns` | Median execution time (nanoseconds). |
| Time | `std_dev_ns` | Standard deviation of the execution time. |
| Meta | `samples` | Total number of measurement samples collected. |
| Throughput | `ops_sec` | Calculated operations per second. |
| Throughput | `mb_sec` | Data throughput in MB/s (populated if `bytes_per_op` > 0). |
| Hardware* | `cycles` | Average CPU cycles per operation. |
| Hardware* | `instructions` | Average CPU instructions executed per operation. |
| Hardware* | `ipc` | Instructions Per Cycle (efficiency ratio). |
| Hardware* | `cache_misses` | Average cache misses per operation. |
*Hardware metrics are currently available on Linux only. They will be null
on other platforms or if permissions are restricted.
bench shows you the time your code takes: it tells you *what* the speed is, not *why* it is slow. To find out why, use a profiler such as `perf` on Linux. These tools show you where the CPU spends its time: `perf record` runs your code and collects data, and `perf report` or the Firefox Profiler then shows the hotspots so you can fix the real problems.
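A typical workflow looks something like this; the binary path is only an example, and the `-Doptimize` flag assumes your build script exposes the standard optimize option:

```sh
# Profile the benchmark binary and inspect hotspots
zig build -Doptimize=ReleaseFast
perf record -g ./zig-out/bin/my-bench
perf report
```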
The compiler can remove code if it thinks it does nothing. For example, if you compute a value but never use it, the compiler skips the work. This makes benchmarks wrong: they report fast times for code that never actually ran.

To prevent this, pass your result to `std.mem.doNotOptimizeAway`; the compiler is then forced to compute it. For example, in a scanner or tokenizer:
```zig
while (true) {
    const token = try scanner.next();
    if (token == .end) break;
    std.mem.doNotOptimizeAway(token); // CRITICAL
}
```

Here, `doNotOptimizeAway(token)` forces the compiler to run `scanner.next()`.
Without it, the loop body might be optimized away entirely. Always use this on key results such as counts, parsed values, or other outputs.
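The same applies to accumulated results. For instance, a hypothetical counting loop only survives optimization if the final count is kept alive:

```zig
var count: usize = 0;
while (true) {
    const token = try scanner.next();
    if (token == .end) break;
    count += 1;
}
std.mem.doNotOptimizeAway(count); // without this, the whole counting loop can be elided
```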
On Linux, hardware metrics like cycles and instructions come from the kernel's perf subsystem. By default, access is restricted and you get `null` values.

To fix this, run:

```sh
sudo sysctl -w kernel.perf_event_paranoid=-1
```

This allows your code to read the counters. Set it back to 2 to restrict access again.
Check the current value with `cat /proc/sys/kernel/perf_event_paranoid`. Lower values mean more access; `-1` grants full access. Use it for benchmarks, but be careful on production systems.
If you use constant data like `const input = "hello";`, the compiler knows it at build time. It can unroll loops or compute results ahead of time, so your benchmark measures nothing real: times stay flat even as the data grows.
Instead, use runtime data. Allocate a buffer and fill it.
Bad example:

```zig
const input = " hello"; // Compiler knows every byte
const res = try bench.run(allocator, "Parser", parse, .{input}, .{});
```

Good example:
```zig
const input = try allocator.alloc(u8, 100);
defer allocator.free(input);

// Fill the buffer at runtime so the compiler cannot see its contents
@memset(input, ' ');
@memcpy(input[input.len - 5 ..], "hello");

const res = try bench.run(allocator, "Parser", parse, .{input}, .{});
```

Now the buffer is filled at runtime, the compiler cannot fold it, and times scale with real work. For varying tests, change the buffer size each run.
- Fixing Microbenchmark Accuracy
- Fixing Zig benchmark where `std.mem.doNotOptimizeAway` was ignored
- Writing a Type-Safe Linux Perf Interface in Zig
Install the Zig toolchain via mise (optional):
```sh
mise trust
mise install
```

Run tests:

```sh
zig build test --summary all
```

Build library:

```sh
zig build
```

Enable/disable kernel.perf_event_paranoid for debugging:

```sh
# Restrict access
sudo sysctl -w kernel.perf_event_paranoid=2

# Allow access (Required for CPU metrics)
sudo sysctl -w kernel.perf_event_paranoid=-1
```

MIT. Use it for whatever.