perf(ans): replace unconditional state normalization w/ explicit branch #704

Open
xangelix wants to merge 2 commits into libjxl:main from xangelix:perf-ans-state-normalization

@xangelix xangelix commented Mar 14, 2026

Changes

Replaces the unconditionally executed state calculation in AnsHistogram::read with an explicit if statement for ANS state normalization.

The previous implementation appears to have been written to avoid a branch: it computed the appended state on every loop iteration and selected the result arithmetically. However, this forced BitReader::peek(16) to run for every decoded symbol, paying for the BitReader's internal buffer bounds checks each time. With an explicit branch, the BitReader is only touched when normalization is required, and when the branch is well predicted that work is skipped at little cost.
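To make the equivalence of the two shapes concrete, here is a minimal, self-contained sketch. `ToyBitReader` is a hypothetical stand-in written for this example only; it is not the jxl-rs `BitReader`, and its `peek`/`consume` methods merely imitate the calls used in the benchmark code. The sketch checks that, for the same input state and bitstream, the unconditional and the branched normalization produce the same state and consume the same number of bits.

```rust
/// Hypothetical stand-in for a bit reader; NOT the jxl-rs `BitReader` API.
struct ToyBitReader<'a> {
    data: &'a [u8],
    bit_pos: usize,
}

impl<'a> ToyBitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, bit_pos: 0 }
    }

    /// Returns the next `n` bits (n <= 32), LSB first, without consuming them.
    /// Bits past the end of the data read as zero.
    fn peek(&self, n: usize) -> u64 {
        let mut out = 0u64;
        for i in 0..n {
            let pos = self.bit_pos + i;
            let byte = self.data.get(pos / 8).copied().unwrap_or(0);
            out |= (((byte >> (pos % 8)) & 1) as u64) << i;
        }
        out
    }

    fn consume(&mut self, n: usize) {
        self.bit_pos += n;
    }
}

/// Old shape: always peek 16 bits, then select the result with arithmetic.
fn normalize_unconditional(next_state: u32, br: &mut ToyBitReader) -> u32 {
    let select_appended = (next_state < (1 << 16)) as u32;
    let appended_state = (next_state << 16) | (br.peek(16) as u32);
    br.consume((16 * select_appended) as usize);
    (appended_state * select_appended) | (next_state * (1 - select_appended))
}

/// New shape: only touch the reader when the state has dropped below 2^16.
fn normalize_branched(mut next_state: u32, br: &mut ToyBitReader) -> u32 {
    if next_state < (1 << 16) {
        next_state = (next_state << 16) | (br.peek(16) as u32);
        br.consume(16);
    }
    next_state
}

fn main() {
    let data = [0xAB, 0xCD, 0x12, 0x34, 0x56, 0x78];
    // States on both sides of the 2^16 threshold, including the boundary.
    let inputs = [0x0000_0130u32, 0x0013_0000, 0x0000_FFFF, 0x0001_0000];
    let mut a = ToyBitReader::new(&data);
    let mut b = ToyBitReader::new(&data);
    for &s in &inputs {
        let x = normalize_unconditional(s, &mut a);
        let y = normalize_branched(s, &mut b);
        assert_eq!(x, y);
        assert_eq!(a.bit_pos, b.bit_pos);
    }
    println!("both variants agree");
}
```

The two functions differ only in when the reader is consulted, which is the cost this change targets; the toy reader says nothing about how expensive the real `peek` is.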

System Details

Kernel: Linux 6.19.6-2-cachyos
CPU: AMD Ryzen 9 9950X3D (32) @ 5.85 GHz
rustc: 1.95.0-nightly (873b4beb0 2026-02-15)

As compared to e883140.

I seem to get a fair amount of noise run-to-run on my system with criterion, so I'd love some validation of these numbers on other systems!


Local Comparison Test Code
use std::hint::black_box;

use criterion::{BenchmarkId, Criterion, criterion_group, criterion_main};
use jxl::bit_reader::BitReader;

const LOG_SUM_PROBS: usize = 12;

#[derive(Clone, Copy)]
struct BenchBucket {
    alias_symbol: u8,
    alias_cutoff: u8,
    dist: u16,
    alias_offset: u16,
    alias_dist_xor: u16,
}

#[inline(always)]
fn read_branchless(
    buckets: &[BenchBucket],
    log_bucket_size: usize,
    bucket_mask: u32,
    br: &mut BitReader,
    state: &mut u32,
) -> u32 {
    let idx = *state & 0xfff;
    let i = (idx >> log_bucket_size) as usize;
    let pos = idx & bucket_mask;

    let bucket = buckets[i & (buckets.len() - 1)];
    let alias_symbol = bucket.alias_symbol as u32;
    let alias_cutoff = bucket.alias_cutoff as u32;
    let dist = bucket.dist as u32;

    let map_to_alias = (pos >= alias_cutoff) as u32;
    let offset = (bucket.alias_offset as u32) * map_to_alias;
    let dist_xor = (bucket.alias_dist_xor as u32) * map_to_alias;

    let dist = dist ^ dist_xor;
    let symbol = (alias_symbol * map_to_alias) | (i as u32 * (1 - map_to_alias));
    let offset = offset + pos;

    let next_state = (*state >> LOG_SUM_PROBS) * dist + offset;

    // Old: always peek 16 bits, then select the normalized state arithmetically
    let select_appended = (next_state < (1 << 16)) as u32;
    let appended_state = (next_state << 16) | (br.peek(16) as u32);
    *state = (appended_state * select_appended) | (next_state * (1 - select_appended));
    br.consume_optimistic((16 * select_appended) as usize);

    symbol
}

#[inline(always)]
fn read_branched(
    buckets: &[BenchBucket],
    log_bucket_size: usize,
    bucket_mask: u32,
    br: &mut BitReader,
    state: &mut u32,
) -> u32 {
    let idx = *state & 0xfff;
    let i = (idx >> log_bucket_size) as usize;
    let pos = idx & bucket_mask;

    let bucket = buckets[i & (buckets.len() - 1)];
    let alias_symbol = bucket.alias_symbol as u32;
    let alias_cutoff = bucket.alias_cutoff as u32;
    let dist = bucket.dist as u32;

    let map_to_alias = (pos >= alias_cutoff) as u32;
    let offset = (bucket.alias_offset as u32) * map_to_alias;
    let dist_xor = (bucket.alias_dist_xor as u32) * map_to_alias;

    let dist = dist ^ dist_xor;
    let symbol = (alias_symbol * map_to_alias) | (i as u32 * (1 - map_to_alias));
    let offset = offset + pos;

    let mut next_state = (*state >> LOG_SUM_PROBS) * dist + offset;

    // New: only read from the BitReader when the state needs normalization
    if next_state < (1 << 16) {
        next_state = (next_state << 16) | (br.read_optimistic(16) as u32);
    }
    *state = next_state;

    symbol
}

fn bench_ans_read(c: &mut Criterion) {
    let mut group = c.benchmark_group("ans_optimization");

    // Generate 1MB of pseudo-random data to feed the BitReader
    let mut random_data = vec![0u8; 1024 * 1024];
    let mut seed = 42u32;
    for byte in &mut random_data {
        seed = seed.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        *byte = (seed >> 24) as u8;
    }

    let num_symbols_to_decode = 10_000;

    // 1: Highly Predictable (Rare Normalization)
    let bucket_rare = BenchBucket {
        alias_symbol: 0,
        alias_cutoff: 0,
        dist: 4096, // Max dist, state never shrinks
        alias_offset: 0,
        alias_dist_xor: 0,
    };

    // 2: Highly Predictable (Frequent Normalization)
    let bucket_freq = BenchBucket {
        alias_symbol: 0,
        alias_cutoff: 0,
        dist: 1, // Min dist, state shrinks instantly
        alias_offset: 0,
        alias_dist_xor: 0,
    };

    // 3: Mixed Unpredictable (50/50 Coin Flip)
    let bucket_mixed = BenchBucket {
        alias_symbol: 0,
        alias_cutoff: 0,
        dist: 128,
        alias_offset: 0,
        alias_dist_xor: 0,
    };

    let scenarios = [
        ("rare_norm_predictable", bucket_rare),
        ("freq_norm_predictable", bucket_freq),
        ("mixed_norm_unpredictable", bucket_mixed),
    ];

    for (name, dummy_bucket) in &scenarios {
        let buckets = vec![*dummy_bucket; 16];
        let log_bucket_size = 8;
        let bucket_mask = (1 << log_bucket_size) - 1;

        group.bench_function(BenchmarkId::new("Branchless", name), |b| {
            b.iter(|| {
                let mut br = BitReader::new(&random_data);
                let mut state = 0x0013_0000;
                for _ in 0..num_symbols_to_decode {
                    black_box(read_branchless(
                        &buckets,
                        log_bucket_size,
                        bucket_mask,
                        &mut br,
                        &mut state,
                    ));
                }
            });
        });

        group.bench_function(BenchmarkId::new("Branched", name), |b| {
            b.iter(|| {
                let mut br = BitReader::new(&random_data);
                let mut state = 0x0013_0000;
                for _ in 0..num_symbols_to_decode {
                    black_box(read_branched(
                        &buckets,
                        log_bucket_size,
                        bucket_mask,
                        &mut br,
                        &mut state,
                    ));
                }
            });
        });
    }

    group.finish();
}

criterion_group!(benches, bench_ans_read);
criterion_main!(benches);

Localized Results

ans_optimization/Branchless/rare_norm_predictable
                        time:   [32.314 µs 32.389 µs 32.468 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
ans_optimization/Branched/rare_norm_predictable
                        time:   [26.570 µs 26.622 µs 26.681 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
ans_optimization/Branchless/freq_norm_predictable
                        time:   [32.764 µs 32.836 µs 32.911 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
ans_optimization/Branched/freq_norm_predictable
                        time:   [29.137 µs 29.188 µs 29.243 µs]
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe
ans_optimization/Branchless/mixed_norm_unpredictable
                        time:   [32.142 µs 32.187 µs 32.236 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
ans_optimization/Branched/mixed_norm_unpredictable
                        time:   [27.733 µs 27.793 µs 27.859 µs]
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
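For a quick read of the microbenchmark numbers above, the following sketch computes the relative time reduction of the branched variant per scenario. It assumes the middle value of each `time: [...]` triple is criterion's point estimate and uses only those figures; the helper name `percent_reduction` is made up for this example. By this calculation the branched variant takes roughly 18%, 11%, and 14% less time in the three scenarios, subject to the run-to-run noise mentioned earlier.

```rust
/// Relative reduction in time, as a percentage of the baseline.
fn percent_reduction(baseline_us: f64, candidate_us: f64) -> f64 {
    (baseline_us - candidate_us) / baseline_us * 100.0
}

fn main() {
    // (scenario, Branchless estimate in µs, Branched estimate in µs),
    // copied from the middle value of each `time: [...]` line above.
    let results = [
        ("rare_norm_predictable", 32.389, 26.622),
        ("freq_norm_predictable", 32.836, 29.188),
        ("mixed_norm_unpredictable", 32.187, 27.793),
    ];
    for &(name, branchless, branched) in &results {
        println!(
            "{}: {:.1}% less time with the explicit branch",
            name,
            percent_reduction(branchless, branched)
        );
    }
}
```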

Full Decode Results
     Running benches/decode.rs (target/release/deps/decode-b3a0c96c13d23683)
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/3x3_jpeg_recompression.jxl
                        time:   [44.552 µs 44.641 µs 44.738 µs]
                        thrpt:  [201.17 Kelem/s 201.61 Kelem/s 202.01 Kelem/s]
                 change:
                        time:   [−1.9582% −1.6081% −1.2487%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2644% +1.6344% +1.9973%]
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  5 (10.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/3x3_srgb_lossless.jxl
                        time:   [7.7459 µs 7.7588 µs 7.7735 µs]
                        thrpt:  [1.1578 Melem/s 1.1600 Melem/s 1.1619 Melem/s]
                 change:
                        time:   [−1.2301% −0.9984% −0.7748%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7808% +1.0084% +1.2454%]
                        Change within noise threshold.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/3x3_srgb_lossy.jxl
                        time:   [42.414 µs 42.480 µs 42.554 µs]
                        thrpt:  [211.50 Kelem/s 211.87 Kelem/s 212.20 Kelem/s]
                 change:
                        time:   [−0.5074% −0.2195% +0.0583%] (p = 0.14 > 0.05)
                        thrpt:  [−0.0583% +0.2200% +0.5100%]
                        No change in performance detected.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/3x3a_srgb_lossless.jxl
                        time:   [9.3486 µs 9.3617 µs 9.3757 µs]
                        thrpt:  [959.93 Kelem/s 961.36 Kelem/s 962.71 Kelem/s]
                 change:
                        time:   [+1.0287% +1.2762% +1.5290%] (p = 0.00 < 0.05)
                        thrpt:  [−1.5060% −1.2601% −1.0183%]
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/3x3a_srgb_lossy.jxl
                        time:   [51.308 µs 51.425 µs 51.559 µs]
                        thrpt:  [174.56 Kelem/s 175.01 Kelem/s 175.41 Kelem/s]
                 change:
                        time:   [−2.9443% −2.5371% −2.0991%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1441% +2.6032% +3.0336%]
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  3 (6.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/8x8_noise.jxl
                        time:   [26.595 µs 26.678 µs 26.761 µs]
                        thrpt:  [2.3916 Melem/s 2.3989 Melem/s 2.4064 Melem/s]
                 change:
                        time:   [−2.0444% −1.7408% −1.3973%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4171% +1.7717% +2.0870%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/basic.jxl
                        time:   [41.193 µs 41.285 µs 41.397 µs]
                        thrpt:  [24.156 Kelem/s 24.222 Kelem/s 24.276 Kelem/s]
                 change:
                        time:   [−0.4785% −0.1849% +0.1419%] (p = 0.28 > 0.05)
                        thrpt:  [−0.1417% +0.1852% +0.4808%]
                        No change in performance detected.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) high mild
  3 (6.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/candle.jxl
                        time:   [28.888 ms 29.011 ms 29.147 ms]
                        thrpt:  [27.790 Melem/s 27.921 Melem/s 28.039 Melem/s]
                 change:
                        time:   [−6.3361% −5.6027% −4.8098%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0528% +5.9352% +6.7648%]
                        Performance has improved.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/cropped_traffic_light.jxl
                        time:   [501.77 µs 502.34 µs 502.99 µs]
                        thrpt:  [7.9524 Melem/s 7.9627 Melem/s 7.9718 Melem/s]
                 change:
                        time:   [−6.4608% −6.3205% −6.1722%] (p = 0.00 < 0.05)
                        thrpt:  [+6.5782% +6.7469% +6.9071%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/dice.jxl
                        time:   [18.594 ms 18.675 ms 18.760 ms]
                        thrpt:  [25.586 Melem/s 25.702 Melem/s 25.815 Melem/s]
                 change:
                        time:   [−2.5877% −2.0427% −1.5243%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5479% +2.0853% +2.6565%]
                        Performance has improved.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/efb.jxl
                        time:   [51.070 ms 51.163 ms 51.255 ms]
                        thrpt:  [23.436 Melem/s 23.478 Melem/s 23.521 Melem/s]
                 change:
                        time:   [−5.6475% −5.2918% −4.9225%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1774% +5.5875% +5.9856%]
                        Performance has improved.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/extra_channels.jxl
                        time:   [15.706 µs 15.724 µs 15.743 µs]
                        thrpt:  [4.0652 Melem/s 4.0701 Melem/s 4.0749 Melem/s]
                 change:
                        time:   [−3.7122% −3.2948% −2.9332%] (p = 0.00 < 0.05)
                        thrpt:  [+3.0218% +3.4071% +3.8553%]
                        Performance has improved.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/grayscale_patches_modular.jxl
                        time:   [8.0656 ms 8.0791 ms 8.0937 ms]
                        thrpt:  [34.600 Melem/s 34.663 Melem/s 34.721 Melem/s]
                 change:
                        time:   [−4.8980% −4.6434% −4.3952%] (p = 0.00 < 0.05)
                        thrpt:  [+4.5973% +4.8695% +5.1502%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  3 (6.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/grayscale_patches_var_dct.jxl
                        time:   [3.6564 ms 3.6640 ms 3.6719 ms]
                        thrpt:  [76.267 Melem/s 76.432 Melem/s 76.590 Melem/s]
                 change:
                        time:   [−0.5317% −0.2892% −0.0325%] (p = 0.03 < 0.05)
                        thrpt:  [+0.0325% +0.2900% +0.5345%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/green_queen_modular_e3.jxl
                        time:   [14.964 ms 14.991 ms 15.022 ms]
                        thrpt:  [17.174 Melem/s 17.209 Melem/s 17.241 Melem/s]
                 change:
                        time:   [+3.4016% +3.6516% +3.9189%] (p = 0.00 < 0.05)
                        thrpt:  [−3.7711% −3.5230% −3.2897%]
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  3 (6.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/green_queen_vardct_e3.jxl
                        time:   [4.3018 ms 4.3148 ms 4.3293 ms]
                        thrpt:  [59.590 Melem/s 59.791 Melem/s 59.970 Melem/s]
                 change:
                        time:   [−0.4959% −0.1182% +0.3070%] (p = 0.56 > 0.05)
                        thrpt:  [−0.3061% +0.1184% +0.4984%]
                        No change in performance detected.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/has_permutation.jxl
                        time:   [14.458 ms 14.512 ms 14.568 ms]
                        thrpt:  [154.46 Melem/s 155.06 Melem/s 155.64 Melem/s]
                 change:
                        time:   [−6.2317% −5.7692% −5.2941%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5900% +6.1224% +6.6458%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/has_permutation_with_container.jxl
                        time:   [14.589 ms 14.646 ms 14.708 ms]
                        thrpt:  [152.99 Melem/s 153.63 Melem/s 154.24 Melem/s]
                 change:
                        time:   [−5.1615% −4.5832% −3.9787%] (p = 0.00 < 0.05)
                        thrpt:  [+4.1436% +4.8033% +5.4424%]
                        Performance has improved.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/hdr_hlg_test.jxl
                        time:   [49.634 ms 49.715 ms 49.800 ms]
                        thrpt:  [21.056 Melem/s 21.092 Melem/s 21.126 Melem/s]
                 change:
                        time:   [−1.5298% −1.3443% −1.1588%] (p = 0.00 < 0.05)
                        thrpt:  [+1.1723% +1.3626% +1.5536%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/hdr_pq_test.jxl
                        time:   [50.269 ms 50.412 ms 50.570 ms]
                        thrpt:  [20.735 Melem/s 20.800 Melem/s 20.859 Melem/s]
                 change:
                        time:   [+0.2478% +0.5796% +0.9470%] (p = 0.00 < 0.05)
                        thrpt:  [−0.9381% −0.5763% −0.2472%]
                        Change within noise threshold.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/issue648_palette0.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 6.8s, or reduce sample count to 30.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/issue648_palette0.jxl
                        time:   [135.84 ms 136.15 ms 136.45 ms]
                        thrpt:  [12.846 Melem/s 12.875 Melem/s 12.903 Melem/s]
                 change:
                        time:   [+1.4882% +1.7382% +2.0047%] (p = 0.00 < 0.05)
                        thrpt:  [−1.9653% −1.7085% −1.4664%]
                        Performance has regressed.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/large_header.jxl
                        time:   [4.3944 ms 4.4150 ms 4.4366 ms]
                        thrpt:  [225.40  elem/s 226.50  elem/s 227.56  elem/s]
                 change:
                        time:   [+0.5040% +1.0646% +1.6573%] (p = 0.00 < 0.05)
                        thrpt:  [−1.6303% −1.0534% −0.5015%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/lossy_with_icc.jxl
                        time:   [46.390 µs 46.482 µs 46.591 µs]
                        thrpt:  [21.463 Kelem/s 21.514 Kelem/s 21.556 Kelem/s]
                 change:
                        time:   [+1.7592% +2.0872% +2.4104%] (p = 0.00 < 0.05)
                        thrpt:  [−2.3536% −2.0445% −1.7288%]
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/multiple_layers_noise_spline.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 9.0s, or reduce sample count to 20.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/multiple_layers_noise_spline.jxl
                        time:   [179.02 ms 179.42 ms 179.81 ms]
                        thrpt:  [13.121 Melem/s 13.150 Melem/s 13.179 Melem/s]
                 change:
                        time:   [−2.1766% −1.8574% −1.5343%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5582% +1.8925% +2.2250%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/multiple_lf_420.jxl
                        time:   [23.416 ms 23.479 ms 23.543 ms]
                        thrpt:  [231.19 Melem/s 231.82 Melem/s 232.44 Melem/s]
                 change:
                        time:   [−5.9474% −5.3137% −4.6816%] (p = 0.00 < 0.05)
                        thrpt:  [+4.9116% +5.6119% +6.3234%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/named_frame_test.jxl
                        time:   [55.445 µs 55.552 µs 55.662 µs]
                        thrpt:  [1.1498 Melem/s 1.1521 Melem/s 1.1543 Melem/s]
                 change:
                        time:   [−1.5056% −1.2085% −0.9449%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9539% +1.2233% +1.5286%]
                        Change within noise threshold.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/oddsize_ups.jxl
                        time:   [720.29 µs 721.61 µs 723.20 µs]
                        thrpt:  [91.329 Melem/s 91.531 Melem/s 91.698 Melem/s]
                 change:
                        time:   [−0.7371% −0.4582% −0.1847%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1850% +0.4603% +0.7426%]
                        Change within noise threshold.
Found 5 outliers among 50 measurements (10.00%)
  4 (8.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation1_identity.jxl
                        time:   [700.49 µs 701.65 µs 702.95 µs]
                        thrpt:  [36.418 Melem/s 36.486 Melem/s 36.546 Melem/s]
                 change:
                        time:   [−3.4679% −3.2552% −3.0238%] (p = 0.00 < 0.05)
                        thrpt:  [+3.1181% +3.3648% +3.5925%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation2_flip_horizontal.jxl
                        time:   [739.47 µs 740.29 µs 741.24 µs]
                        thrpt:  [34.537 Melem/s 34.581 Melem/s 34.619 Melem/s]
                 change:
                        time:   [−3.1962% −3.0497% −2.9126%] (p = 0.00 < 0.05)
                        thrpt:  [+3.0000% +3.1457% +3.3018%]
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  3 (6.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation3_rotate_180.jxl
                        time:   [736.74 µs 737.94 µs 739.20 µs]
                        thrpt:  [34.632 Melem/s 34.691 Melem/s 34.748 Melem/s]
                 change:
                        time:   [−3.4459% −3.2460% −3.0585%] (p = 0.00 < 0.05)
                        thrpt:  [+3.1550% +3.3549% +3.5689%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation4_flip_vertical.jxl
                        time:   [699.99 µs 701.04 µs 702.23 µs]
                        thrpt:  [36.455 Melem/s 36.517 Melem/s 36.572 Melem/s]
                 change:
                        time:   [−3.4460% −3.2775% −3.1264%] (p = 0.00 < 0.05)
                        thrpt:  [+3.2273% +3.3886% +3.5690%]
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  3 (6.00%) high mild
  3 (6.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation5_transpose.jxl
                        time:   [737.51 µs 738.63 µs 739.79 µs]
                        thrpt:  [34.604 Melem/s 34.659 Melem/s 34.711 Melem/s]
                 change:
                        time:   [−3.2231% −3.0364% −2.8469%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9303% +3.1315% +3.3305%]
                        Performance has improved.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation6_rotate_90_cw.jxl
                        time:   [737.14 µs 739.17 µs 741.35 µs]
                        thrpt:  [34.532 Melem/s 34.633 Melem/s 34.729 Melem/s]
                 change:
                        time:   [−3.3612% −3.0295% −2.6780%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7517% +3.1241% +3.4781%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation7_anti_transpose.jxl
                        time:   [731.79 µs 732.90 µs 734.08 µs]
                        thrpt:  [34.874 Melem/s 34.930 Melem/s 34.983 Melem/s]
                 change:
                        time:   [−4.4105% −4.2564% −4.1063%] (p = 0.00 < 0.05)
                        thrpt:  [+4.2822% +4.4456% +4.6140%]
                        Performance has improved.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/orientation8_rotate_90_ccw.jxl
                        time:   [732.99 µs 734.53 µs 736.33 µs]
                        thrpt:  [34.767 Melem/s 34.852 Melem/s 34.925 Melem/s]
                 change:
                        time:   [−3.6278% −3.3927% −3.1186%] (p = 0.00 < 0.05)
                        thrpt:  [+3.2190% +3.5118% +3.7643%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  1 (2.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/patch_y_out_of_bounds.jxl
                        time:   [58.664 µs 58.760 µs 58.864 µs]
                        thrpt:  [3.5336 Melem/s 3.5398 Melem/s 3.5456 Melem/s]
                 change:
                        time:   [+1.0547% +1.3125% +1.5715%] (p = 0.00 < 0.05)
                        thrpt:  [−1.5472% −1.2955% −1.0437%]
                        Performance has regressed.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/pq_gradient.jxl
                        time:   [1.5927 ms 1.5950 ms 1.5980 ms]
                        thrpt:  [43.574 Melem/s 43.657 Melem/s 43.721 Melem/s]
                 change:
                        time:   [−5.7863% −5.6100% −5.3970%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7049% +5.9434% +6.1416%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  1 (2.00%) high mild
  3 (6.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/progressive_ac.jxl
                        time:   [49.257 ms 49.344 ms 49.431 ms]
                        thrpt:  [69.374 Melem/s 69.497 Melem/s 69.618 Melem/s]
                 change:
                        time:   [+0.0335% +0.3083% +0.5938%] (p = 0.04 < 0.05)
                        thrpt:  [−0.5903% −0.3073% −0.0335%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/small_grayscale_patches_modular.jxl
                        time:   [693.83 µs 694.94 µs 696.23 µs]
                        thrpt:  [28.643 Melem/s 28.696 Melem/s 28.742 Melem/s]
                 change:
                        time:   [−4.2665% −4.0719% −3.8655%] (p = 0.00 < 0.05)
                        thrpt:  [+4.0209% +4.2447% +4.4566%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/small_grayscale_patches_modular_with_icc.jx...
                        time:   [703.11 µs 705.02 µs 707.17 µs]
                        thrpt:  [28.200 Melem/s 28.286 Melem/s 28.363 Melem/s]
                 change:
                        time:   [−4.0480% −3.7478% −3.4294%] (p = 0.00 < 0.05)
                        thrpt:  [+3.5512% +3.8937% +4.2188%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/spline_on_first_frame.jxl
                        time:   [63.524 µs 63.677 µs 63.868 µs]
                        thrpt:  [16.033 Melem/s 16.081 Melem/s 16.120 Melem/s]
                 change:
                        time:   [−5.4362% −5.1601% −4.8720%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1215% +5.4408% +5.7487%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/splines.jxl
                        time:   [1.4908 ms 1.4927 ms 1.4949 ms]
                        thrpt:  [68.499 Melem/s 68.599 Melem/s 68.689 Melem/s]
                 change:
                        time:   [−5.2265% −5.0232% −4.8267%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0715% +5.2888% +5.5148%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/squeeze_alpha.jxl
                        time:   [887.85 µs 889.82 µs 891.95 µs]
                        thrpt:  [80.567 Melem/s 80.760 Melem/s 80.940 Melem/s]
                 change:
                        time:   [−19.031% −18.670% −18.337%] (p = 0.00 < 0.05)
                        thrpt:  [+22.455% +22.957% +23.504%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/squeeze_edge.jxl
                        time:   [2.6561 ms 2.6653 ms 2.6758 ms]
                        thrpt:  [98.351 Melem/s 98.737 Melem/s 99.080 Melem/s]
                 change:
                        time:   [−21.719% −21.366% −21.012%] (p = 0.00 < 0.05)
                        thrpt:  [+26.601% +27.172% +27.746%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/squeeze_empty_residual.jxl
                        time:   [380.66 µs 381.69 µs 382.81 µs]
                        thrpt:  [10.700 Melem/s 10.731 Melem/s 10.760 Melem/s]
                 change:
                        time:   [−1.6098% −1.2737% −0.9029%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9111% +1.2902% +1.6361%]
                        Change within noise threshold.
Found 5 outliers among 50 measurements (10.00%)
  4 (8.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/stp2_520x260_d25_e6.jxl
                        time:   [1.2611 ms 1.2646 ms 1.2684 ms]
                        thrpt:  [106.59 Melem/s 106.91 Melem/s 107.21 Melem/s]
                 change:
                        time:   [+3.2600% +3.6664% +4.0813%] (p = 0.00 < 0.05)
                        thrpt:  [−3.9212% −3.5367% −3.1571%]
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/tirr_photo.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 42.0s, or reduce sample count to 10.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/tirr_photo.jxl
                        time:   [833.58 ms 835.21 ms 836.85 ms]
                        thrpt:  [28.376 Melem/s 28.432 Melem/s 28.488 Melem/s]
                 change:
                        time:   [−4.6528% −4.3664% −4.0924%] (p = 0.00 < 0.05)
                        thrpt:  [+4.2671% +4.5657% +4.8798%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/tree_max_property_20.jxl
                        time:   [60.350 ms 60.569 ms 60.793 ms]
                        thrpt:  [17.248 Melem/s 17.312 Melem/s 17.375 Melem/s]
                 change:
                        time:   [−1.2494% −0.8790% −0.5014%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5039% +0.8868% +1.2653%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/with_icc.jxl
                        time:   [502.07 µs 503.09 µs 504.81 µs]
                        thrpt:  [39.504 Melem/s 39.639 Melem/s 39.719 Melem/s]
                 change:
                        time:   [−5.2112% −4.9416% −4.5715%] (p = 0.00 < 0.05)
                        thrpt:  [+4.7905% +5.1985% +5.4977%]
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  3 (6.00%) high mild
  3 (6.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/with_preview.jxl
                        time:   [155.67 µs 156.11 µs 156.62 µs]
                        thrpt:  [26.152 Melem/s 26.238 Melem/s 26.311 Melem/s]
                 change:
                        time:   [+0.1047% +0.5127% +0.9461%] (p = 0.02 < 0.05)
                        thrpt:  [−0.9373% −0.5101% −0.1046%]
                        Change within noise threshold.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/zoltan_tasi_unsplash.jxl
                        time:   [18.513 ms 18.560 ms 18.610 ms]
                        thrpt:  [51.999 Melem/s 52.137 Melem/s 52.269 Melem/s]
                 change:
                        time:   [−2.9996% −2.4520% −1.9011%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9380% +2.5136% +3.0923%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/gray_alpha_lossless.jxl
                        time:   [204.48 µs 204.63 µs 204.80 µs]
                        thrpt:  [20.000 Melem/s 20.017 Melem/s 20.031 Melem/s]
                 change:
                        time:   [−5.9276% −5.7569% −5.5938%] (p = 0.00 < 0.05)
                        thrpt:  [+5.9253% +6.1086% +6.3011%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/alpha_nonpremultipl...
                        time:   [51.959 ms 52.224 ms 52.502 ms]
                        thrpt:  [19.972 Melem/s 20.078 Melem/s 20.181 Melem/s]
                 change:
                        time:   [+1.6487% +2.1885% +2.8200%] (p = 0.00 < 0.05)
                        thrpt:  [−2.7426% −2.1417% −1.6220%]
                        Performance has regressed.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/alpha_premultiplied...
                        time:   [25.429 ms 25.503 ms 25.579 ms]
                        thrpt:  [40.994 Melem/s 41.116 Melem/s 41.236 Melem/s]
                 change:
                        time:   [−7.0166% −6.6146% −6.2160%] (p = 0.00 < 0.05)
                        thrpt:  [+6.6280% +7.0831% +7.5461%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/alpha_triangles.jxl
                        time:   [52.805 ms 52.897 ms 52.992 ms]
                        thrpt:  [19.787 Melem/s 19.823 Melem/s 19.857 Melem/s]
                 change:
                        time:   [−0.1472% +0.2112% +0.5691%] (p = 0.25 > 0.05)
                        thrpt:  [−0.5659% −0.2108% +0.1474%]
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/animation_icos4d.jx...
                        time:   [34.806 ms 34.890 ms 34.984 ms]
                        thrpt:  [468.33 Kelem/s 469.59 Kelem/s 470.73 Kelem/s]
                 change:
                        time:   [−0.3543% +0.0263% +0.3775%] (p = 0.89 > 0.05)
                        thrpt:  [−0.3760% −0.0263% +0.3556%]
                        No change in performance detected.
Found 4 outliers among 50 measurements (8.00%)
  3 (6.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/animation_icos4d_5....
                        time:   [34.875 ms 34.970 ms 35.072 ms]
                        thrpt:  [467.16 Kelem/s 468.51 Kelem/s 469.80 Kelem/s]
                 change:
                        time:   [−0.2209% +0.1690% +0.5518%] (p = 0.39 > 0.05)
                        thrpt:  [−0.5488% −0.1687% +0.2214%]
                        No change in performance detected.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/animation_newtons_c...
                        time:   [90.815 ms 91.024 ms 91.249 ms]
                        thrpt:  [1.8937 Melem/s 1.8984 Melem/s 1.9028 Melem/s]
                 change:
                        time:   [−2.7090% −2.3681% −2.0313%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0735% +2.4255% +2.7844%]
                        Performance has improved.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/animation_spline.jx...
                        time:   [69.141 ms 69.276 ms 69.428 ms]
                        thrpt:  [1.4749 Melem/s 1.4781 Melem/s 1.4810 Melem/s]
                 change:
                        time:   [−2.7899% −2.5250% −2.2475%] (p = 0.00 < 0.05)
                        thrpt:  [+2.2992% +2.5904% +2.8699%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  3 (6.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/animation_spline_5....
                        time:   [69.403 ms 69.519 ms 69.642 ms]
                        thrpt:  [1.4704 Melem/s 1.4730 Melem/s 1.4754 Melem/s]
                 change:
                        time:   [−1.8227% −1.5512% −1.2956%] (p = 0.00 < 0.05)
                        thrpt:  [+1.3126% +1.5757% +1.8565%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/bench_oriented_brg....
                        time:   [6.2209 ms 6.2397 ms 6.2591 ms]
                        thrpt:  [48.410 Melem/s 48.560 Melem/s 48.707 Melem/s]
                 change:
                        time:   [+0.1460% +0.5269% +0.8705%] (p = 0.01 < 0.05)
                        thrpt:  [−0.8630% −0.5241% −0.1458%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/bench_oriented_brg_...
                        time:   [6.1893 ms 6.2083 ms 6.2304 ms]
                        thrpt:  [48.633 Melem/s 48.805 Melem/s 48.955 Melem/s]
                 change:
                        time:   [−0.3193% +0.0348% +0.4293%] (p = 0.86 > 0.05)
                        thrpt:  [−0.4274% −0.0348% +0.3204%]
                        No change in performance detected.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/bicycles.jxl
                        time:   [35.100 ms 35.203 ms 35.314 ms]
                        thrpt:  [18.297 Melem/s 18.355 Melem/s 18.409 Melem/s]
                 change:
                        time:   [−8.8965% −8.4474% −8.0361%] (p = 0.00 < 0.05)
                        thrpt:  [+8.7384% +9.2268% +9.7652%]
                        Performance has improved.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/bike.jxl
                        time:   [80.800 ms 81.021 ms 81.248 ms]
                        thrpt:  [64.530 Melem/s 64.710 Melem/s 64.887 Melem/s]
                 change:
                        time:   [−0.3025% +0.0415% +0.3722%] (p = 0.81 > 0.05)
                        thrpt:  [−0.3708% −0.0415% +0.3034%]
                        No change in performance detected.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/bike_5.jxl
                        time:   [80.869 ms 81.052 ms 81.238 ms]
                        thrpt:  [64.537 Melem/s 64.685 Melem/s 64.832 Melem/s]
                 change:
                        time:   [+0.1276% +0.4168% +0.6704%] (p = 0.00 < 0.05)
                        thrpt:  [−0.6659% −0.4151% −0.1274%]
                        Change within noise threshold.
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/blendmodes.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 13.2s, or reduce sample count to 10.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/blendmodes.jxl
                        time:   [263.10 ms 263.54 ms 263.98 ms]
                        thrpt:  [3.9722 Melem/s 3.9788 Melem/s 3.9855 Melem/s]
                 change:
                        time:   [−0.6825% −0.3243% +0.0017%] (p = 0.07 > 0.05)
                        thrpt:  [−0.0017% +0.3254% +0.6872%]
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/blendmodes_5.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 13.2s, or reduce sample count to 10.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/blendmodes_5.jxl
                        time:   [262.43 ms 263.02 ms 263.66 ms]
                        thrpt:  [3.9770 Melem/s 3.9866 Melem/s 3.9957 Melem/s]
                 change:
                        time:   [−0.6035% −0.2287% +0.1204%] (p = 0.24 > 0.05)
                        thrpt:  [−0.1202% +0.2293% +0.6071%]
                        No change in performance detected.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/cafe.jxl
                        time:   [19.274 ms 19.334 ms 19.396 ms]
                        thrpt:  [105.59 Melem/s 105.93 Melem/s 106.26 Melem/s]
                 change:
                        time:   [+0.7348% +1.0831% +1.4582%] (p = 0.00 < 0.05)
                        thrpt:  [−1.4373% −1.0715% −0.7294%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/cafe_5.jxl
                        time:   [19.129 ms 19.158 ms 19.188 ms]
                        thrpt:  [106.73 Melem/s 106.90 Melem/s 107.06 Melem/s]
                 change:
                        time:   [−0.2916% −0.0392% +0.1939%] (p = 0.77 > 0.05)
                        thrpt:  [−0.1936% +0.0392% +0.2924%]
                        No change in performance detected.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/cmyk_layers.jxl
                        time:   [20.414 ms 20.520 ms 20.634 ms]
                        thrpt:  [12.704 Melem/s 12.775 Melem/s 12.841 Melem/s]
                 change:
                        time:   [−2.0342% −1.4248% −0.7778%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7839% +1.4454% +2.0764%]
                        Change within noise threshold.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/delta_palette.jxl
                        time:   [29.541 ms 29.594 ms 29.651 ms]
                        thrpt:  [14.057 Melem/s 14.084 Melem/s 14.109 Melem/s]
                 change:
                        time:   [−0.8667% −0.5970% −0.3433%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3445% +0.6006% +0.8743%]
                        Change within noise threshold.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/grayscale.jxl
                        time:   [468.69 µs 469.75 µs 470.93 µs]
                        thrpt:  [84.939 Melem/s 85.151 Melem/s 85.344 Melem/s]
                 change:
                        time:   [−0.6613% −0.4148% −0.1458%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1460% +0.4166% +0.6657%]
                        Change within noise threshold.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/grayscale_5.jxl
                        time:   [467.64 µs 468.75 µs 469.94 µs]
                        thrpt:  [85.118 Melem/s 85.333 Melem/s 85.535 Melem/s]
                 change:
                        time:   [−1.4723% −1.1503% −0.8071%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8137% +1.1637% +1.4943%]
                        Change within noise threshold.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/grayscale_jpeg.jxl
                        time:   [363.75 µs 364.39 µs 365.05 µs]
                        thrpt:  [109.57 Melem/s 109.77 Melem/s 109.97 Melem/s]
                 change:
                        time:   [−5.3323% −5.1409% −4.9590%] (p = 0.00 < 0.05)
                        thrpt:  [+5.2177% +5.4195% +5.6327%]
                        Performance has improved.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/grayscale_jpeg_5.jx...
                        time:   [364.24 µs 364.92 µs 365.68 µs]
                        thrpt:  [109.38 Melem/s 109.61 Melem/s 109.82 Melem/s]
                 change:
                        time:   [−5.5166% −5.2848% −5.0547%] (p = 0.00 < 0.05)
                        thrpt:  [+5.3238% +5.5797% +5.8387%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/grayscale_public_un...: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 5.8s, or reduce sample count to 40.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/grayscale_public_un...
                        time:   [115.61 ms 115.84 ms 116.08 ms]
                        thrpt:  [40.192 Melem/s 40.277 Melem/s 40.357 Melem/s]
                 change:
                        time:   [−8.3236% −8.0690% −7.7984%] (p = 0.00 < 0.05)
                        thrpt:  [+8.4580% +8.7772% +9.0793%]
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/lossless_pfm.jxl
                        time:   [25.950 ms 26.017 ms 26.088 ms]
                        thrpt:  [9.5831 Melem/s 9.6091 Melem/s 9.6339 Melem/s]
                 change:
                        time:   [+3.2182% +3.5447% +3.8933%] (p = 0.00 < 0.05)
                        thrpt:  [−3.7474% −3.4233% −3.1179%]
                        Performance has regressed.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/lz77_flower.jxl
                        time:   [29.588 ms 29.651 ms 29.718 ms]
                        thrpt:  [6.8477 Melem/s 6.8630 Melem/s 6.8776 Melem/s]
                 change:
                        time:   [+1.2413% +1.4984% +1.7716%] (p = 0.00 < 0.05)
                        thrpt:  [−1.7407% −1.4762% −1.2261%]
                        Performance has regressed.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/noise.jxl
                        time:   [7.9990 ms 8.0151 ms 8.0330 ms]
                        thrpt:  [37.719 Melem/s 37.804 Melem/s 37.880 Melem/s]
                 change:
                        time:   [−0.2056% +0.0517% +0.3239%] (p = 0.72 > 0.05)
                        thrpt:  [−0.3229% −0.0517% +0.2060%]
                        No change in performance detected.
Found 3 outliers among 50 measurements (6.00%)
  3 (6.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/noise_5.jxl
                        time:   [8.0055 ms 8.0182 ms 8.0327 ms]
                        thrpt:  [37.721 Melem/s 37.789 Melem/s 37.849 Melem/s]
                 change:
                        time:   [−0.1107% +0.1591% +0.4168%] (p = 0.25 > 0.05)
                        thrpt:  [−0.4151% −0.1588% +0.1109%]
                        No change in performance detected.
Found 4 outliers among 50 measurements (8.00%)
  3 (6.00%) high mild
  1 (2.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/opsin_inverse.jxl
                        time:   [5.4791 ms 5.4889 ms 5.4998 ms]
                        thrpt:  [55.093 Melem/s 55.202 Melem/s 55.301 Melem/s]
                 change:
                        time:   [+1.2143% +1.5193% +1.8161%] (p = 0.00 < 0.05)
                        thrpt:  [−1.7837% −1.4965% −1.1997%]
                        Performance has regressed.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/opsin_inverse_5.jxl
                        time:   [5.4625 ms 5.4726 ms 5.4828 ms]
                        thrpt:  [55.264 Melem/s 55.367 Melem/s 55.469 Melem/s]
                 change:
                        time:   [+1.0967% +1.3581% +1.6185%] (p = 0.00 < 0.05)
                        thrpt:  [−1.5927% −1.3399% −1.0848%]
                        Performance has regressed.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/patches.jxl
                        time:   [48.368 ms 48.469 ms 48.573 ms]
                        thrpt:  [36.102 Melem/s 36.180 Melem/s 36.256 Melem/s]
                 change:
                        time:   [−8.9361% −8.6972% −8.4618%] (p = 0.00 < 0.05)
                        thrpt:  [+9.2440% +9.5256% +9.8130%]
                        Performance has improved.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/patches_5.jxl
                        time:   [48.355 ms 48.514 ms 48.679 ms]
                        thrpt:  [36.024 Melem/s 36.146 Melem/s 36.265 Melem/s]
                 change:
                        time:   [−8.8216% −8.5388% −8.2209%] (p = 0.00 < 0.05)
                        thrpt:  [+8.9573% +9.3360% +9.6751%]
                        Performance has improved.
Found 8 outliers among 50 measurements (16.00%)
  2 (4.00%) low severe
  5 (10.00%) high mild
  1 (2.00%) high severe
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/progressive.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 8.2s, or reduce sample count to 30.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/progressive.jxl
                        time:   [162.89 ms 163.39 ms 163.94 ms]
                        thrpt:  [67.033 Melem/s 67.257 Melem/s 67.465 Melem/s]
                 change:
                        time:   [−1.7816% −1.4242% −1.0669%] (p = 0.00 < 0.05)
                        thrpt:  [+1.0784% +1.4448% +1.8139%]
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/progressive_5.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 8.2s, or reduce sample count to 30.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/progressive_5.jxl
                        time:   [164.59 ms 165.08 ms 165.59 ms]
                        thrpt:  [66.362 Melem/s 66.569 Melem/s 66.767 Melem/s]
                 change:
                        time:   [−0.8551% −0.5322% −0.2008%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2012% +0.5351% +0.8625%]
                        Change within noise threshold.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/spot.jxl
                        time:   [75.914 ms 76.043 ms 76.176 ms]
                        thrpt:  [3.1506 Melem/s 3.1561 Melem/s 3.1615 Melem/s]
                 change:
                        time:   [−1.2701% −1.0756% −0.8624%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8699% +1.0873% +1.2864%]
                        Change within noise threshold.
Benchmarking decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/sunset_logo.jxl: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 10.6s, or reduce sample count to 20.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/sunset_logo.jxl
                        time:   [211.29 ms 211.72 ms 212.16 ms]
                        thrpt:  [6.0363 Melem/s 6.0488 Melem/s 6.0611 Melem/s]
                 change:
                        time:   [+0.0226% +0.2467% +0.4795%] (p = 0.04 < 0.05)
                        thrpt:  [−0.4773% −0.2461% −0.0226%]
                        Change within noise threshold.
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/upsampling.jxl
                        time:   [2.5173 ms 2.5234 ms 2.5302 ms]
                        thrpt:  [189.71 Melem/s 190.22 Melem/s 190.68 Melem/s]
                 change:
                        time:   [−1.2443% −0.8972% −0.5437%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5466% +0.9054% +1.2599%]
                        Change within noise threshold.
Found 9 outliers among 50 measurements (18.00%)
  6 (12.00%) high mild
  3 (6.00%) high severe
decode//home/tux/Documents/git/jxl-rs/jxl/resources/test/conformance_test_images/upsampling_5.jxl
                        time:   [2.5057 ms 2.5103 ms 2.5155 ms]
                        thrpt:  [190.81 Melem/s 191.21 Melem/s 191.56 Melem/s]
                 change:
                        time:   [−1.8194% −1.5141% −1.2418%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2575% +1.5374% +1.8531%]
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  3 (6.00%) high mild
  3 (6.00%) high severe

Top 5 / Bottom 5

Top 5

  1. squeeze_edge.jxl
    Time Change: −21.36%
    Throughput Change: +27.17%

  2. squeeze_alpha.jxl
    Time Change: −18.67%
    Throughput Change: +22.95%

  3. conformance_test_images/patches.jxl
    Time Change: −8.69%
    Throughput Change: +9.52%

  4. conformance_test_images/patches_5.jxl
    Time Change: −8.53%
    Throughput Change: +9.33%

  5. conformance_test_images/bicycles.jxl
    Time Change: −8.44%
    Throughput Change: +9.22%

Bottom 5

  1. stp2_520x260_d25_e6.jxl
    Time Change: +3.66%
    Throughput Change: −3.53%

  2. green_queen_modular_e3.jxl
    Time Change: +3.65%
    Throughput Change: −3.52%

  3. conformance_test_images/lossless_pfm.jxl
    Time Change: +3.54%
    Throughput Change: −3.42%

  4. conformance_test_images/alpha_nonpremultiplied.jxl
    Time Change: +2.18%
    Throughput Change: −2.14%

  5. lossy_with_icc.jxl
    Time Change: +2.08%
    Throughput Change: −2.04%
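
The time and throughput figures in each entry are two views of the same measurement: throughput is elements divided by time, so a relative time change `t` corresponds to a throughput change of `1 / (1 + t) - 1`. A minimal sketch of that conversion follows; the function name is made up for illustration and is not part of criterion.

```rust
/// Converts a relative time change (in percent) into the matching
/// throughput change (in percent), assuming the element count is fixed.
/// Hypothetical helper for illustration; not a criterion API.
fn throughput_change_pct(time_change_pct: f64) -> f64 {
    let time_ratio = 1.0 + time_change_pct / 100.0; // new_time / old_time
    (1.0 / time_ratio - 1.0) * 100.0 // new_thrpt / old_thrpt - 1
}
```

For example, the −21.366% time change reported for squeeze_edge.jxl maps to roughly +27.17% throughput, and the +3.6664% regression for stp2_520x260_d25_e6.jxl maps to roughly −3.54%. This is also why an improvement always looks larger in the throughput column than in the time column.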

@github-actions

github-actions bot commented Mar 14, 2026

Benchmark @ 0d7fe8d

MULTI-FILE BENCHMARK RESULTS (4 files)
  CPU architecture: x86_64
  WARNING: System appears noisy: high system load (1.85). Results may be unreliable.
Statistics:
  Confidence:               99.0%
  Max relative error:        3.0%

Comparing: e883140e (Base) vs d9cb52d4 (PR)

File                         Base (MP/s)   PR (MP/s)   Δ%
bike.jxl                     24.779        24.972      +0.78% ±1.3%
green_queen_modular_e3.jxl   7.936         7.888       -0.59% ±0.4%
green_queen_vardct_e3.jxl    24.175        24.370      +0.81% ±0.6%
sunset_logo.jxl              2.800         2.796       -0.14% ±0.3%

@veluca93
Member

From a quick benchmark, I do not see gains nearly as large as yours. In particular, I am not confident that the gains I do see are above the noise threshold, at least on my test Pixel 7a, an AMD Ryzen Threadripper 7970X (32 cores), and an AMD Ryzen AI 9 HX 370.

In libjxl, we observed the branchy and branchless versions to be roughly equivalent in performance, with one of the two having a slight edge depending on the specific architecture.
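
For reference, the two shapes being compared look roughly like the sketch below. It is a simplified illustration only: `ToyReader`, `peek16`, `consume`, and both function names are hypothetical stand-ins, not the jxl-rs `BitReader` API or the code in this PR. The one assumption taken from the ANS decoder is that the state is refilled with 16 bits whenever it drops below `1 << 16`.

```rust
/// Hypothetical bit reader holding pending bits, least significant first.
/// Stand-in for illustration; not the jxl-rs `BitReader`.
struct ToyReader {
    bits: u64,
}

impl ToyReader {
    /// Returns the next 16 bits without consuming them.
    fn peek16(&self) -> u32 {
        (self.bits & 0xffff) as u32
    }

    /// Discards `n` bits (only 0 or 16 are used here).
    fn consume(&mut self, n: u32) {
        self.bits >>= n;
    }
}

const RENORM_THRESHOLD: u32 = 1 << 16;

/// Branchless shape: always peeks, then selects the result.
fn normalize_branchless(state: u32, br: &mut ToyReader) -> u32 {
    let appended = (state << 16) | br.peek16();
    let needs_refill = state < RENORM_THRESHOLD;
    br.consume(if needs_refill { 16 } else { 0 });
    if needs_refill { appended } else { state }
}

/// Branchy shape: touches the reader only when a refill is needed.
fn normalize_branchy(state: u32, br: &mut ToyReader) -> u32 {
    if state < RENORM_THRESHOLD {
        let appended = (state << 16) | br.peek16();
        br.consume(16);
        appended
    } else {
        state
    }
}
```

Both functions return the same state and leave the reader in the same position; they differ only in whether the reader is touched on iterations that need no refill, which is where branch prediction and the cost of the reader's own checks decide the winner.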

One thing to be careful about: the benchmark is very sensitive to CPU performance fluctuations. For the most consistent results, make sure the system is set to performance mode, dynamic frequency scaling is disabled, and as few other processes as possible are running.
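
As a rough way to check the first of these before a run on Linux, the sketch below reads the cpufreq governor from sysfs. It assumes the kernel exposes `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` (not every system or driver does), and the helper names are made up for illustration.

```rust
use std::fs;

/// Returns true if the governor string read from sysfs is `performance`.
/// Hypothetical helper for illustration.
fn is_performance_governor(contents: &str) -> bool {
    contents.trim() == "performance"
}

/// Reads the governor for one CPU, or `None` if the cpufreq file is
/// missing or unreadable (for example on non-Linux systems).
fn read_governor(cpu: usize) -> Option<String> {
    let path = format!("/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor");
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}
```

Calling `read_governor(0)` and passing the result to `is_performance_governor` gives a quick sanity check; it says nothing about boost or thermal throttling, which need to be checked separately.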
