
perf: tighten VarDCT group coefficient hot loop#720

Open
hjanuschka wants to merge 1 commit into libjxl:main from hjanuschka:perf/pr14-vardct-group-hotloop


@hjanuschka
Collaborator

This PR removes unnecessary zeroing of the VarDCT scratch and transform buffers and tightens the coefficient writeback indexing in the hot decode loop. Decode output is unchanged, while redundant memory work is reduced. `unsafe` is used only for indexing through the coefficient-order permutation and into the current coefficient buffer; in both cases the bounds are guaranteed by the validated coefficient order and the loop limits.
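The safety argument above (validate the coefficient order once, then index without per-element bounds checks) can be illustrated with a minimal sketch. It is not the code from this PR: `CoeffOrder`, `scatter_coeffs`, and the assumption that every output position is written before it is read (which is what makes skipping the zeroing sound here) are all hypothetical, chosen only to show the pattern.

```rust
/// Hypothetical coefficient order. Invariant, checked once in `new`:
/// `order` is a permutation of 0..order.len().
pub struct CoeffOrder {
    order: Vec<u32>,
}

impl CoeffOrder {
    /// Validates the permutation up front so the hot loop can skip bounds checks.
    pub fn new(order: Vec<u32>) -> Option<Self> {
        let n = order.len();
        let mut seen = vec![false; n];
        for &pos in &order {
            let pos = pos as usize;
            // Reject out-of-range or duplicate positions.
            if pos >= n || seen[pos] {
                return None;
            }
            seen[pos] = true;
        }
        Some(CoeffOrder { order })
    }

    pub fn len(&self) -> usize {
        self.order.len()
    }
}

/// Writes `decoded[i]` to `out[order[i]]` for every i in 0..n.
/// `out` is deliberately NOT zeroed first: under the assumption that the
/// order is a full permutation, every element of `out[..n]` is written exactly once.
pub fn scatter_coeffs(order: &CoeffOrder, decoded: &[i32], out: &mut [i32]) {
    let n = order.len();
    // One check outside the loop replaces n checks inside it.
    assert!(decoded.len() >= n && out.len() >= n);
    for i in 0..n {
        // SAFETY: i < n <= order.order.len() and decoded.len() (loop bound + assert);
        // order.order[i] < n <= out.len() by the invariant checked in CoeffOrder::new.
        unsafe {
            let pos = *order.order.get_unchecked(i) as usize;
            *out.get_unchecked_mut(pos) = *decoded.get_unchecked(i);
        }
    }
}
```

If a real decoder writes only a prefix of positions (for example, stopping after the last nonzero coefficient), the unwritten tail would still need zeroing; the sketch only holds under its full-permutation assumption.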

@github-actions

Benchmark @ c1e4804

MULTI-FILE BENCHMARK RESULTS (4 files)
  CPU architecture: x86_64
  WARNING: System appears noisy: high system load (2.27). Results may be unreliable.
Statistics:
  Confidence:               99.0%
  Max relative error:        3.0%

Comparing: e883140e (Base) vs e3b9da14 (PR)

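File                        | Base (MP/s) | PR (MP/s) | Δ%     | Error (±)
--------------------------- | ----------: | --------: | -----: | --------:
bike.jxl                    |      24.474 |    24.345 | -0.53% |     ±2.9%
green_queen_modular_e3.jxl  |       7.918 |     7.816 | -1.28% |     ±1.1%
green_queen_vardct_e3.jxl   |      23.883 |    24.346 | +1.94% |     ±1.0%
sunset_logo.jxl             |       2.784 |     2.624 | -5.71% |     ±0.6%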
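A short sketch of how the columns above relate. It assumes Δ% is the relative throughput change of the PR versus the base, and that the "±" value is the half-width of the confidence interval on that change. The benchmark harness's exact formula is not shown, so recomputing from the rounded MP/s values may not reproduce the reported Δ% exactly. Both function names are hypothetical.

```rust
/// Relative throughput change of the PR vs. the base, in percent
/// (positive = PR is faster). Assumed definition of the Δ% column.
fn relative_change_pct(base_mps: f64, pr_mps: f64) -> f64 {
    (pr_mps - base_mps) / base_mps * 100.0
}

/// Assumption: the "±" column is the confidence-interval half-width,
/// so a change is distinguishable from noise only if |Δ%| exceeds it.
fn exceeds_error(delta_pct: f64, err_pct: f64) -> bool {
    delta_pct.abs() > err_pct
}
```

Under that reading, bike.jxl (-0.53% ±2.9%) is within noise, while the other three files exceed their error bands. The harness also flagged this run as noisy, so the results may be unreliable.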
