Improve has_constant_alpha* methods #2941

Open

RunDevelopment wants to merge 4 commits into image-rs:main from RunDevelopment:has_constant_alpha-impr

Conversation

@RunDevelopment
Member

Changes:

  • Use pixel.alpha() instead of *pixel.channels().last().unwrap()
  • Use bit-wise or to fill the accumulator instead of add.
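For illustration, the bit-wise-or accumulator idea might be sketched like this (a hypothetical standalone helper, not the actual patch): OR together the XOR differences against an expected value, so the accumulator can never overflow the way a sum could.

```rust
// Hypothetical sketch, not the image-rs implementation: check whether every
// alpha byte equals `expected` by OR-ing the XOR differences. Unlike a sum,
// an OR accumulator cannot overflow.
fn alpha_is_constant(alphas: &[u8], expected: u8) -> bool {
    let acc = alphas
        .iter()
        .fold(0u8, |acc, &a| acc | (a ^ expected)); // non-zero iff any byte differs
    acc == 0
}

fn main() {
    assert!(alpha_is_constant(&[0xFF, 0xFF, 0xFF], 0xFF));
    assert!(!alpha_is_constant(&[0xFF, 0x7F, 0xFF], 0xFF));
}
```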

Contributor

@awxkee awxkee left a comment


LGTM

Interesting idea, I hadn't thought about that. Modern CPUs with SIMD have the same latencies on ors and adds, so I'd expect no difference if it vectorizes successfully. I checked the codegen out of curiosity.

Without avx2, both produce garbage codegen, where one does pointless vectorization and the other doesn't: https://godbolt.org/z/fhvqe3T7M. With avx2 (https://godbolt.org/z/WTPY6qdqq) the two are bit-exact apart from the adds simply being replaced with ors, which is expected: since or and add have the same throughput and latencies, there is no difference.
Both do something remarkable on aarch64 (https://godbolt.org/z/f5MYcGeor). I don't actually want to benchmark it, but it seems the ors should be a bit better.

@RunDevelopment
Member Author

RunDevelopment commented May 10, 2026

Honestly, I didn't think about perf when making this. I wanted to use or instead of add because the u64 could overflow with add. Super, super, super unlikely, but making that impossible just felt better.

Regarding codegen, yeah, it's pretty bad. I think the root of all evil here is our memory access pattern: we're always reading 1 byte with 3 bytes in between. I think LLVM loses the information that it can safely access the bytes in between, so it doesn't use that to optimize. If I make it load whole pixels and mask out everything but the alpha channel, codegen becomes a lot better.
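The "load whole pixels, mask the rest" idea could look roughly like this (a sketch assuming an RGBA8 buffer with alpha as the fourth byte of each pixel, not the code from this PR). Reading a full 4-byte pixel per iteration gives LLVM contiguous loads instead of strided 1-byte ones:

```rust
// Sketch: OR together the alpha channels of an RGBA8 buffer by loading each
// 4-byte pixel as a u32 and masking out everything but the alpha byte.
// `from_le_bytes([r, g, b, a])` puts `a` in the top byte, so the mask
// 0xFF00_0000 keeps only alpha.
fn or_alphas(rgba: &[u8]) -> u32 {
    rgba.chunks_exact(4).fold(0u32, |acc, px| {
        let pixel = u32::from_le_bytes(px.try_into().unwrap());
        acc | (pixel & 0xFF00_0000) // keep only the alpha byte
    })
}

fn main() {
    assert_eq!(or_alphas(&[1, 2, 3, 0xFF, 4, 5, 6, 0xFF]), 0xFF00_0000);
    assert_eq!(or_alphas(&[9, 9, 9, 0x00]), 0);
}
```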

This could probably be made even better by chunking some constant numbers of pixels (e.g. 128) and then handling the rest separately. Then the CPU could churn through all those pixels without branching. But I would expect what I linked above to be memory bound already, so that's probably not worth it. I didn't do any benchmark though, so I could be wrong.
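The chunking idea sketched above might look like this (an illustrative helper over a plain alpha slice, not part of the PR): process fixed-size blocks with a known trip count so the inner loop can unroll and vectorize without branching, then mop up the tail via `remainder()`.

```rust
// Sketch of processing alphas in fixed-size blocks (128 here, as suggested
// above) and handling the leftover bytes separately. Perf claims are
// speculative; this is likely memory-bound anyway, as noted.
fn or_alphas_chunked(alphas: &[u8]) -> u8 {
    let mut acc = 0u8;
    let mut blocks = alphas.chunks_exact(128);
    for block in &mut blocks {
        // Fixed trip count: the compiler can unroll/vectorize this fold.
        acc |= block.iter().fold(0u8, |a, &x| a | x);
    }
    for &x in blocks.remainder() {
        acc |= x; // tail of fewer than 128 bytes
    }
    acc
}

fn main() {
    let mut data = vec![0x0Fu8; 128];
    data.extend(vec![0xF0u8; 72]); // 200 bytes total: full block + tail
    assert_eq!(or_alphas_chunked(&data), 0xFF);
}
```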

since or and add have the same throughput and latencies, there is no difference, which is expected.

Oh, I didn't know that. TIL.

@awxkee
Contributor

awxkee commented May 10, 2026

With masking, the codegen looks the way it should, on aarch64 as well. I wonder why LLVM's autovectorizer is able to figure it out only with avx2 exactly...

This could probably be made even better by chunking some constant numbers of pixels (e.g. 128) and then handling the rest separately. Then the CPU could churn through all those pixels without branching. But I would expect what I linked above to be memory bound already, so that's probably not worth it. I didn't do any benchmark though, so I could be wrong.

Yeah, I also think this would complicate the code with no measurable benefit.
