[Codegen] Remove ROCDL index==i32; add indexIsI64 to OptimizeIntArithmetic by krzysz00 · Pull Request #23948 · iree-org/iree

krzysz00 · 2026-03-27T16:09:41Z

Integer range analysis now handles narrowing to i32 where safe, making the --iree-rocm-index-bits option (which lowered all ROCDL indices to 32-bit) obsolete. Remove it so the ROCDL path matches NVVM (which always has 64-bit indices at the LLVM conversion level).

Add an indexIsI64 option to OptimizeIntArithmeticPass that relaxes the SAFE_INDEX_UNSIGNED_MAX_VALUE guard on signed-to-unsigned conversions for index values. On LLVMGPU targets, where index is always 64-bit, this guard is unnecessarily conservative and blocks valid optimizations. For-loop IV narrowing (NarrowSCFForIvToI32) retains its own range checks unconditionally.

Performance impact: on whole models, the change is within the noise floor (as expected, since it only removes a few instructions). However, the torch_models CI shows a consistent minor trend that works out to a 1.01x geometric-mean speedup, so there is little reason not to do this. Table below.

krzysz00 · 2026-03-27T16:10:45Z

Benchmark	Baseline (ms)	Test (ms)	Speedup
llama_8b_fp16/decode_benchmark_seq128_mi325	7.638	7.468	1.02x
llama_8b_fp16/decode_benchmark_seq2048_mi325	9.076	8.915	1.02x
llama_8b_fp16/prefill_benchmark_seq128_mi325	31.835	31.821	1.00x
llama_8b_fp16/prefill_benchmark_seq2048_mi325	279.081	277.750	1.00x
llama_8b_fp8/decode_benchmark_seq128_mi325	8.219	7.986	1.03x
llama_8b_fp8/decode_benchmark_seq128_mi325_data_tiling	17.252	17.244	1.00x
llama_8b_fp8/decode_benchmark_seq2048_mi325	11.054	11.034	1.00x
llama_8b_fp8/decode_benchmark_seq2048_mi325_data_tiling	19.907	20.085	0.99x
llama_8b_fp8/prefill_benchmark_seq128_mi325	25.691	25.748	1.00x
llama_8b_fp8/prefill_benchmark_seq128_mi325_data_tiling	24.886	24.987	1.00x
llama_8b_fp8/prefill_benchmark_seq2048_mi325	180.207	180.133	1.00x
llama_8b_fp8/prefill_benchmark_seq2048_mi325_data_tiling	197.122	196.607	1.00x
sdxl/clip_benchmark_mi325	7.266	7.215	1.01x
sdxl/punet_benchmark_mi325	46.146	46.054	1.00x
sdxl/punet_benchmark_mi325_v2	43.660	43.507	1.00x

Commit trailer: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

nirvedhmeshram · 2026-04-01T15:42:39Z

compiler/src/iree/compiler/Dialect/Util/Transforms/OptimizeIntArithmetic.cpp

    }
-    if (!staticallyLegalToConvertToUnsigned(solver, iv)) {
+    if (!staticallyLegalToConvertToUnsigned(solver, iv,
+                                            /*indexIsI64=*/false)) {


Here and in the other use below, shouldn't you plumb through the pass option indexIsI64 rather than hard-coding false? Also, maybe the test checks need to be improved if they didn't catch this?

krzysz00: This is a weird one. This pattern is specifically meant to handle narrowing to i32, so we do want the safety check here.

nirvedhmeshram

LGTM

krzysz00 requested review from Groverkss, Max191, benvanik, kuhar, nirvedhmeshram and qedawkins as code owners March 27, 2026 16:09

krzysz00 force-pushed the index-i64-rocm branch from d7e3f28 to 8406542 March 31, 2026 01:43

nirvedhmeshram reviewed Apr 1, 2026


nirvedhmeshram approved these changes Apr 3, 2026


Commit 68d4d87: Add clarifying comment

krzysz00 merged commit 175fae3 into iree-org:main Apr 4, 2026
61 of 63 checks passed

krzysz00 merged 2 commits into iree-org:main from krzysz00:index-i64-rocm