
[mlir][hoisting] Currently linalg hoisting cannot optimize through memref.assume_alignment #144825

@xiangzh1

Description


Last month (2025-05-18), Kleiman updated AssumeAlignmentOp so that it now produces an AnyMemRef result. This changed IR like the following:

%2 = hal.interface.binding.subspan layout ... : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
 memref.assume_alignment %2, 64 : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
use %2

into:

%2 = hal.interface.binding.subspan layout ... : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
%assume_align = memref.assume_alignment %2, 64 : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
use %assume_align

Problem:
This affects the linalg hoisting optimization, because memref.assume_alignment now implements ViewLikeOpInterface, and ops implementing that interface are excluded from linalg hoisting.

For example, in the following MLIR, the
"%1 = vector.transfer_read %assume_align_0[%c0, %c0] ..." and
"vector.transfer_write %3, %assume_align_0[%c0, %c0]"
read from and write to the same location, so we could hoist them out of the loop:

%m0 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
 %m1 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
 %assume_align_0 = memref.assume_alignment %m0, 64 : memref<4096x4096xf16>
 %assume_align_1 = memref.assume_alignment %m1, 64 : memref<4096x4096xf16>
 scf.for %arg0 = %c256 to %c4096 step %c256 {
   %1 = vector.transfer_read %assume_align_0[%c0, %c0], %cst_0 {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
   %2 = vector.transfer_read %m1[%arg0, %arg0], %cst_0 {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
   %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %2, %2, %1 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
   vector.transfer_write %3, %assume_align_0[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<4096x4096xf16>
 }

But because the transfer_read/write go through an assume_alignment result, linalg hoisting no longer performs this optimization.
(I don't fully understand why linalg hoisting does this; I am a beginner in MLIR.)
Since assume_alignment only marks a memref's alignment, linalg hoisting should check its memref operand rather than the op itself.
So we expect the MLIR above to be optimized to:

   %m0 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
    %m1 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
    %assume_align_0 = memref.assume_alignment %m0, 64 : memref<4096x4096xf16>
    %assume_align_1 = memref.assume_alignment %m1, 64 : memref<4096x4096xf16>
    %0 = vector.transfer_read %assume_align_0[%c0, %c0], %cst {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16> // out of loop
    %1 = scf.for %arg0 = %c256 to %c4096 step %c256 iter_args(%arg1 = %0) -> (vector<16x16xf16>) {
      %2 = vector.transfer_read %assume_align_1[%arg0, %arg0], %cst {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
      %3 = vector.contract {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %2, %2, %arg1 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
      scf.yield %3 : vector<16x16xf16>
    }
    vector.transfer_write %1, %assume_align_0[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<4096x4096xf16> // out of loop

For a detailed example, please refer to example.
(I don't know how to write hal.interface.binding for mlir-opt, so in the example I use memref.alloc() instead.)
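The suggested fix can be sketched abstractly: before hoisting decides whether a read and a write touch the same buffer, chase view-like ops (such as memref.assume_alignment) back to the underlying memref. The Python below is only a toy model of that idea; the class and function names are hypothetical and are not MLIR's actual API (in MLIR C++ this would roughly correspond to following ViewLikeOpInterface::getViewSource()).

```python
# Toy model (hypothetical names): an op either defines a fresh buffer, or is a
# "view-like" op (e.g. memref.assume_alignment) that just forwards its source.
class Op:
    def __init__(self, name, source=None):
        self.name = name
        self.source = source  # set only for view-like ops

def underlying_buffer(op):
    """Chase view-like ops back to the op that actually defines the buffer."""
    while op.source is not None:
        op = op.source
    return op

m0 = Op("hal.interface.binding.subspan")
aligned = Op("memref.assume_alignment", source=m0)

# A transfer_read and transfer_write through %assume_align_0 touch the same
# buffer as %m0, so a check on the underlying buffer can still pair them up.
assert underlying_buffer(aligned) is m0
```

With a check like this, the hoisting pass could treat an access through the assume_alignment result exactly like an access to the original memref, instead of bailing out on the ViewLikeOpInterface op.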
