[RISCV][VLOPT] Add support for vrgather #148249
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
@llvm/pr-subscribers-backend-risc-v
Author: Mikhail R. Gadelha (mikhailramalho)
Changes
This PR adds support for the vrgather.vi, vrgather.vx, vrgather.vv, and vrgatherei16.vv instructions in the RISC-V VLOptimizer. To support vrgatherei16.vv, I also needed to add support for it in getOperandLog2EEW. This is the third PR addressing the list at #147647.
Full diff: https://github.com/llvm/llvm-project/pull/148249.diff
2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 2d9f38221d424..e940ab6c62637 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -747,6 +747,14 @@ getOperandLog2EEW(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
     return TwoTimes ? MILog2SEW + 1 : MILog2SEW;
   }
 
+  // Vector Register Gather with 16-bit Index Elements Instruction
+  // Dest and source data EEW=SEW. Index vector EEW=16.
+  case RISCV::VRGATHEREI16_VV: {
+    if (MO.getOperandNo() == 2)
+      return 4;
+    return MILog2SEW;
+  }
+
   default:
     return std::nullopt;
   }
@@ -1051,6 +1059,11 @@ static bool isSupportedInstr(const MachineInstr &MI) {
   case RISCV::VSLIDEDOWN_VI:
   case RISCV::VSLIDE1UP_VX:
   case RISCV::VFSLIDE1UP_VF:
+  // Vector Register Gather Instructions
+  case RISCV::VRGATHER_VI:
+  case RISCV::VRGATHER_VV:
+  case RISCV::VRGATHER_VX:
+  case RISCV::VRGATHEREI16_VV:
   // Vector Single-Width Floating-Point Add/Subtract Instructions
   case RISCV::VFADD_VF:
   case RISCV::VFADD_VV:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-instrs.ll b/llvm/test/CodeGen/RISCV/rvv/vl-opt-instrs.ll
index 317ad0c124e73..a8eba6d3db256 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt-instrs.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-instrs.ll
@@ -5476,9 +5476,8 @@ define <vscale x 4 x i32> @vrgather_vi(<vscale x 4 x i32> %a, <vscale x 4 x i32>
;
; VLOPT-LABEL: vrgather_vi:
; VLOPT: # %bb.0:
-; VLOPT-NEXT: vsetvli a1, zero, e32, m2, ta, ma
-; VLOPT-NEXT: vrgather.vi v12, v8, 5
; VLOPT-NEXT: vsetvli zero, a0, e32, m2, ta, ma
+; VLOPT-NEXT: vrgather.vi v12, v8, 5
; VLOPT-NEXT: vadd.vv v8, v12, v10
; VLOPT-NEXT: ret
%1 = call <vscale x 4 x i32> @llvm.riscv.vrgather.vx.nxv4i32.iXLen(<vscale x 4 x i32> poison, <vscale x 4 x i32> %a, iXLen 5, iXLen -1)
@@ -5497,9 +5496,8 @@ define <vscale x 4 x i32> @vrgather_vv(<vscale x 4 x i32> %a, <vscale x 4 x i32>
;
; VLOPT-LABEL: vrgather_vv:
; VLOPT: # %bb.0:
-; VLOPT-NEXT: vsetvli a1, zero, e32, m2, ta, ma
-; VLOPT-NEXT: vrgather.vv v12, v8, v10
; VLOPT-NEXT: vsetvli zero, a0, e32, m2, ta, ma
+; VLOPT-NEXT: vrgather.vv v12, v8, v10
; VLOPT-NEXT: vadd.vv v8, v12, v8
; VLOPT-NEXT: ret
%1 = call <vscale x 4 x i32> @llvm.riscv.vrgather.vv.nxv4i32(<vscale x 4 x i32> poison, <vscale x 4 x i32> %a, <vscale x 4 x i32> %idx, iXLen -1)
@@ -5518,9 +5516,8 @@ define <vscale x 4 x i32> @vrgather_vx(<vscale x 4 x i32> %a, iXLen %idx, <vscal
;
; VLOPT-LABEL: vrgather_vx:
; VLOPT: # %bb.0:
-; VLOPT-NEXT: vsetvli a2, zero, e32, m2, ta, ma
-; VLOPT-NEXT: vrgather.vx v12, v8, a0
; VLOPT-NEXT: vsetvli zero, a1, e32, m2, ta, ma
+; VLOPT-NEXT: vrgather.vx v12, v8, a0
; VLOPT-NEXT: vadd.vv v8, v12, v10
; VLOPT-NEXT: ret
%1 = call <vscale x 4 x i32> @llvm.riscv.vrgather.vx.nxv4i32.iXLen(<vscale x 4 x i32> poison, <vscale x 4 x i32> %a, iXLen %idx, iXLen -1)
@@ -5539,9 +5536,8 @@ define <vscale x 4 x i32> @vrgatherei16_vv(<vscale x 4 x i32> %a, <vscale x 4 x
;
; VLOPT-LABEL: vrgatherei16_vv:
; VLOPT: # %bb.0:
-; VLOPT-NEXT: vsetvli a1, zero, e32, m2, ta, ma
-; VLOPT-NEXT: vrgatherei16.vv v12, v8, v10
; VLOPT-NEXT: vsetvli zero, a0, e32, m2, ta, ma
+; VLOPT-NEXT: vrgatherei16.vv v12, v8, v10
; VLOPT-NEXT: vadd.vv v8, v12, v8
; VLOPT-NEXT: ret
%1 = call <vscale x 4 x i32> @llvm.riscv.vrgatherei16.vv.nxv4i32(<vscale x 4 x i32> poison, <vscale x 4 x i32> %a, <vscale x 4 x i16> %idx, iXLen -1)
vrgather has a nasty semantic corner case: we can't reduce the VL of the data operand. We can reduce the VL of the vrgather itself and of the index operand, but not of the data operand, because vrgather can read past VL. Do you have a test for that case? I didn't spot one immediately, but I might have missed it.
Edit: It looks like the existing code already handles this, in an overly conservative way, in canReadPastVL, so this is probably just a matter of locating an existing test.
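To make the corner case concrete, here is a minimal scalar sketch of vrgather.vv, assuming the RVV rule `vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]`. The function name `vrgatherModel` and the use of `std::vector` are hypothetical and only for illustration; this is not code from the optimizer, and it assumes the data vector holds exactly VLMAX elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of vrgather.vv (illustration only, not LLVM code).
// Only the first `vl` destination elements are produced, but each index may
// select any data element below VLMAX, including elements at or beyond `vl`.
std::vector<uint32_t> vrgatherModel(const std::vector<uint32_t> &data,
                                    const std::vector<uint32_t> &index,
                                    std::size_t vl) {
  // Assumption: `data` holds exactly VLMAX elements.
  const std::size_t vlmax = data.size();
  std::vector<uint32_t> dest(vl, 0);
  for (std::size_t i = 0; i < vl; ++i) {
    const uint32_t idx = index[i];
    // Out-of-range indices yield zero; in-range indices read data[idx].
    dest[i] = idx >= vlmax ? 0 : data[idx];
  }
  return dest;
}
```

With `vl = 2` and indices `{3, 2}`, the model reads `data[3]` and `data[2]`, both past VL. If the instruction producing the data operand had its own VL reduced to 2, those elements would no longer be guaranteed to hold the expected values, which is why only the vrgather and its index operand are safe to shrink.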
Co-authored-by: Luke Lau <luke_lau@icloud.com>
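The EEW rule that the getOperandLog2EEW change encodes for vrgatherei16.vv can be sketched in isolation as follows. This is a stand-alone model, not the LLVM implementation: the `Role` enum and the function `gatherEI16Log2EEW` are hypothetical names, and operand roles replace the operand numbers used in the real code.

```cpp
// Hypothetical operand roles for vrgatherei16.vv (illustration only).
enum class Role { Dest, Data, Index };

// Returns log2(EEW) for one operand, given log2(SEW) of the instruction.
// Dest and source data use EEW = SEW; the index vector always uses EEW = 16.
constexpr unsigned gatherEI16Log2EEW(Role role, unsigned log2SEW) {
  return role == Role::Index ? 4u /* log2(16) */ : log2SEW;
}

// At e32 (log2(SEW) = 5), data is 32-bit while indices stay 16-bit.
static_assert(gatherEI16Log2EEW(Role::Data, 5) == 5, "data EEW follows SEW");
static_assert(gatherEI16Log2EEW(Role::Index, 5) == 4, "index EEW is 16");
```

This matches the vrgatherei16_vv test in the diff, where the data type is `<vscale x 4 x i32>` and the index type is `<vscale x 4 x i16>`.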