[X86][Inline] Don't inline callee with cx16 if caller is without cx16#187505
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-x86

Author: None (usamoi)

Changes

closes #187503

Full diff: https://github.com/llvm/llvm-project/pull/187505.diff

1 Files Affected:
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index b3dde1555d0a0..0acf37939e2df 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -43,7 +43,6 @@ class X86TTIImpl final : public BasicTTIImplBase<X86TTIImpl> {
// These features don't have any intrinsics or ABI effect.
X86::FeatureNOPL,
- X86::FeatureCX16,
X86::FeatureLAHFSAHF64,
// Some older targets can be setup to fold unaligned loads.
You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h -- llvm/lib/Target/X86/X86TargetTransformInfo.h --diff_from_common_commit

View the diff from clang-format here.
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index 0acf37939..9cb81d38b 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -42,71 +42,43 @@ class X86TTIImpl final : public BasicTTIImplBase<X86TTIImpl> {
X86::FeatureX86_64,
// These features don't have any intrinsics or ABI effect.
- X86::FeatureNOPL,
- X86::FeatureLAHFSAHF64,
+ X86::FeatureNOPL, X86::FeatureLAHFSAHF64,
// Some older targets can be setup to fold unaligned loads.
X86::FeatureSSEUnalignedMem,
// Codegen control options.
- X86::TuningFast11ByteNOP,
- X86::TuningFast15ByteNOP,
- X86::TuningFastBEXTR,
- X86::TuningFastHorizontalOps,
- X86::TuningFastLZCNT,
- X86::TuningFastScalarFSQRT,
- X86::TuningFastSHLDRotate,
- X86::TuningFastScalarShiftMasks,
- X86::TuningFastVectorShiftMasks,
+ X86::TuningFast11ByteNOP, X86::TuningFast15ByteNOP, X86::TuningFastBEXTR,
+ X86::TuningFastHorizontalOps, X86::TuningFastLZCNT,
+ X86::TuningFastScalarFSQRT, X86::TuningFastSHLDRotate,
+ X86::TuningFastScalarShiftMasks, X86::TuningFastVectorShiftMasks,
X86::TuningFastVariableCrossLaneShuffle,
- X86::TuningFastVariablePerLaneShuffle,
- X86::TuningFastVectorFSQRT,
- X86::TuningLEAForSP,
- X86::TuningLEAUsesAG,
- X86::TuningLZCNTFalseDeps,
- X86::TuningBranchFusion,
- X86::TuningMacroFusion,
- X86::TuningPadShortFunctions,
- X86::TuningPOPCNTFalseDeps,
- X86::TuningMULCFalseDeps,
- X86::TuningPERMFalseDeps,
- X86::TuningRANGEFalseDeps,
- X86::TuningGETMANTFalseDeps,
- X86::TuningMULLQFalseDeps,
- X86::TuningSlow3OpsLEA,
- X86::TuningSlowDivide32,
- X86::TuningSlowDivide64,
- X86::TuningSlowIncDec,
- X86::TuningSlowLEA,
- X86::TuningSlowPMADDWD,
- X86::TuningSlowPMULLD,
- X86::TuningSlowSHLD,
- X86::TuningSlowTwoMemOps,
- X86::TuningSlowUAMem16,
- X86::TuningPreferMaskRegisters,
- X86::TuningInsertVZEROUPPER,
- X86::TuningUseSLMArithCosts,
- X86::TuningUseGLMDivSqrtCosts,
- X86::TuningNoDomainDelay,
- X86::TuningNoDomainDelayMov,
- X86::TuningNoDomainDelayShuffle,
- X86::TuningNoDomainDelayBlend,
- X86::TuningPreferShiftShuffle,
- X86::TuningFastImmVectorShift,
+ X86::TuningFastVariablePerLaneShuffle, X86::TuningFastVectorFSQRT,
+ X86::TuningLEAForSP, X86::TuningLEAUsesAG, X86::TuningLZCNTFalseDeps,
+ X86::TuningBranchFusion, X86::TuningMacroFusion,
+ X86::TuningPadShortFunctions, X86::TuningPOPCNTFalseDeps,
+ X86::TuningMULCFalseDeps, X86::TuningPERMFalseDeps,
+ X86::TuningRANGEFalseDeps, X86::TuningGETMANTFalseDeps,
+ X86::TuningMULLQFalseDeps, X86::TuningSlow3OpsLEA,
+ X86::TuningSlowDivide32, X86::TuningSlowDivide64, X86::TuningSlowIncDec,
+ X86::TuningSlowLEA, X86::TuningSlowPMADDWD, X86::TuningSlowPMULLD,
+ X86::TuningSlowSHLD, X86::TuningSlowTwoMemOps, X86::TuningSlowUAMem16,
+ X86::TuningPreferMaskRegisters, X86::TuningInsertVZEROUPPER,
+ X86::TuningUseSLMArithCosts, X86::TuningUseGLMDivSqrtCosts,
+ X86::TuningNoDomainDelay, X86::TuningNoDomainDelayMov,
+ X86::TuningNoDomainDelayShuffle, X86::TuningNoDomainDelayBlend,
+ X86::TuningPreferShiftShuffle, X86::TuningFastImmVectorShift,
X86::TuningFastDPWSSD,
// Perf-tuning flags.
- X86::TuningFastGather,
- X86::TuningSlowUAMem32,
+ X86::TuningFastGather, X86::TuningSlowUAMem32,
X86::TuningAllowLight256Bit,
// Based on whether user set the -mprefer-vector-width command line.
- X86::TuningPrefer128Bit,
- X86::TuningPrefer256Bit,
+ X86::TuningPrefer128Bit, X86::TuningPrefer256Bit,
// CPU name enums. These just follow CPU string.
- X86::ProcIntelAtom
- };
+ X86::ProcIntelAtom};
public:
explicit X86TTIImpl(const X86TargetMachine *TM, const Function &F)
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.

🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.

nikic left a comment:
I agree that ignoring cx16 during inlining is invalid. Whether or not libatomic is used for an access is an ABI effect.
This does seem to defeat the original intent behind those tests though.
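
To make the effect of the one-line change concrete, here is a minimal sketch of the feature-mask step that an ignore list like the one in the diff feeds into. It is a simplified model, not LLVM's code: the `Feature` enum, `makeSet`, and `featuresAllowInlining` are hypothetical names, and the sketch assumes the rule is "after masking out ignored features, the callee's features must be a subset of the caller's". The real inliner check may apply further ABI-related conditions that are not modelled here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-ins for a few subtarget features; not LLVM's real enum.
enum Feature : std::size_t { NOPL, CX16, LAHFSAHF64, SSE2, NumFeatures };
using FeatureSet = std::bitset<NumFeatures>;

// Helper (hypothetical): build a feature set from a list of features.
FeatureSet makeSet(std::initializer_list<Feature> Features) {
  FeatureSet Set;
  for (Feature F : Features)
    Set.set(F);
  return Set;
}

// Simplified model of the feature check: ignore the listed features, then
// require the callee's remaining features to be a subset of the caller's.
bool featuresAllowInlining(const FeatureSet &Caller, const FeatureSet &Callee,
                           const FeatureSet &IgnoreList) {
  FeatureSet RealCaller = Caller & ~IgnoreList;
  FeatureSet RealCallee = Callee & ~IgnoreList;
  return (RealCaller & RealCallee) == RealCallee;
}

// Walks through the before/after behaviour of dropping CX16 from the list.
void checkExamples() {
  FeatureSet Caller = makeSet({SSE2});       // caller built without cx16
  FeatureSet Callee = makeSet({SSE2, CX16}); // callee built with cx16

  // Before the patch: CX16 is ignored, so the mismatch goes unnoticed.
  assert(featuresAllowInlining(Caller, Callee,
                               makeSet({NOPL, CX16, LAHFSAHF64})));
  // After the patch: CX16 is compared, so inlining is rejected.
  assert(!featuresAllowInlining(Caller, Callee, makeSet({NOPL, LAHFSAHF64})));
}
```

In this model the opposite direction (a callee without cx16 inlined into a caller with cx16) is still accepted, because the callee's features remain a subset of the caller's.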