Which component requires the feature?
CuTe DSL
Feature Request
Dear developers,
Could you provide a CuTe DSL flash attention kernel for sm120 (Blackwell GeForce)?
I saw that a CuTe DSL flash attention kernel for Ampere has already been developed, and I think extending it to support sm120 would not be very hard (mainly adding TMA).
It would be useful because there are few high-performance FA kernels for sm120.