Hello! I'm interested in cumm and find it very fast. Reading the code, it seems that cumm provides a GEMM implementation written from scratch as well as a CUTLASS mode, and that it uses its own GEMM by default. In my experiments, CUTLASS is faster for some shapes and cumm's own GEMM is faster for others. Does cumm support a dynamic mode that chooses between CUTLASS and its own GEMM based on the input shapes?
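
To make the question concrete, here is a rough Python sketch of the kind of shape-based selection I have in mind. None of this is cumm's API: `ShapeDispatcher` and the backend callables are hypothetical names I made up, and each backend is assumed to be a function that runs one GEMM for a given (M, N, K). The idea is to time every backend the first time a shape is seen and cache the winner.

```python
import time
from typing import Callable, Dict, Tuple

Shape = Tuple[int, int, int]              # (M, N, K)
GemmFn = Callable[[int, int, int], None]  # hypothetical: runs one GEMM of this shape


class ShapeDispatcher:
    """Hypothetical helper: picks the fastest backend per shape and caches it."""

    def __init__(self, backends: Dict[str, GemmFn], repeats: int = 3):
        self.backends = backends
        self.repeats = repeats
        self.choice: Dict[Shape, str] = {}  # shape -> name of the fastest backend

    def _time(self, fn: GemmFn, shape: Shape) -> float:
        fn(*shape)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(self.repeats):
            fn(*shape)
        # Note: a real GPU backend would have to synchronize the device
        # before the clock is read, otherwise only the launch is timed.
        return (time.perf_counter() - start) / self.repeats

    def select(self, m: int, n: int, k: int) -> str:
        shape = (m, n, k)
        if shape not in self.choice:
            # First time this shape is seen: benchmark every backend once.
            timings = {name: self._time(fn, shape)
                       for name, fn in self.backends.items()}
            self.choice[shape] = min(timings, key=timings.get)
        return self.choice[shape]

    def run(self, m: int, n: int, k: int) -> None:
        self.backends[self.select(m, n, k)](m, n, k)
```

With something like this, `ShapeDispatcher({"cutlass": run_cutlass, "cumm": run_cumm}).run(m, n, k)` would pay the benchmarking cost once per shape and reuse the cached choice afterwards (`run_cutlass` and `run_cumm` are placeholders for whatever actually launches each kernel).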