[Feature] Tune fp8 Gemm and fused moe kernel on B200 #6095

@Fridge003


Motivation

The w8a8 GEMM kernel and the fused MoE kernel do not yet perform well enough on B200. There is room to tune both.

Related resources

Reproduction on 8*B200:

python3 -m sglang.bench_one_batch --model-path /dev/shm/DeepSeek-V3 --tp 8 --batch 16 --input-len 1024 --output-len 128 --attention-backend triton --profile
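
To compare kernel configurations at more than one batch size, the command above can be generated from a small script. This is only a sketch: `build_bench_cmd` is a hypothetical helper, the batch sizes other than 16 are illustrative assumptions, and it uses only the flags that appear in the reproduction command.

```python
import shlex


def build_bench_cmd(batch_size, model_path="/dev/shm/DeepSeek-V3", tp=8,
                    input_len=1024, output_len=128, profile=True):
    """Return the argv list for one bench_one_batch run (hypothetical helper)."""
    cmd = [
        "python3", "-m", "sglang.bench_one_batch",
        "--model-path", model_path,
        "--tp", str(tp),
        "--batch", str(batch_size),
        "--input-len", str(input_len),
        "--output-len", str(output_len),
        "--attention-backend", "triton",
    ]
    if profile:
        # Same profiling switch as in the reproduction command.
        cmd.append("--profile")
    return cmd


if __name__ == "__main__":
    # Batch sizes 1 and 64 are assumptions for illustration; 16 is from the issue.
    for bs in (1, 16, 64):
        print(shlex.join(build_bench_cmd(bs)))
```

Each printed line can be pasted into a shell on the 8*B200 node, or the argv list can be passed to `subprocess.run` directly.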
