[10/N] MoE Refactor: reorganize deepgemm runner in DeepEPMoE #12054
Conversation
Force-pushed from 6c197cf to 38f98e3.
/gemini review
Code Review
This pull request refactors the Mixture-of-Experts (MoE) layer by reorganizing the deepgemm runner logic. The core change deprecates forward_deepgemm_contiguous and forward_deepgemm_masked in DeepEPMoE and moves their functionality into a more modular, runner-based architecture within deep_gemm.py. This is a solid architectural improvement that enhances code clarity and maintainability. The introduction of CombineInput wrappers and the renaming of the DispatchOutput classes also improve readability. I've identified a couple of potential memory leaks due to missing tensor disposals, as well as a minor concern about using assert False for deprecation warnings. Overall, this is a great step toward a cleaner MoE implementation.
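On the reviewer's point about assert False: a minimal sketch of a deprecation stub that raises an explicit exception instead, so it still fires when Python is run with -O (which strips assertions). The class body and messages below are illustrative only, not the PR's actual code.

```python
class DeepEPMoE:  # sketch only; the real class lives in sglang's MoE layer
    def forward_deepgemm_contiguous(self, *args, **kwargs):
        # Deprecated path: fail loudly even when assertions are disabled (-O).
        raise NotImplementedError(
            "forward_deepgemm_contiguous is deprecated; "
            "use the deep_gemm runner backend instead."
        )

    def forward_deepgemm_masked(self, *args, **kwargs):
        raise NotImplementedError(
            "forward_deepgemm_masked is deprecated; "
            "use the deep_gemm runner backend instead."
        )
```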
Force-pushed from 412a6c4 to ed71fae.
Motivation
Deprecate forward_deepgemm_contiguous and forward_deepgemm_masked. Finalize the deep_gemm backend.
Modifications
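A minimal sketch of the kind of runner-based layout the review describes, under the assumption that a single runner object owns both deepgemm execution paths. The names DeepGemmMoeRunner, run, _run_contiguous, _run_masked, and the CombineInput fields are hypothetical; the actual identifiers and signatures in deep_gemm.py may differ.

```python
from dataclasses import dataclass

import torch


@dataclass
class CombineInput:
    # Hypothetical wrapper around the per-expert outputs handed back to the
    # dispatcher for the combine step.
    hidden_states: torch.Tensor


class DeepGemmMoeRunner:
    """Illustrative runner owning both deepgemm execution paths.

    Only shows the shape of a runner-based split; not the real backend code.
    """

    def run(self, dispatch_output) -> CombineInput:
        # Pick the kernel path from the dispatch format instead of branching
        # inside DeepEPMoE's forward_* methods.
        if getattr(dispatch_output, "masked", False):
            return self._run_masked(dispatch_output)
        return self._run_contiguous(dispatch_output)

    def _run_contiguous(self, dispatch_output) -> CombineInput:
        # Grouped GEMM over contiguously packed tokens.
        raise NotImplementedError("sketch only")

    def _run_masked(self, dispatch_output) -> CombineInput:
        # Masked grouped GEMM used by the low-latency path.
        raise NotImplementedError("sketch only")
```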
Accuracy Tests
Benchmarking and Profiling
Server:
Benchmark:
Token throughput is around 2120 tokens/s for both the baseline and this PR.
Checklist