
[Feature][NVIDIA] EPLB enablement for DSR1 disagg on GB200 #14661

@wenscarl



Motivation

This issue tracks ongoing work from NVIDIA to enable EPLB for DSR1 disaggregated serving on GB200.

  1. The current EPLB implementation is broken with FP4 quantization: the first dimension of the global scaling factor does not match num_local_expert. A temporary workaround is in "Fix EPLB + FP4 Quantization Compatibility Issue" #13715. @shifangx @wenscarl
  2. NVIDIA has proposed a Linear Programming (LP)-based algorithm for expert-parallel load balancing. The integration will proceed in two stages:
     a) Integrate the LP kernel into FlashInfer. @feliang-git
     b) Integrate the FlashInfer operator into sglang.
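
To make the shape mismatch in item 1 concrete, here is a minimal sketch of the invariant involved. It is not the sglang code or the fix in #13715. It assumes a hypothetical layout in which the global scaling factor holds one entry per logical expert, while each rank hosts `num_local_experts` physical slots that may include replicated experts. The names `gather_local_scales`, `global_scales`, and `physical_to_logical` are illustrative only.

```python
def gather_local_scales(global_scales, physical_to_logical, rank, num_local_experts):
    """Pick one scale per physical expert slot hosted on `rank`.

    global_scales: one scale per logical expert (assumed layout, hypothetical).
    physical_to_logical: logical expert id for every physical slot, all ranks
        concatenated (hypothetical name for the EPLB placement map).
    """
    start = rank * num_local_experts
    local_slots = physical_to_logical[start:start + num_local_experts]
    # Re-index the per-logical-expert scales by the rank's physical slots.
    local_scales = [global_scales[expert_id] for expert_id in local_slots]
    # The invariant from item 1: first dimension must equal num_local_experts.
    assert len(local_scales) == num_local_experts
    return local_scales


# 4 logical experts, 6 physical slots (experts 0 and 2 are replicated), 2 ranks.
global_scales = [0.5, 0.25, 0.125, 1.0]
physical_to_logical = [0, 1, 2, 0, 2, 3]

rank0_scales = gather_local_scales(global_scales, physical_to_logical, rank=0, num_local_experts=3)
rank1_scales = gather_local_scales(global_scales, physical_to_logical, rank=1, num_local_experts=3)
```

In this toy setup the global scaling factor has 4 entries while each rank needs 3, so passing it through unchanged would violate the invariant; gathering through the placement map restores it.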

Related resources

No response
