
[Roadmap] Diffusion LLMs (2025 Q4 & 2026 Q1) #14199

@ClawSeven

Description


Checklist

Motivation

Earlier this year, LLaDA released the first diffusion LLM (dLLM), immediately capturing significant attention from both the academic and industrial communities. But there was no production-ready dLLM serving engine.

We plan to build the most performant, production-ready dLLM serving framework in SGLang and make dLLM inference robust!
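For context, dLLMs decode differently from autoregressive LLMs: generation starts from a fully masked sequence and repeatedly unmasks the most confident positions over a fixed number of denoising steps. The toy sketch below illustrates that loop; `toy_model` is a hypothetical stand-in for a real dLLM forward pass, not SGLang code.

```python
import random

MASK = "[MASK]"

def toy_model(tokens):
    # Hypothetical stand-in for a dLLM forward pass: for every masked
    # position, return a (token, confidence) prediction. A real model
    # predicts all positions in parallel from the full sequence.
    vocab = ["the", "cat", "sat", "on", "mat"]
    return {
        i: (random.choice(vocab), random.random())
        for i, t in enumerate(tokens)
        if t == MASK
    }

def diffusion_decode(length=8, steps=4):
    """LLaDA-style decoding: iteratively commit the most confident
    masked positions each step until nothing is masked."""
    tokens = [MASK] * length
    per_step = length // steps  # positions to unmask per step
    for _ in range(steps):
        preds = toy_model(tokens)
        # Sort masked positions by confidence; commit the top ones.
        best = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in best[:per_step]:
            tokens[i] = tok
    return tokens

print(diffusion_decode())
```

Because every step re-scores the whole sequence, serving dLLMs efficiently needs different batching and caching strategies than token-by-token autoregressive decoding, which is what this roadmap targets.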

Features

For RL

VL-dLLM

  • Initial multi-modal LLM implementation @btw616

More supported models

More Hardware

More Parallelism

  • Tensor parallelism
  • Expert parallelism
  • Data parallelism (with DPA)
  • Context parallelism
  • Pipeline parallelism

Kernel Optimization for dLLM

More disaggregation

Prefill/decode (PD) disaggregation is not suitable for dLLMs, but attention-FFN disaggregation (AFD) might be a viable option.

More Tests

  • Small unit tests for specific functions
  • Nightly unit tests for E2E accuracy and throughput testing

Better streaming output

  • Support diffusion-style streaming output (like Mercury)
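Diffusion-style streaming differs from autoregressive streaming: instead of append-only token deltas, each event can carry a snapshot of the partially denoised sequence, so clients re-render the output as it refines. The sketch below shows one possible event shape; the JSON fields are illustrative assumptions, not SGLang's or Mercury's actual wire format.

```python
import json

def stream_snapshots(steps):
    # steps: partially denoised sequences, one per diffusion step.
    # Unlike autoregressive streaming (append-only deltas), each event
    # replaces the previous text, so the client must re-render it.
    for i, tokens in enumerate(steps):
        yield json.dumps({
            "step": i,
            "text": " ".join(t for t in tokens if t != "[MASK]"),
            "done": i == len(steps) - 1,
        })

# Example: three denoising steps over a four-token sequence.
steps = [
    ["[MASK]", "cat", "[MASK]", "[MASK]"],
    ["the", "cat", "[MASK]", "mat"],
    ["the", "cat", "on", "mat"],
]
for event in stream_snapshots(steps):
    print(event)
```

A design question this raises for the roadmap: snapshot events cost O(sequence length) per step, so a production protocol may prefer per-position diffs once sequences get long.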

RFC

#12766

Related resources
