Skip to content

Comments

add training engine#2

Merged
zhtmike merged 3 commits intoverl-omnifrom
train
Jan 7, 2026
Merged

add training engine#2
zhtmike merged 3 commits intoverl-omnifrom
train

Conversation

@zhtmike
Copy link
Owner

@zhtmike zhtmike commented Jan 7, 2026

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a training engine implementation for diffusion models, specifically supporting Qwen Image models with FSDP-based distributed training and Flow Matching with Stochastic Differential Equations (SDE) schedulers.

Key Changes:

  • Implements DiffusersFSDPEngine with support for LoRA, gradient checkpointing, and mixed precision training
  • Adds custom SDE scheduler (FlowMatchSDEDiscreteScheduler) with log probability computation for reinforcement learning
  • Introduces modified Qwen Image pipeline with log probability tracking for training workflows

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
verl/workers/engine/fsdp/diffuser_impl.py Main FSDP engine implementation for diffusion models with 748 lines of model initialization, training, and checkpoint management
verl/workers/engine/diffusers/utils.py Utility functions for preparing and loading diffusion pipelines to devices
verl/workers/engine/diffusers/schedulers/scheduling_flow_match_sde_discrete.py Custom scheduler extending FlowMatchEulerDiscreteScheduler with SDE support and log probability calculation
verl/workers/engine/diffusers/schedulers/__init__.py Module initialization for custom schedulers
verl/workers/engine/diffusers/pipelines/pipeline_qwenimage.py Extended QwenImagePipeline with log probability tracking for RL training
verl/workers/engine/diffusers/pipelines/__init__.py Module initialization for custom pipelines
verl/workers/engine/diffusers/patch.py Utility to inject SDE scheduler into existing pipelines
verl/workers/engine/diffusers/__init__.py Main module initialization for diffusers engine components
verl/trainer/ppo/ray_diffusion_trainer.py Updates copyright header and removes unused parameter
verl/trainer/main_flowgrpo.py Updates copyright header

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sde_window_range (`tuple[int, int]`, *optional*, defaults to (0, 5)):
The range of the SDE window start index. Only used if `sde_window_size` is provided.
sde_type (`str`, *optional*, defaults to "sde"):
The type of SDE to use. Choose between "sde"
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete documentation in docstring. The description for "sde_type" parameter ends abruptly with "Choose between 'sde'" without completing the sentence or listing the available options.

Suggested change
The type of SDE to use. Choose between "sde"
The type of SDE to use, e.g. `"sde"`.

Copilot uses AI. Check for mistakes.
Comment on lines +358 to +372
attention_kwargs=self.attention_kwargs,
return_dict=False,
)[0]

if do_true_cfg:
with self.transformer.cache_context("uncond"):
neg_noise_pred = self.transformer(
hidden_states=latents,
timestep=timestep / 1000,
guidance=guidance,
encoder_hidden_states_mask=negative_prompt_embeds_mask,
encoder_hidden_states=negative_prompt_embeds,
img_shapes=img_shapes,
txt_seq_lens=negative_txt_seq_lens,
attention_kwargs=self.attention_kwargs,
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent attribute reference. Lines 358 and 372 reference 'self.attention_kwargs' but the attribute is set as 'self._attention_kwargs' on line 211. This mismatch will cause an AttributeError.

Copilot uses AI. Check for mistakes.
@EngineRegistry.register(model_type="diffusion_model", backend=["fsdp", "fsdp2"], device=["cuda", "npu"])
class DiffusersFSDPEngine(BaseEngine):
"""
Concrete Diffussers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the docstring: "Diffussers" should be "Diffusers".

Suggested change
Concrete Diffussers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).
Concrete Diffusers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).

Copilot uses AI. Check for mistakes.
prev_sample (`torch.FloatTensor`, *optional*):
The sample from the previous timestep. If not provided, it will be sampled inside the function.
sde_type (`str`, *optional*, defaults to "sde"):
The type of SDE to use. Choose between "sde" and "cps
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete documentation in docstring. The description for the "sde_type" parameter is cut off at "Choose between 'sde' and 'cps" - it should end with a closing quote and potentially additional information about which option to choose or their differences.

Suggested change
The type of SDE to use. Choose between "sde" and "cps
The type of SDE to use. Choose between "sde" and "cps".

Copilot uses AI. Check for mistakes.
else:
XLA_AVAILABLE = False

logger = logging.get_logger(__name__)
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing function call for 'get_logger'. The code uses 'logging.get_logger(name)' but should use 'logging.getLogger(name)' instead. The correct method is 'getLogger' not 'get_logger'.

Suggested change
logger = logging.get_logger(__name__)
logger = logging.getLogger(__name__)

Copilot uses AI. Check for mistakes.
elif not self.transformer.config.guidance_embeds and guidance_scale is None:
guidance = None

if self.attention_kwargs is None:
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent attribute usage. The code sets 'self._attention_kwargs' on line 211 but checks 'self.attention_kwargs' (without underscore) on line 306. This will cause an AttributeError since the attribute name doesn't match.

Suggested change
if self.attention_kwargs is None:
if self._attention_kwargs is None:

Copilot uses AI. Check for mistakes.

# Load base model with specified configuration and dtype
module = self._build_module()
scheduler = self._build_scheduer()
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in method name: "_build_scheduer" should be "_build_scheduler". This will cause an AttributeError when called on line 401.

Suggested change
scheduler = self._build_scheduer()
scheduler = self._build_scheduler()

Copilot uses AI. Check for mistakes.
@zhtmike zhtmike merged commit 298c556 into verl-omni Jan 7, 2026
2 of 4 checks passed
@zhtmike zhtmike deleted the train branch January 7, 2026 07:37
zhtmike added a commit that referenced this pull request Jan 9, 2026
* add training engine

* fix init

* fix typs
zhtmike added a commit that referenced this pull request Jan 9, 2026
* add entroypoint (#1)

* add training engine (#2)

* add training engine

* fix init

* fix typs

* move folders & make for two-forward pass in training loop (#4)

* Add diffusion reward loop (#3)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* [fix] update customized reward func in UT (#5)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* update customized reward_fn

* init dataset for Qwen-Image

* pass UT

* update return, update UT

* pass UT

* align with rl_dataset

* pass UT

* update filter long prompts

* debug

* clean code

---------

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>
zhtmike added a commit that referenced this pull request Jan 26, 2026
* add training engine

* fix init

* fix typs
zhtmike added a commit that referenced this pull request Jan 26, 2026
* add entroypoint (#1)

* add training engine (#2)

* add training engine

* fix init

* fix typs

* move folders & make for two-forward pass in training loop (#4)

* Add diffusion reward loop (#3)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* [fix] update customized reward func in UT (#5)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* update customized reward_fn

* init dataset for Qwen-Image

* pass UT

* update return, update UT

* pass UT

* align with rl_dataset

* pass UT

* update filter long prompts

* debug

* clean code

---------

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>
zhtmike added a commit that referenced this pull request Jan 27, 2026
* add entroypoint (#1)

* add training engine (#2)

* add training engine

* fix init

* fix typs

* move folders & make for two-forward pass in training loop (#4)

* Add diffusion reward loop (#3)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* [fix] update customized reward func in UT (#5)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* update customized reward_fn

* Update 20260109 (#8)

* Update 20260109

* update

* fix CI

* [data] feat: Add dataset for Qwen-Image (#6)

* add entroypoint (#1)

* add training engine (#2)

* add training engine

* fix init

* fix typs

* move folders & make for two-forward pass in training loop (#4)

* Add diffusion reward loop (#3)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* [fix] update customized reward func in UT (#5)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* update customized reward_fn

* init dataset for Qwen-Image

* pass UT

* update return, update UT

* pass UT

* align with rl_dataset

* pass UT

* update filter long prompts

* debug

* clean code

---------

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>

* add new config; debug actor

* debug; add reward config; add adv, policy loss

* debug reward loop

* init diffusers engine UT

* debug

* debug

* deubg actor forward

* debug

* merge

* add UT for adv and loss

* pass adv&loss UTs; pass engine backward UT

* clean debug code

---------

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>
zhtmike added a commit that referenced this pull request Jan 29, 2026
* add entroypoint (#1)

* add training engine (#2)

* add training engine

* fix init

* fix typs

* move folders & make for two-forward pass in training loop (#4)

* Add diffusion reward loop (#3)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* [fix] update customized reward func in UT (#5)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* update customized reward_fn

* Update 20260109 (#8)

* Update 20260109

* update

* fix CI

* [data] feat: Add dataset for Qwen-Image (#6)

* add entroypoint (#1)

* add training engine (#2)

* add training engine

* fix init

* fix typs

* move folders & make for two-forward pass in training loop (#4)

* Add diffusion reward loop (#3)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* [fix] update customized reward func in UT (#5)

* init reward; add ocr reward

* update disrm input

* add unit test

* pass ut

* fix typos/bugs

* update copyright

* update customized reward_fn

* init dataset for Qwen-Image

* pass UT

* update return, update UT

* pass UT

* align with rl_dataset

* pass UT

* update filter long prompts

* debug

* clean code

---------

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>

* update to align verl data format

* debug

---------

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant