add training engine by zhtmike · Pull Request #2 · zhtmike/verl

zhtmike · 2026-01-07T07:23:38Z

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.

Copilot

Pull request overview

This PR adds a training engine implementation for diffusion models, specifically supporting Qwen Image models with FSDP-based distributed training and Flow Matching with Stochastic Differential Equations (SDE) schedulers.

Key Changes:

Implements DiffusersFSDPEngine with support for LoRA, gradient checkpointing, and mixed precision training
Adds custom SDE scheduler (FlowMatchSDEDiscreteScheduler) with log probability computation for reinforcement learning
Introduces modified Qwen Image pipeline with log probability tracking for training workflows

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`verl/workers/engine/fsdp/diffuser_impl.py`	Main FSDP engine implementation for diffusion models with 748 lines of model initialization, training, and checkpoint management
`verl/workers/engine/diffusers/utils.py`	Utility functions for preparing and loading diffusion pipelines to devices
`verl/workers/engine/diffusers/schedulers/scheduling_flow_match_sde_discrete.py`	Custom scheduler extending FlowMatchEulerDiscreteScheduler with SDE support and log probability calculation
`verl/workers/engine/diffusers/schedulers/__init__.py`	Module initialization for custom schedulers
`verl/workers/engine/diffusers/pipelines/pipeline_qwenimage.py`	Extended QwenImagePipeline with log probability tracking for RL training
`verl/workers/engine/diffusers/pipelines/__init__.py`	Module initialization for custom pipelines
`verl/workers/engine/diffusers/patch.py`	Utility to inject SDE scheduler into existing pipelines
`verl/workers/engine/diffusers/__init__.py`	Main module initialization for diffusers engine components
`verl/trainer/ppo/ray_diffusion_trainer.py`	Updates copyright header and removes unused parameter
`verl/trainer/main_flowgrpo.py`	Updates copyright header

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-07T07:27:21Z

verl/workers/engine/diffusers/pipelines/pipeline_qwenimage.py

+            sde_window_range (`tuple[int, int]`, *optional*, defaults to (0, 5)):
+                The range of the SDE window start index. Only used if `sde_window_size` is provided.
+            sde_type (`str`, *optional*, defaults to "sde"):
+                The type of SDE to use. Choose between "sde"


Incomplete documentation in docstring. The description for "sde_type" parameter ends abruptly with "Choose between 'sde'" without completing the sentence or listing the available options.

Suggested change

The type of SDE to use. Choose between "sde"

The type of SDE to use, e.g. `"sde"`.

Copilot · 2026-01-07T07:27:22Z

verl/workers/engine/diffusers/pipelines/pipeline_qwenimage.py

+                        attention_kwargs=self.attention_kwargs,
+                        return_dict=False,
+                    )[0]
+
+                if do_true_cfg:
+                    with self.transformer.cache_context("uncond"):
+                        neg_noise_pred = self.transformer(
+                            hidden_states=latents,
+                            timestep=timestep / 1000,
+                            guidance=guidance,
+                            encoder_hidden_states_mask=negative_prompt_embeds_mask,
+                            encoder_hidden_states=negative_prompt_embeds,
+                            img_shapes=img_shapes,
+                            txt_seq_lens=negative_txt_seq_lens,
+                            attention_kwargs=self.attention_kwargs,


Inconsistent attribute reference. Lines 358 and 372 reference 'self.attention_kwargs' but the attribute is set as 'self._attention_kwargs' on line 211. This mismatch will cause an AttributeError.

Copilot · 2026-01-07T07:27:22Z

verl/workers/engine/fsdp/diffuser_impl.py

+@EngineRegistry.register(model_type="diffusion_model", backend=["fsdp", "fsdp2"], device=["cuda", "npu"])
+class DiffusersFSDPEngine(BaseEngine):
+    """
+    Concrete Diffussers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).


Typo in the docstring: "Diffussers" should be "Diffusers".

Suggested change

Concrete Diffussers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).

Concrete Diffusers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).

Copilot · 2026-01-07T07:27:22Z

verl/workers/engine/diffusers/schedulers/scheduling_flow_match_sde_discrete.py

+            prev_sample (`torch.FloatTensor`, *optional*):
+                The sample from the previous timestep. If not provided, it will be sampled inside the function.
+            sde_type (`str`, *optional*, defaults to "sde"):
+                The type of SDE to use. Choose between "sde" and "cps


Incomplete documentation in docstring. The description for the "sde_type" parameter is cut off at "Choose between 'sde' and 'cps" - it should end with a closing quote and potentially additional information about which option to choose or their differences.

Suggested change

The type of SDE to use. Choose between "sde" and "cps

The type of SDE to use. Choose between "sde" and "cps".

Copilot · 2026-01-07T07:27:22Z

verl/workers/engine/diffusers/pipelines/pipeline_qwenimage.py

+else:
+    XLA_AVAILABLE = False
+
+logger = logging.get_logger(__name__)


Missing function call for 'get_logger'. The code uses 'logging.get_logger(name)' but should use 'logging.getLogger(name)' instead. The correct method is 'getLogger' not 'get_logger'.

Suggested change

logger = logging.get_logger(__name__)

logger = logging.getLogger(__name__)

Copilot · 2026-01-07T07:27:23Z

verl/workers/engine/diffusers/pipelines/pipeline_qwenimage.py

+        elif not self.transformer.config.guidance_embeds and guidance_scale is None:
+            guidance = None
+
+        if self.attention_kwargs is None:


Inconsistent attribute usage. The code sets 'self._attention_kwargs' on line 211 but checks 'self.attention_kwargs' (without underscore) on line 306. This will cause an AttributeError since the attribute name doesn't match.

Suggested change

if self.attention_kwargs is None:

if self._attention_kwargs is None:

Copilot · 2026-01-07T07:27:23Z

verl/workers/engine/fsdp/diffuser_impl.py

+
+        # Load base model with specified configuration and dtype
+        module = self._build_module()
+        scheduler = self._build_scheduer()


Typo in method name: "_build_scheduer" should be "_build_scheduler". This will cause an AttributeError when called on line 401.

Suggested change

scheduler = self._build_scheduer()

scheduler = self._build_scheduler()

* add training engine * fix init * fix typs

* add entroypoint (#1) * add training engine (#2) * add training engine * fix init * fix typs * move folders & make for two-forward pass in training loop (#4) * Add diffusion reward loop (#3) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * [fix] update customized reward func in UT (#5) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * update customized reward_fn * init dataset for Qwen-Image * pass UT * update return, update UT * pass UT * align with rl_dataset * pass UT * update filter long prompts * debug * clean code --------- Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>

* add training engine * fix init * fix typs

* add entroypoint (#1) * add training engine (#2) * add training engine * fix init * fix typs * move folders & make for two-forward pass in training loop (#4) * Add diffusion reward loop (#3) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * [fix] update customized reward func in UT (#5) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * update customized reward_fn * init dataset for Qwen-Image * pass UT * update return, update UT * pass UT * align with rl_dataset * pass UT * update filter long prompts * debug * clean code --------- Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>

* add entroypoint (#1) * add training engine (#2) * add training engine * fix init * fix typs * move folders & make for two-forward pass in training loop (#4) * Add diffusion reward loop (#3) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * [fix] update customized reward func in UT (#5) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * update customized reward_fn * Update 20260109 (#8) * Update 20260109 * update * fix CI * [data] feat: Add dataset for Qwen-Image (#6) * add entroypoint (#1) * add training engine (#2) * add training engine * fix init * fix typs * move folders & make for two-forward pass in training loop (#4) * Add diffusion reward loop (#3) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * [fix] update customized reward func in UT (#5) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * update customized reward_fn * init dataset for Qwen-Image * pass UT * update return, update UT * pass UT * align with rl_dataset * pass UT * update filter long prompts * debug * clean code --------- Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com> * add new config; debug actor * debug; add reward config; add adv, policy loss * debug reward loop * init diffusers engine UT * debug * debug * deubg actor forward * debug * merge * add UT for adv and loss * pass adv&loss UTs; pass engine backward UT * clean debug code --------- Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>

* add entroypoint (#1) * add training engine (#2) * add training engine * fix init * fix typs * move folders & make for two-forward pass in training loop (#4) * Add diffusion reward loop (#3) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * [fix] update customized reward func in UT (#5) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * update customized reward_fn * Update 20260109 (#8) * Update 20260109 * update * fix CI * [data] feat: Add dataset for Qwen-Image (#6) * add entroypoint (#1) * add training engine (#2) * add training engine * fix init * fix typs * move folders & make for two-forward pass in training loop (#4) * Add diffusion reward loop (#3) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * [fix] update customized reward func in UT (#5) * init reward; add ocr reward * update disrm input * add unit test * pass ut * fix typos/bugs * update copyright * update customized reward_fn * init dataset for Qwen-Image * pass UT * update return, update UT * pass UT * align with rl_dataset * pass UT * update filter long prompts * debug * clean code --------- Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com> * update to align verl data format * debug --------- Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>

add training engine

e6c02c7

zhtmike requested a review from Copilot January 7, 2026 07:24

Copilot started reviewing on behalf of zhtmike January 7, 2026 07:24 View session

Copilot AI reviewed Jan 7, 2026

View reviewed changes

zhtmike added 2 commits January 7, 2026 15:28

fix init

595ac29

fix typs

5e30c30

zhtmike merged commit 298c556 into verl-omni Jan 7, 2026
2 of 4 checks passed

zhtmike deleted the train branch January 7, 2026 07:37

zhtmike added a commit that referenced this pull request Jan 9, 2026

add training engine (#2)

c3b775e

* add training engine * fix init * fix typs

zhtmike added a commit that referenced this pull request Jan 26, 2026

add training engine (#2)

62c5286

* add training engine * fix init * fix typs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

add training engine#2

add training engine#2
zhtmike merged 3 commits intoverl-omnifrom
train

zhtmike commented Jan 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Copilot AI Jan 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	The type of SDE to use. Choose between "sde"
	The type of SDE to use, e.g. `"sde"`.

	Concrete Diffussers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).
	Concrete Diffusers Engine implementation using PyTorch FullyShardedDataParallel (FSDP).

	logger = logging.get_logger(__name__)
	logger = logging.getLogger(__name__)

	if self.attention_kwargs is None:
	if self._attention_kwargs is None:

	scheduler = self._build_scheduer()
	scheduler = self._build_scheduler()

Comments

Conversation

zhtmike commented Jan 7, 2026

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant