[veomni] feat: enable VeOmni engine for on-policy distillation#6072

Merged
wuxibin89 merged 2 commits into verl-project:main from hjshi84:feat/veomni-opd-support
Apr 21, 2026
Conversation

hjshi84 (Contributor) commented Apr 20, 2026

What does this PR do?

Add VeOmni strategy support to the distillation and separation worker components so that VeOmniEngine can be used as a training backend in on-policy distillation (OPD) scenarios.

Design & Code Changes

  1. verl/trainer/distillation/losses.py: Match 'veomni' alongside 'fsdp' in compute_topk_loss() so VeOmni can reuse FSDP's topk forward KL loss implementation. VeOmni inherits from FSDP2 and shares the same loss computation logic.

  2. verl/experimental/separation/engine_workers.py: Add 'veomni' to DetachActorWorker's save/restore strategy handlers (fsdp2 sharded save/load), enabling model offload in separation mode. Also pass **kwargs through __init__ to parent class for distillation_config compatibility.

  3. examples/on_policy_distillation_trainer/run_qwen_gsm8k_veomni.sh: Add example script for testing OPD with VeOmni backend, following the model_engine=veomni configuration pattern with actor.veomni.* settings.
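The strategy dispatch described in item 1 can be sketched as follows. This is an illustrative stand-in, not verl's actual implementation: the function name compute_topk_loss and the strategy strings follow the PR description, but the loss body and signature are hypothetical.

```python
import math

def _fsdp_topk_forward_kl(student_logps, teacher_logps):
    # Stand-in for the FSDP top-k forward-KL loss: sum over the teacher's
    # top-k tokens of p_teacher * (log p_teacher - log p_student).
    return sum(
        math.exp(t) * (t - s) for s, t in zip(student_logps, teacher_logps)
    )

def compute_topk_loss(strategy, student_topk_logps, teacher_topk_logps):
    """Dispatch the top-k forward-KL loss by training strategy."""
    # 'veomni' is matched alongside 'fsdp' so VeOmni reuses the FSDP loss
    # path, since VeOmniEngine inherits the FSDP2 engine's loss logic.
    if strategy in ("fsdp", "veomni"):
        return _fsdp_topk_forward_kl(student_topk_logps, teacher_topk_logps)
    raise NotImplementedError(f"No top-k loss for strategy: {strategy}")
```

Identical student and teacher log-probs yield zero loss under either matched strategy, and any other strategy string falls through to the error branch.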

API and Usage Example

export DATA_PATH=/path/to/data
bash examples/on_policy_distillation_trainer/run_qwen_gsm8k_veomni.sh

Test

  • Verified with GSM8K dataset using Qwen2.5-0.5B (student) and Qwen2.5-3B-Instruct (teacher) on 2+4 GPUs.

Checklist Before Starting

  • Search for similar PRs: no existing PR adds veomni OPD support.
  • PR title follows [veomni, trainer] feat: ... format.

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add unit or end-to-end test(s) to cover the code. If not feasible, explain why: this is an integration-level feature requiring multi-GPU and the veomni external dependency; e2e testing covered by manual script validation.
  • Add / Update the documentation.

gemini-code-assist bot (Contributor) left a comment:


Code Review

This pull request introduces support for the veomni engine in on-policy distillation, including a new example script for GSM8K and updates to worker initialization and loss computation. Feedback suggests explicitly defining the distillation_config parameter in the DetachActorWorker constructor to prevent potential TypeError issues when arguments are passed positionally.

Comment thread: verl/experimental/separation/engine_workers.py (Outdated)
Add VeOmni strategy support to the distillation and separation worker
components so that VeOmniEngine can be used as a training backend in
on-policy distillation (OPD) scenarios.

Changes:
- Match 'veomni' alongside 'fsdp' in compute_topk_loss() so VeOmni
  can reuse FSDP's topk forward KL loss implementation.
- Add 'veomni' to DetachActorWorker's save/restore strategy handlers
  (fsdp2 sharded save/load), enabling model offload in separation mode.
- Pass **kwargs through DetachActorWorker.__init__ to parent class
  for distillation_config compatibility.
- Add example script run_qwen_gsm8k_veomni.sh for testing OPD with
  VeOmni backend.
@hjshi84 hjshi84 force-pushed the feat/veomni-opd-support branch from f814978 to 5c081d8 Compare April 20, 2026 11:13
@hjshi84 hjshi84 marked this pull request as ready for review April 21, 2026 01:44
"""

def __init__(self, config: DictConfig, role: str):
def __init__(self, config: DictConfig, role: str, distillation_config: Optional[DistillationConfig] = None, **kwargs):
Collaborator:
We don't support distillation in fully async training yet.

Contributor (author):
This change is not about enabling distillation in fully async training. It's a signature-alignment fix: the parent class ActorRolloutRefWorker.__init__ already accepts distillation_config and **kwargs, and ray_trainer.py already passes distillation_config when constructing the worker. The previous DetachActorWorker.__init__(self, config, role) silently dropped these arguments because its signature was narrower than the base class's. This PR simply makes the subclass forward them to super().__init__.
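The signature alignment can be sketched as below. Class names follow the PR discussion, but the bodies and the config types are illustrative stand-ins for verl's actual classes.

```python
from typing import Any, Optional

class ActorRolloutRefWorker:
    # Parent class: already accepts distillation_config and **kwargs.
    def __init__(self, config, role: str,
                 distillation_config: Optional[dict] = None, **kwargs: Any):
        self.config = config
        self.role = role
        self.distillation_config = distillation_config

class DetachActorWorkerBefore(ActorRolloutRefWorker):
    # Narrower signature than the base class: a distillation_config
    # keyword from the trainer raises TypeError instead of reaching
    # the parent constructor.
    def __init__(self, config, role: str):
        super().__init__(config, role)

class DetachActorWorkerAfter(ActorRolloutRefWorker):
    # The fix: accept and forward the extra arguments to the parent.
    def __init__(self, config, role: str,
                 distillation_config: Optional[dict] = None, **kwargs: Any):
        super().__init__(config, role,
                         distillation_config=distillation_config, **kwargs)
```

With the aligned signature, a distillation_config passed by the trainer reaches the parent class intact; with the old one, the same call fails.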

Collaborator:
Got it. I think passing **kwargs should be enough?

Contributor (author):
Yes, but Gemini suggests explicitly defining the distillation_config parameter in the DetachActorWorker constructor to prevent potential TypeError issues when arguments are passed positionally.

wuxibin89 (Collaborator) previously approved these changes Apr 21, 2026
- Explain why VeOmni shares FSDP2 save/load handlers (inherits FSDPEngine,
  parameters are DTensors compatible with fsdp2_sharded_save/load_from_cpu)
- Document known caveat: param_offload=True may cause device mismatch in
  save_model_to_cpu / restore_model_from_cpu (historical issue)
- Add inline comment in compute_topk_loss clarifying VeOmni uses FSDP loss path
- Update class docstring to list VeOmni as a supported strategy
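The save/restore reuse described above can be sketched as a strategy-keyed handler table. Handler names and the table layout are hypothetical, not verl's API; the point is that 'veomni' maps to the same fsdp2 handlers because VeOmniEngine inherits FSDPEngine and its parameters are DTensors compatible with the fsdp2 sharded save/load path.

```python
def fsdp2_sharded_save(model_state):
    # Stand-in for the fsdp2 sharded save-to-CPU path. VeOmni parameters
    # are DTensors, so this path works for 'veomni' unchanged.
    return dict(model_state)

def fsdp2_load_from_cpu(saved_state):
    # Stand-in for the matching fsdp2 restore path.
    return dict(saved_state)

# 'veomni' reuses the fsdp2 handlers rather than defining its own.
SAVE_HANDLERS = {
    "fsdp2": fsdp2_sharded_save,
    "veomni": fsdp2_sharded_save,
}
RESTORE_HANDLERS = {
    "fsdp2": fsdp2_load_from_cpu,
    "veomni": fsdp2_load_from_cpu,
}
```

A save followed by a restore round-trips the state for either strategy key, and both keys point at the identical handler objects.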
@wuxibin89 wuxibin89 merged commit 0114e2a into verl-project:main Apr 21, 2026
62 of 74 checks passed