[veomni] feat: enable VeOmni engine for on-policy distillation #6072
wuxibin89 merged 2 commits into verl-project:main
Conversation
Code Review
This pull request introduces support for the veomni engine in on-policy distillation, including a new example script for GSM8K and updates to worker initialization and loss computation. Feedback suggests explicitly defining the `distillation_config` parameter in the `DetachActorWorker` constructor to prevent potential `TypeError` issues when arguments are passed positionally.
Add VeOmni strategy support to the distillation and separation worker components so that `VeOmniEngine` can be used as a training backend in on-policy distillation (OPD) scenarios. Changes:

- Match `'veomni'` alongside `'fsdp'` in `compute_topk_loss()` so VeOmni can reuse FSDP's top-k forward KL loss implementation.
- Add `'veomni'` to `DetachActorWorker`'s save/restore strategy handlers (fsdp2 sharded save/load), enabling model offload in separation mode.
- Pass `**kwargs` through `DetachActorWorker.__init__` to the parent class for `distillation_config` compatibility.
- Add example script `run_qwen_gsm8k_veomni.sh` for testing OPD with the VeOmni backend.
```diff
     """
-    def __init__(self, config: DictConfig, role: str):
+    def __init__(self, config: DictConfig, role: str, distillation_config: Optional[DistillationConfig] = None, **kwargs):
```
We haven't supported distillation in fully async training yet.
This change is not about enabling distillation in fully async training. It's a signature-alignment fix: the parent class `ActorRolloutRefWorker.__init__` already accepts `distillation_config` and `**kwargs`, and `ray_trainer.py` already passes `distillation_config` when constructing the worker. The previous `DetachActorWorker.__init__(self, config, role)` silently dropped these arguments because its signature was narrower than the base class's. This PR simply makes the subclass properly forward them to `super().__init__`.
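The signature-alignment point above can be sketched in isolation. This is a minimal illustration with hypothetical class names (`BaseWorker` / `NarrowWorker` / `ForwardingWorker` stand in for `ActorRolloutRefWorker` and the before/after `DetachActorWorker`), not verl's actual code:

```python
from typing import Any, Optional


class BaseWorker:
    # Stand-in for the parent class, which already accepts distillation_config and **kwargs.
    def __init__(self, config: dict, role: str, distillation_config: Optional[dict] = None, **kwargs: Any):
        self.distillation_config = distillation_config


class NarrowWorker(BaseWorker):
    # Before the fix: the narrower signature cannot accept distillation_config at all.
    def __init__(self, config: dict, role: str):
        super().__init__(config, role)


class ForwardingWorker(BaseWorker):
    # After the fix: explicit parameter plus **kwargs, forwarded to the parent class.
    def __init__(self, config: dict, role: str, distillation_config: Optional[dict] = None, **kwargs: Any):
        super().__init__(config, role, distillation_config=distillation_config, **kwargs)


worker = ForwardingWorker({}, "actor", distillation_config={"topk": 8})
print(worker.distillation_config)  # {'topk': 8}

try:
    NarrowWorker({}, "actor", distillation_config={"topk": 8})
    narrow_accepted = True
except TypeError:
    narrow_accepted = False  # the narrower subclass rejects the argument
print(narrow_accepted)  # False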
Got it, I think passing `**kwargs` should be enough?
Yes, however Gemini suggests explicitly defining the `distillation_config` parameter in the `DetachActorWorker` constructor to prevent potential `TypeError` issues when arguments are passed positionally TT
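The reviewer's point can be shown with a small sketch. With only `**kwargs`, a positionally passed third argument raises `TypeError`, while an explicit parameter accepts it. The class names here are illustrative, not verl's actual API:

```python
class KwargsOnly:
    # Accepts distillation_config only as a keyword, swallowed into **kwargs.
    def __init__(self, config, role, **kwargs):
        self.distillation_config = kwargs.get("distillation_config")


class ExplicitParam:
    # Gemini's suggestion: an explicit parameter also binds a positional third argument.
    def __init__(self, config, role, distillation_config=None, **kwargs):
        self.distillation_config = distillation_config


ok = ExplicitParam({}, "actor", {"topk": 8})  # positional third argument is accepted
try:
    KwargsOnly({}, "actor", {"topk": 8})      # **kwargs cannot absorb positional args
    positional_accepted = True
except TypeError:
    positional_accepted = False
print(ok.distillation_config, positional_accepted)  # {'topk': 8} False
```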
- Explain why VeOmni shares FSDP2 save/load handlers (inherits `FSDPEngine`, parameters are DTensors compatible with `fsdp2_sharded_save/load_from_cpu`)
- Document known caveat: `param_offload=True` may cause device mismatch in `save_model_to_cpu` / `restore_model_from_cpu` (historical issue)
- Add inline comment in `compute_topk_loss` clarifying VeOmni uses the FSDP loss path
- Update class docstring to list VeOmni as a supported strategy
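The strategy dispatch this PR touches can be sketched as a simple string match; since VeOmni inherits the FSDP engine, it takes FSDP's top-k forward-KL loss path. The function name and return values below are hypothetical, not verl's actual API:

```python
def select_topk_loss_backend(strategy: str) -> str:
    # VeOmni inherits the FSDP engine, so it reuses FSDP's top-k forward-KL loss path;
    # the PR adds 'veomni' to the same match arm as 'fsdp'.
    if strategy in ("fsdp", "veomni"):
        return "fsdp_topk_forward_kl"
    raise NotImplementedError(f"unsupported strategy for top-k loss: {strategy}")


print(select_topk_loss_backend("veomni"))  # fsdp_topk_forward_kl
print(select_topk_loss_backend("fsdp"))   # fsdp_topk_forward_kl
```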
What does this PR do?
Add VeOmni strategy support to the distillation and separation worker components so that `VeOmniEngine` can be used as a training backend in on-policy distillation (OPD) scenarios.

Design & Code Changes

- `verl/trainer/distillation/losses.py`: Match `'veomni'` alongside `'fsdp'` in `compute_topk_loss()` so VeOmni can reuse FSDP's top-k forward KL loss implementation. VeOmni inherits from FSDP2 and shares the same loss computation logic.
- `verl/experimental/separation/engine_workers.py`: Add `'veomni'` to `DetachActorWorker`'s save/restore strategy handlers (fsdp2 sharded save/load), enabling model offload in separation mode. Also pass `**kwargs` through `__init__` to the parent class for `distillation_config` compatibility.
- `examples/on_policy_distillation_trainer/run_qwen_gsm8k_veomni.sh`: Add example script for testing OPD with the VeOmni backend, following the `model_engine=veomni` configuration pattern with `actor.veomni.*` settings.

API and Usage Example
```shell
export DATA_PATH=/path/to/data
bash examples/on_policy_distillation_trainer/run_qwen_gsm8k_veomni.sh
```

Test
Checklist Before Starting
Checklist Before Submitting