[trainer] refactor: code refactor for diffusion training #6042
zhtmike wants to merge 8 commits into verl-project:main
Code Review
This pull request refactors the diffusion training pipeline by introducing a modular configuration structure and dedicated dataclasses for diffusion-specific components. Key changes include the implementation of registries for diffusion loss functions, advantage estimators, and vLLM-Omni pipelines, alongside the removal of the response_mask requirement and legacy reward migration logic. The review feedback highlights a runtime error in configuration dictionary conversion for dataclasses, potential dynamic attribute assignment failures in Pydantic models for metrics, and a type mismatch for the loss_scale_factor parameter.
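The registry pattern mentioned in the summary can be sketched roughly as follows. This is a minimal illustration only: `register_diffusion_loss`, `get_diffusion_loss`, and the toy `mse` loss are hypothetical names chosen for this example, not the identifiers used in this PR or in verl.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping a loss name to its implementation.
# The real registries in the PR may differ in naming and signature.
_DIFFUSION_LOSS_REGISTRY: Dict[str, Callable[..., float]] = {}


def register_diffusion_loss(name: str):
    """Decorator that records a loss function under `name`."""

    def decorator(fn: Callable[..., float]) -> Callable[..., float]:
        if name in _DIFFUSION_LOSS_REGISTRY:
            raise ValueError(f"Diffusion loss '{name}' is already registered")
        _DIFFUSION_LOSS_REGISTRY[name] = fn
        return fn

    return decorator


def get_diffusion_loss(name: str) -> Callable[..., float]:
    """Look up a registered loss, listing the valid names on failure."""
    try:
        return _DIFFUSION_LOSS_REGISTRY[name]
    except KeyError:
        known = sorted(_DIFFUSION_LOSS_REGISTRY)
        raise KeyError(f"Unknown diffusion loss '{name}'; registered: {known}") from None


@register_diffusion_loss("mse")
def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Toy mean-squared-error loss over plain Python floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# A config would carry only the string key; the trainer resolves it at runtime.
loss_fn = get_diffusion_loss("mse")
loss_value = loss_fn([1.0, 2.0], [1.0, 4.0])
```

With this shape, a config file only needs to name the loss (or advantage estimator, or pipeline), and adding a new variant does not require touching the trainer's dispatch code.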
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@knlnguyen1802 please take a look at the vllm-omni change, thanks. I think it would be better to load the custom class directly on the vllm-omni side?
What does this PR do?
Major change:
- `verl/trainer/config/diffusion`, including `diffusion_actor`, `dp_diffusion_actor`, `diffusion_fsdp`, `diffusion_rollout`, etc.
- `verl/workers/config/diffusion`, including `actor`, `model`, `rollout`.
- `use_remove_padding`, or unused configs. The content of `_generated_diffusion_trainer.yaml` has now been reduced by ~60%.

Other changes:
- `input_id`, `attention_mask`, etc., which are not used in diffusion training currently.
- `extra_configs` in diffusion configs. It is too loose and may cause confusion for users. (resolve comment from [5/n][trainer] feat: flowgrpo trainer #5951)

Checklist Before Starting
- `[{modules}] {type}: {description}` (This will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `vllm_omni`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`, like `[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- `[BREAKING]` to the beginning of the title.
- `[BREAKING][fsdp, megatron] feat: dynamic batching`

Test
API and Usage Example
# Add code snippet or script demonstrating how to use this

Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.