[data] feat: Add dataset for Qwen-Image (#6)
Conversation
verl/utils/dataset/qwen_dataset.py
Outdated
```python
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
```
No need for the `# ============================================================================` separator line.
Pull request overview
This PR adds a new QwenDataset class for handling text prompts in Qwen-Image models, particularly for text-guided vision generation tasks. The dataset supports loading prompts from text files, tokenization with configurable templates, and extraction of ground truth data for OCR tasks.
Key changes:
- New dataset implementation with prompt filtering, truncation, and tokenization support (see the sketch after this list)
- Integration with existing reward loop for diffusion models
- Unit tests for dataset functionality and dataloader compatibility
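Since the new file's contents appear only in fragments in the review threads below, here is a minimal sketch of what such a dataset could look like. This is an illustration, not the PR's actual code: the class name, constructor arguments, and padding strategy are all assumptions.

```python
# Hypothetical sketch of a prompt-file dataset for Qwen-Image.
# All names and defaults here are assumptions, not the PR's actual code.
from torch.utils.data import Dataset


class PromptTextDataset(Dataset):
    def __init__(self, data_files: str, tokenizer, max_prompt_length: int = 512,
                 filter_overlong_prompts: bool = True, max_samples: int = -1):
        # Load one prompt per line from a plain text file.
        with open(data_files, encoding="utf-8") as f:
            self.prompts = [line.strip() for line in f if line.strip()]
        # Optionally drop prompts that exceed the length budget.
        if filter_overlong_prompts:
            self.prompts = [p for p in self.prompts if len(p) <= max_prompt_length]
        # Optionally cap the number of samples.
        if 0 < max_samples < len(self.prompts):
            self.prompts = self.prompts[:max_samples]
        self.tokenizer = tokenizer
        self.max_prompt_length = max_prompt_length

    def __len__(self):
        return len(self.prompts)

    def __getitem__(self, idx):
        # Tokenize with truncation and fixed-length padding so the default
        # DataLoader collate function can stack the resulting tensors.
        enc = self.tokenizer(
            self.prompts[idx],
            max_length=self.max_prompt_length,
            truncation=True,
            padding="max_length",
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
        }
```

Fixed-length outputs like these are what make the dataloader-compatibility test straightforward: the default collate function can stack them into a batch without a custom collator.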
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| verl/utils/dataset/qwen_dataset.py | New dataset class for Qwen-Image with prompt loading, tokenization, and ground truth extraction |
| verl/utils/dataset/__init__.py | Export the new QwenDataset class |
| verl/experimental/reward_loop/reward_manager/diffusion.py | Remove unused commented code |
| tests/utils/dataset/test_qwen_dataset_on_cpu.py | Unit tests for QwenDataset with basic functionality and max_samples parameter |
| tests/experimental/reward_loop/test_diffusion_reward_model_genrm.py | Update test to use "input_ids" instead of "prompts" key |
verl/utils/dataset/qwen_dataset.py
Outdated
```
Args:
    data_files (str): Path to the text file containing prompts.
    tokenizer (PreTrainedTokenizer): Tokenizer to tokenize the prompts.
    config (OmegaConf): the data config.
```
The docstring says the parameter type is OmegaConf but the actual type hint is DictConfig. Consider updating the docstring to match the type hint for consistency.
Suggested change:
```diff
- config (OmegaConf): the data config.
+ config (DictConfig): the data config.
```
verl/utils/dataset/qwen_dataset.py
Outdated
```python
if self.filter_overlong_prompts:
    self.prompts = [x for x in self.prompts if len(x) <= self.max_prompt_length]

if self.max_samples > 0 and self.max_samples < len(self.prompts):
    self.prompts = self.prompts[: self.max_samples]
```
The order of operations could be optimized. Currently, the code filters overlong prompts (line 62) before applying max_samples (lines 64-65). If filter_overlong_prompts significantly reduces the dataset size, the final dataset might have fewer samples than max_samples. Consider applying max_samples before filtering to ensure you get the requested number of samples, or document this behavior clearly.
If possible, we maintain the original configurations and returns from https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L70

Updated the returns.
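For concreteness, the two orderings discussed in this thread behave differently; a toy, self-contained illustration (the prompts and limits are made up):

```python
# Toy illustration of filter-then-slice vs. slice-then-filter; values are made up.
prompts = ["short", "a much longer prompt that exceeds the limit", "ok"]
max_prompt_length, max_samples = 10, 2

# Current order: filter overlong prompts first, then cap at max_samples.
filter_then_slice = [p for p in prompts if len(p) <= max_prompt_length][:max_samples]

# Alternative: cap first, then filter; may end up with fewer than max_samples.
slice_then_filter = [p for p in prompts[:max_samples] if len(p) <= max_prompt_length]

print(filter_then_slice)  # ['short', 'ok']
print(slice_then_filter)  # ['short']
```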
verl/utils/dataset/qwen_dataset.py
Outdated
```python
def maybe_filter_out_long_prompts(self, prompts: list):
    # filter out too long prompts
    if self.filter_overlong_prompts:
        prompts = [x for x in prompts if len(x) <= self.max_prompt_length]
```
I think the official `filter_out_long_prompts` is based on the length of the tokenized ids, not the length of the raw string.
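For reference, a token-length-based filter in the spirit of `rl_dataset.py` could look like this; a minimal sketch assuming a Hugging Face tokenizer, with an illustrative function name:

```python
# Sketch: filter by the number of token ids rather than raw string length.
# Assumes `tokenizer` is a Hugging Face PreTrainedTokenizer.
def filter_by_token_length(prompts: list[str], tokenizer, max_prompt_length: int) -> list[str]:
    return [
        p
        for p in prompts
        if len(tokenizer(p, add_special_tokens=False)["input_ids"]) <= max_prompt_length
    ]
```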
Commit history (squashed):
* add entrypoint (#1)
* add training engine (#2): add training engine; fix init; fix typos
* move folders & make for two-forward pass in training loop (#4)
* Add diffusion reward loop (#3): init reward; add ocr reward; update disrm input; add unit test; pass ut; fix typos/bugs; update copyright
* [fix] update customized reward func in UT (#5): update customized reward_fn
* init dataset for Qwen-Image
* pass UT
* update return, update UT
* pass UT
* align with rl_dataset
* pass UT
* update filter long prompts
* debug
* clean code

Co-authored-by: Cheung Ka Wai <zhtmike@gmail.com>
What does this PR do?
Checklist Before Starting
Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI).
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, like `[megatron, fsdp, doc]`.
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`.
- If this PR breaks any API, add `[BREAKING]` to the beginning of the title. Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`.
Test
API and Usage Example
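The template's code-snippet placeholder below was left unfilled. As a stand-in, a hypothetical usage sketch: the `(data_files, tokenizer, config)` signature matches the docstring reviewed above, but the config keys and the tokenizer model name are assumptions.

```python
# Hypothetical usage; config keys and the tokenizer model name are assumptions.
from omegaconf import OmegaConf
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

from verl.utils.dataset import QwenDataset

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")  # illustrative model
data_config = OmegaConf.create(
    {"max_prompt_length": 512, "filter_overlong_prompts": True, "max_samples": -1}
)
dataset = QwenDataset(data_files="prompts.txt", tokenizer=tokenizer, config=data_config)

# Assuming fixed-length "input_ids" tensors, a standard DataLoader can collate them.
loader = DataLoader(dataset, batch_size=8, shuffle=True)
batch = next(iter(loader))
print(batch["input_ids"].shape)
```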
```python
# Add code snippet or script demonstrating how to use this
```
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- Once your PR is ready for CI, send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If your PR touches the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.