[megatron] feat: qwen2.5vl#1286
Conversation
…en25vl_tmp_update0527_v2
…en25vl_tmp_update0527_v3
…en25vl_tmp_update0527_v4
…en25vl_tmp_update0527_v5
eric-haibin-lin
left a comment
There was a problem hiding this comment.
could u add a new dataset and a reference training record in https://verl.readthedocs.io/en/latest/algo/baseline.html ? thanks! (doing it in next PR is fine).
i do not have further comments for now. i think it's crucial to refactor the worker classes according to so that we can have standalone tests for model fwd/bwd. #1560 #1913
Sure, I will add a record of qwen2.5vl+megatron on next PR. For the refactors, I will keep tracking and contribute from aspect of megatron |
|
@dataproblems do you want to take a final look? |
|
@eric-haibin-lin - my bad! I missed this! I'll try to make sure I get the notifications from now on!! |
|
@eric-haibin-lin added a record and a recipe of training qwen2.5vl 7b, see #1969 |
|
@ISEEKYAN hi thanks for your awesome work! it seems we should guarantee there must be one image in mcore_batch_sz ? |
multiple images are stacked in https://github.com/volcengine/verl/blob/main/verl/workers/actor/megatron_actor.py#L412 |
thanks for your replay! but when we use plain-text data (no images or no videos), there would be KeyError in https://github.com/volcengine/verl/blob/main/verl/models/mcore/model_forward.py#L91-L92. maybe we should pad empty images in the forward function? |
@MaoChouHJM It is a very good question. The qwen2.5vl it self support pure language input, so this is a bug in existing code. would you please contribute a PR to fix this? |
of course, i will try to contribute my first PR. you mentioned "The qwen2.5vl it self support pure language input", means the huggingface implemetion? aybe I can refer to this code and fix it. |
Both HF implementation and this megatron implementation support pure language as input. While it is a bug that existing code asserts that multimodal input exists. |
@ISEEKYAN #1999 here is my pr, would you please review it :> |
works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />
…-text and image-text (#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to #1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
…-text and image-text (verl-project#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to verl-project#1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
…-text and image-text (verl-project#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to verl-project#1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />
works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />
…-text and image-text (verl-project#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to verl-project#1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
…-text and image-text (#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to verl-project/verl#1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />
…-text and image-text (verl-project#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to verl-project#1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />
…-text and image-text (verl-project#1999) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to verl-project#1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
works with qwen2.5vl 3b + geo3k