veRL Megatron-core Development Tracking
This page tracks the development of verl with the Megatron-core (mcore) backend.
The milestone target is to enable training DeepSeek-V3 on veRL (see #708). The longer-term target is to keep improving the verl training experience on the mcore backend.
Progress and TODO
Recent
- update mcore version to 0.11 megatron: Update megatron-lm to core_r0.11.0 #392
- use mcore GPTModel API instead of huggingface workaround with sequence packing Use Mcore GPTModel #706
- support context parallel [Mcore] context parallel #970
- support loading mcore dist_checkpointing [mcore] option to use dist checkpoint #1030
- support Megatron 0.11.0 and vLLM 0.8.2 Support Megatron 0.11.0 and vLLM 0.8.2, update images to use latest vllm and Megatron #851
- support qwen2moe training [mcore] qwen2moe support #1139
- support Moonlight-16B-A3B training (WIP) [mcore] moonlight (small model with deepseekv3 arch) #1284
- support Qwen2.5-VL training [megatron] feat: qwen2.5vl #1286
- support EP (expert parallel) [megatron] support megatron expert parallel #1467
Further
- FP8 training
- training efficiency optimizations
- support sglang inference engine
- support trtllm inference engine