Refit is slower with Megatron-Bridge than with nemo.tron for MoE models. To reproduce, use these branches:
baseline: https://github.com/NVIDIA-NeMo/RL/tree/yifu/before_mbridge
mbridge: https://github.com/NVIDIA-NeMo/RL/tree/yifu/mbridge_dsv3
DSv2-Lite:
uv run python examples/run_grpo_math.py \
  --config=examples/configs/grpo_math_1B_megatron.yaml \
  grpo.val_batch_size=2 \
  policy.model_name=deepseek-ai/DeepSeek-V2-Lite-Chat \
  cluster.gpus_per_node=8 \
  policy.megatron_cfg.pipeline_model_parallel_size=4 \
  policy.megatron_cfg.num_layers_in_first_pipeline_stage=7 \
  policy.megatron_cfg.num_layers_in_last_pipeline_stage=6 \
  policy.max_total_sequence_length=1024 \
  checkpointing.enabled=False \
  checkpointing.save_period=5 \
  grpo.val_period=-1 \
  grpo.val_at_start=False \
  grpo.max_val_samples=16 \
  policy.megatron_cfg.expert_model_parallel_size=2 \
  policy.megatron_cfg.apply_rope_fusion=False
Make sure to set a different NRL_MEGATRON_CHECKPOINT_DIR when testing with mbridge so that it does not reuse the nemo.tron checkpoints.
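For example, before launching the mbridge run (the directory below is a hypothetical placeholder; any path distinct from the one used for the nemo.tron run works):

```shell
# Point the mbridge run at its own checkpoint directory so it does not
# pick up checkpoints produced by the nemo.tron baseline run.
# /tmp/mbridge_ckpts is a hypothetical placeholder path.
export NRL_MEGATRON_CHECKPOINT_DIR=/tmp/mbridge_ckpts
```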