Describe the bug
In the following setup, token_mult_prob_error becomes NaN from the 2nd step onward and the generated sequences always reach the max length, indicating a bug in the refit:
- fp8 training in mcore path
- fp8_param=True
- TP>1 (TP=2 in the repro below)
- blockwise quantization recipe
It works when TP=1, when fp8_param=False, or when the per-tensor quantization recipe is used.
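For reference, a minimal sketch of the single-override variants that isolate the failing combination. The tensor_model_parallel_size key is the one used in the repro below; the fp8_param and recipe key names are my assumptions about the YAML layout and should be checked against examples/configs/grpo_math_8B_megatron_fp8.yaml:
# Each override, applied on its own to the failing setup, avoids the NaN.
# TP=1 (this key is confirmed by the repro script below):
uv run python examples/run_grpo_math.py \
  --config examples/configs/grpo_math_8B_megatron_fp8.yaml \
  policy.megatron_cfg.tensor_model_parallel_size=1
# Hypothetical key names for the other two workarounds (verify in the YAML):
#   policy.megatron_cfg.fp8_cfg.fp8_param=false        # keep params out of fp8 storage
#   policy.megatron_cfg.fp8_cfg.fp8_recipe=tensorwise  # per-tensor quantization recipe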
Steps/Code to reproduce bug
export EXP_SUFFIX="grpo_math_8B_megatron_fp8_dev"
# Set up paths and names based on experiment suffix
export CHECKPOINT_DIR="results/${EXP_SUFFIX}"
export WANDB_NAME=${EXP_SUFFIX}
export RAY_DEDUP_LOGS=0
export BASE_LOG_DIR="logs/${EXP_SUFFIX}"
export CONTAINER=<container tag>
export MOUNTS="${PWD}:/opt/nemo-rl"
export NUM_ACTOR_NODES=${NUM_NODES:-1}
# TransformerEngine: use FP32 scales for FP8 block scaling.
export NVTE_FP8_BLOCK_SCALING_FP32_SCALES=1
export COMMAND="\
uv run python examples/run_grpo_math.py \
--config examples/configs/grpo_math_8B_megatron_fp8.yaml \
policy.megatron_cfg.tensor_model_parallel_size=2 \
logger.wandb_enabled=true \
logger.wandb.project="nemo-rl-grpo-dev-guyueh" \
logger.wandb.name="${WANDB_NAME}" \
checkpointing.enabled=true \
checkpointing.checkpoint_dir="${CHECKPOINT_DIR}" \
cluster.num_nodes=${NUM_ACTOR_NODES}"
INTERACTIVE=${INTERACTIVE:-0}
if [ "$INTERACTIVE" -eq 1 ]; then
  # Interactive run: submit the allocation without a command and attach manually.
  export COMMAND=
fi
export PARTITION=batch
sbatch \
--nodes=${NUM_ACTOR_NODES} \
--account=coreai_dlalgo_nemorl \
--job-name=coreai_dlalgo_nemorl-grpo.${EXP_SUFFIX} \
--partition=${PARTITION} \
--gres=gpu:8 \
--time=04:00:00 \
ray.sub
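To submit, save the block above as a script (e.g. repro.sh, a name chosen here for illustration); both environment knobs it reads are optional:
# Batch run on 1 node (the default):
bash repro.sh
# Multi-node:
NUM_NODES=2 bash repro.sh
# Interactive: allocate the nodes without running the command, then attach manually:
INTERACTIVE=1 bash repro.sh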
Expected behavior
token_mult_prob_error stays below 1.05 and the mean generated sequence length stays below 3000 for at least the first 5 steps.
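As a rough pass/fail check, assuming the per-step metrics end up in plain-text logs under BASE_LOG_DIR (the exact log path and format are assumptions):
# Scan for the failure signature; any hit on a NaN value means the bug reproduced:
grep -n 'token_mult_prob_error' "${BASE_LOG_DIR}"/*.log | grep -i nan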
Environment overview (please complete the following information)
- Environment location: [Bare-metal, Docker, Cloud (specify cloud provider: AWS, Azure, GCP, Colab)]
- Method of install: [pip install or from source]. Please specify exact commands you used to install.
- If method of install is [Docker], provide the docker pull & docker run commands used
Environment details
If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:
- OS version
- PyTorch version
- Python version
Additional context
Add any other context about the problem here.
Example: GPU model