qwen1.5b-instruct grpo padding failure

**Describe the bug**
```
  File "/tmp/ray/session_2025-04-02_10-39-48_823538_118840/runtime_resources/working_dir_files/_ray_pkg_e29c99657fe54f18/nemo_reinforcer/models/generation/interfaces.py", line 89, in verify_right_padding
    raise ValueError(msg)
ValueError: Non-padding values found after specified length at index 0: positions [145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194]
(HfPolicyWorker[rank=0] pid=123060) GPU Memory after refit complete: 13.24GB allocated, 13.25GB reserved
```
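
For readers unfamiliar with this kind of check, the following is a minimal sketch of a right-padding validation. It is not the code in `nemo_reinforcer/models/generation/interfaces.py`: the function names, arguments, plain-list inputs, and the pad token id of `0` are all assumptions made for illustration. It only shows how a row that holds non-padding tokens past its declared length could produce a message shaped like the one in the traceback above.

```python
from typing import List, Sequence


def find_non_padding_after_length(
    token_ids: Sequence[Sequence[int]],
    lengths: Sequence[int],
    pad_id: int,
) -> List[List[int]]:
    """Return, per row, the positions at or past lengths[i] that are not pad_id.

    Hypothetical helper for illustration; not part of nemo_reinforcer.
    """
    bad = []
    for row, length in zip(token_ids, lengths):
        bad.append([pos for pos in range(length, len(row)) if row[pos] != pad_id])
    return bad


def check_right_padding(
    token_ids: Sequence[Sequence[int]],
    lengths: Sequence[int],
    pad_id: int,
) -> None:
    """Raise ValueError for the first row with non-padding tokens after its length."""
    offending = find_non_padding_after_length(token_ids, lengths, pad_id)
    for idx, positions in enumerate(offending):
        if positions:
            raise ValueError(
                f"Non-padding values found after specified length at index {idx}: "
                f"positions {positions}"
            )


if __name__ == "__main__":
    # Hypothetical batch with pad_id=0: row 0 declares length 2 but still
    # holds real tokens at positions 2 and 3, so the check fails on index 0.
    batch = [[5, 6, 7, 8, 0], [9, 9, 0, 0, 0]]
    try:
        check_right_padding(batch, lengths=[2, 2], pad_id=0)
    except ValueError as err:
        print(err)
```

Under these assumptions, a failure like the one reported means either the declared length for the row is too short or the tokens after it were not replaced with the pad id the check expects.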

**Steps/Code to reproduce bug**

Please list *minimal* steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports


**Expected behavior**

A clear and concise description of what you expected to happen.

**Environment overview (please complete the following information)**

 - Environment location: [Bare-metal, Docker, Cloud (specify cloud provider: AWS, Azure, GCP, Colab)]
 - Method of install: [pip install or from source]. Please specify exact commands you used to install.
 - If method of install is [Docker], provide `docker pull` & `docker run` commands used

**Environment details**

If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:
- OS version
- PyTorch version
- Python version

**Additional context**

Add any other context about the problem here.
Example: GPU model


qwen1.5b-instruct grpo padding failure #119
