
feat: streaming each dtensor in refit#176

Merged
SahilJain314 merged 15 commits into main from yukih/refit on Apr 23, 2025

Conversation

Contributor

@yuki-97 yuki-97 commented Apr 14, 2025

What does this PR do ?

Improves refitting by no longer all-gathering every tensor at once, which saves memory.

Implementation

  1. Get all the DTensors through prepare_weights_for_ipc.
  2. For each group of tensors: convert them to full tensors with full_tensor and pass them to vLLM through IPC.
  3. Release the used full tensors (Python's garbage collector does this automatically).
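
The three steps above can be sketched in plain Python. This is a minimal illustration, not the PR's code: `ShardedParam`, `group_keys_by_size`, `stream_refit`, and the `send` callback are hypothetical names, and lists of floats stand in for DTensors and IPC handles. Grouping by byte size follows the later commit note "group refit tensor by size instead of count"; the exact grouping policy used in the PR is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class ShardedParam:
    """Hypothetical stand-in for a DTensor; holds every rank's shard for simplicity."""

    shards: List[List[float]]
    bytes_per_elem: int = 4

    def full_nbytes(self) -> int:
        # Size of the parameter once all shards are gathered.
        return sum(len(s) for s in self.shards) * self.bytes_per_elem

    def full_tensor(self) -> List[float]:
        # Mimics an all-gather: concatenate the shards into one full parameter.
        return [x for shard in self.shards for x in shard]


def group_keys_by_size(
    params: Dict[str, ShardedParam], limit_bytes: int
) -> Iterator[List[str]]:
    """Yield groups of keys whose gathered size stays within limit_bytes.

    A key that would push the current group over the limit starts the next
    group; a single oversized parameter still gets a group of its own.
    """
    group: List[str] = []
    used = 0
    for key, param in params.items():
        size = param.full_nbytes()
        if group and used + size > limit_bytes:
            yield group
            group, used = [], 0
        group.append(key)
        used += size
    if group:
        yield group


def stream_refit(
    params: Dict[str, ShardedParam],
    limit_bytes: int,
    send: Callable[[Dict[str, List[float]]], None],
) -> int:
    """Gather and send one group at a time; return the peak gathered bytes."""
    peak = 0
    for keys in group_keys_by_size(params, limit_bytes):
        # Step 2: materialize full tensors for this group only.
        full = {k: params[k].full_tensor() for k in keys}
        peak = max(peak, sum(params[k].full_nbytes() for k in keys))
        send(full)  # hand the group to the inference side (IPC in the PR)
        del full  # Step 3: drop the reference so the memory can be reclaimed
    return peak
```

With this shape, the gathered copy that is live at any moment is bounded by the group limit (or by the largest single parameter), rather than by the size of the whole model.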

Original version

Needs (1/n + 1)x the weight memory, because it gathers all the full parameters and passes them to vLLM in a single step (n being the number of FSDP shards).
1/n: the DTensors held by the FSDP model
1: the full tensors obtained from state_dict

New version

Needs (2/n + some overhead)x the weight memory.
1/n: the DTensors held by the FSDP model
1/n: the DTensors obtained from state_dict
some overhead: the full tensors produced while converting DTensors one group at a time
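
To make the two formulas concrete, here is a small worked calculation. It is only a sketch of the accounting above: the function names are hypothetical, n is assumed to be the number of FSDP shards, and the "overhead" term is assumed to equal the gathered size of one group. The figures in the usage lines are illustrative inputs, not measurements from this PR.

```python
def original_peak_gib(weights_gib: float, n: int) -> float:
    """(1/n + 1)x weights: the local shard plus a full gathered copy."""
    return weights_gib / n + weights_gib


def streaming_peak_gib(weights_gib: float, n: int, group_gib: float) -> float:
    """(2/n + overhead)x weights: two sharded copies plus one gathered group."""
    return 2 * weights_gib / n + group_gib


# Illustrative inputs: 16 GiB of weights, 8 shards, 0.5 GiB gathered per group.
before = original_peak_gib(16.0, 8)
after = streaming_peak_gib(16.0, 8, 0.5)
print(f"original: {before} GiB, streaming: {after} GiB")
```

Under these assumed inputs the per-GPU footprint of refit falls from 18 GiB to 4.5 GiB, and the gap widens as n grows because the full gathered copy is the only term that does not shrink with sharding.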

Issues

Closes #163

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

@SahilJain314 SahilJain314 self-requested a review April 14, 2025 17:42
@yuki-97 yuki-97 force-pushed the yukih/refit branch 2 times, most recently from 678f421 to 9337e9c Compare April 15, 2025 07:01
Contributor Author

yuki-97 commented Apr 15, 2025

After moving the loop outside (instead of using yield) and removing the manual gc.collect() call, the time cost dropped from 41.84s to 5.00s on 8 GPUs and from 17.97s to 2.10s on 2 GPUs.

@yuki-97 yuki-97 changed the title improve refit: avoid all gather total tensors feat: improve refitting Apr 15, 2025
@yuki-97 yuki-97 marked this pull request as ready for review April 15, 2025 07:12
@yuki-97 yuki-97 changed the title feat: improve refitting feat: streaming each dtensor in refit Apr 15, 2025
@yuki-97 yuki-97 added Run CICD and removed Run CICD labels Apr 15, 2025
@yuki-97 yuki-97 marked this pull request as draft April 15, 2025 13:10
@yuki-97 yuki-97 added Run CICD and removed Run CICD labels Apr 16, 2025
@yuki-97 yuki-97 added Run CICD and removed Run CICD labels Apr 16, 2025
@yuki-97 yuki-97 marked this pull request as ready for review April 16, 2025 09:07
Comment thread nemo_reinforcer/models/policy/hf_policy.py Outdated
Comment thread tests/unit/models/generation/test_vllm_generation.py
@yuki-97 yuki-97 force-pushed the yukih/refit branch 2 times, most recently from 2d2d53e to bc7ecd6 Compare April 17, 2025 03:12
@yuki-97 yuki-97 added Run CICD CI:L0 Run doctests and unit tests and removed Run CICD labels Apr 17, 2025
Comment thread nemo_reinforcer/models/policy/hf_policy.py Outdated
@yuki-97 yuki-97 force-pushed the yukih/refit branch 2 times, most recently from c17dd7c to 7d98def Compare April 17, 2025 08:41
@yuki-97 yuki-97 added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Apr 17, 2025
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L0 Run doctests and unit tests labels Apr 22, 2025
yuki-97 added 5 commits April 22, 2025 05:34
Signed-off-by: Yuki Huang <yukih@nvidia.com>

group refit tensor by size instead of count

Signed-off-by: Yuki Huang <yukih@nvidia.com>

update get_weights_ipc_handles

Signed-off-by: Yuki Huang <yukih@nvidia.com>

update fsdp1 and debug log

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 22, 2025
terrykong previously approved these changes Apr 22, 2025
Comment thread nemo_reinforcer/algorithms/grpo.py Outdated
Comment thread nemo_reinforcer/models/generation/vllm.py
Comment thread examples/configs/grpo_math_1B.yaml Outdated
…ouping keys to not include the key that exceeds the size limit

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
@parthchadha parthchadha added CI:L0 Run doctests and unit tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 22, 2025
@SahilJain314 SahilJain314 enabled auto-merge April 22, 2025 23:29
@SahilJain314 SahilJain314 added this pull request to the merge queue Apr 22, 2025
Merged via the queue into main with commit ed546ae Apr 23, 2025
19 checks passed
@SahilJain314 SahilJain314 deleted the yukih/refit branch April 23, 2025 00:17
ashors1 pushed a commit that referenced this pull request Apr 24, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Alex Qiu <alexq@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Alex Qiu <alexq@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
aschilling-nv pushed a commit that referenced this pull request Apr 25, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Alex Qiu <alexq@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Alex Qiu <alexq@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
terrykong added a commit that referenced this pull request May 1, 2025
commit ebb46c3
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 30 15:03:46 2025 -0700

    fix: fix dtype of empty `token_ids` for consistency (#290)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit cf8f045
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 30 15:03:19 2025 -0700

    chore: Remove outdated comment in DPO config (#293)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 04f30bb
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Wed Apr 30 12:19:47 2025 -0700

    fix: Fixed max seqlen not respected correctly (#299)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit daac5d9
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 29 17:30:05 2025 -0700

    chore: Remove online hf checkpointing (#285)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 3cd8be8
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Tue Apr 29 15:18:37 2025 -0700

    feat: Remove 'last 100' hack for math verifier (#287)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Co-authored-by: Terry Kong <terryk@nvidia.com>

commit 506910a
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 29 11:29:22 2025 -0700

    test: add a test that checks if recipes can be merged into the base config (#288)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit af43261
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 29 09:18:14 2025 -0700

    chore: add isort rules and pyflakes in ruff/precommit (#291)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8b0837c
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 29 23:57:41 2025 +0800

    ci: add eval functional test (#269)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit 68beb6d
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Mon Apr 28 23:35:01 2025 -0700

    feat: rename ratio_eps_{min/max} to ratio_clip_{min/max} for clarity (#283)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 2f5d22f
Author: Hemil Desai <hemild@nvidia.com>
Date:   Mon Apr 28 16:09:00 2025 -0700

    feat: Add hydra style overrides to SFT (#208)

    Signed-off-by: Hemil Desai <hemild@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Co-authored-by: ashors1 <ashors@nvidia.com>

commit 8a22c44
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 28 15:11:03 2025 -0700

    feat: publish convergence/release runs (#214)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit af94d43
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 28 15:02:19 2025 -0700

    fix: fixes #264 where tied weights check didn't work on fsdp1 (#284)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Parth Chadha <parth29@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 1363dba
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 28 12:44:56 2025 -0700

    fix: improve port selection and exiting early from ray.sub (#272)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 044f385
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Mon Apr 28 14:22:55 2025 -0500

    docs: Correcting build issues and CI (#270)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 0fae6bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Mon Apr 28 11:08:51 2025 -0700

    feat: Updated Name to NeMo RL (#265)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 34cae3a
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 28 08:16:51 2025 -0700

    fix: add bibtex entry (#273)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit ee0d2c8
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sat Apr 26 20:15:38 2025 -0700

    docs: instruct users to git clone before beginning (#257)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 09f5416
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 25 13:46:41 2025 -0700

    feat: E2E multi-turn RL example with a sliding puzzle game (#242)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 47e51d3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Fri Apr 25 10:13:59 2025 -0700

    chore: better logging when insufficient resources (#271)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 98473c6
Author: Anna Shors <ashors@nvidia.com>
Date:   Thu Apr 24 22:28:05 2025 -0700

    fix: Update DPO and SFT configs to use dtensor (#256)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 2558444
Author: Anna Shors <ashors@nvidia.com>
Date:   Thu Apr 24 11:02:26 2025 -0700

    fix: Fix fsdp1 grad clipping and log grad norm (#251)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit c8f0a01
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Wed Apr 23 17:58:43 2025 -0700

    docs: add qwen 32b instruction and add 0.3 planned features (#255)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0a5f31d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 23 17:49:06 2025 -0400

    fix: fix broken eval script (#253)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2f8a140
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Wed Apr 23 12:47:18 2025 -0700

    ci: L1 default and increase test time (#252)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 1c7cbd9
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 23 12:52:13 2025 -0400

    fix: use find_tied_parameters api from HF for tied weight keys (#250)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 1788e4c
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 22 22:05:53 2025 -0400

    fix: raise error if tied weights model is being trained with fsdp1 or… (#229)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: mckimn <nmckimpson@nvidia.com>

commit 1fa4c7a
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 16:38:50 2025 -0700

    fix: Fix indent in dtensor policy (#248)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit ed546ae
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Wed Apr 23 07:29:47 2025 +0800

    feat: streaming each dtensor in refit (#176)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>

commit 5c62657
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Tue Apr 22 14:14:40 2025 -0700

    feat: Importance sampling trick (#174)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit deaece6
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Tue Apr 22 12:39:35 2025 -0700

    feat: Add support for multi-turn generations and RL (tools, games, etc) (#218)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 1245c50
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 12:19:42 2025 -0700

    fix: Speed up DPO functional test (#241)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit af369a3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 22 12:17:03 2025 -0700

    fix: Move ray worker port range start from 20001 to 53001 (#235)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 756152c
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 10:02:34 2025 -0700

    feat: Support multi-epoch training in SFT (#177)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit bbdd671
Author: Anna Shors <ashors@nvidia.com>
Date:   Mon Apr 21 22:16:15 2025 -0700

    feat: DPO (#180)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 88bc0fd
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 21 17:31:23 2025 -0700

    ci: Remove external config from project (#200)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 4a2e126
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 21 17:34:59 2025 -0400

    fix: skip vllm p2p check since its flaky (#238)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 22af21c
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:29 2025 -0700

    feat: FSDP2 SFT (#206)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit e36f488
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:24 2025 -0700

    fix: Fix missing import (#222)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 98b7a90
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 10:06:09 2025 -0700

    docs: update docs everywhere to remove uv pip install which isn't reliable (#217)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit da191b4
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 07:56:55 2025 -0700

    feat: introduce a debug API for backoff and retries for RayVirtualCluster (#234)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8780093
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 18 17:06:54 2025 -0700

    feat: Add total logging of generations in training (#172)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit ce2d121
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Sat Apr 19 00:22:11 2025 +0800

    fix: fix chat_template in eval (#210)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit f8b6ba9
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 12:52:19 2025 -0700

    fix: grpo func test 10 step -> 3 step to speed up CI (#209)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 4a6f62b
Author: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Date:   Thu Apr 17 11:06:05 2025 -0700

    feat: Add FSDP2, DTensor SP/TP, activation checkpointing support (#131)

    Signed-off-by: Gerald Shen <geshen@nvidia.com>

commit 78a9834
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 10:03:34 2025 -0700

    fix: ci uses umask (#211)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 5ff10f6
Author: alexchiu <qiuzhaopeng@foxmail.com>
Date:   Thu Apr 17 08:38:45 2025 +0800

    fix: prevent division by zero in ClippedPGLossFn calculation (#166)

    Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 6db2f7a
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Wed Apr 16 15:53:12 2025 -0700

    feat: Fix CPU offloading + add options for FSDP offload and activation ckpting (#123)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit 62ac8d2
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 15:38:53 2025 -0500

    ci: Only include dependencies in test container (#203)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit b00fcc8
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 16 13:23:40 2025 -0700

    fix: chat template improvements (#148)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit df31f50
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 13:13:58 2025 -0500

    ci: Run tests only in merge queue or when labeled (#159)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>

commit e3af337
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 16 09:23:30 2025 -0700

    feat: Upgrade to vllm v1 runtime (#170)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
    Co-authored-by: Anna Shors <ashors@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit dd7c2d7
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 16:00:04 2025 -0700

    fix: unit test script halts on first failure (#189)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 92c3f1d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 15 15:42:01 2025 -0700

    feat: add a unique seed for each vllm llm engine (#171)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2ae8935
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 14:41:21 2025 -0700

    docs: remove backticks from uv.md title (#179)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 9ac4e62
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 12:37:35 2025 -0700

    fix: convert DCP to HF script works without ray cluster (#185)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8213014
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Tue Apr 15 13:55:54 2025 -0500

    docs: Correcting file names (#161)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 4db3167
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 11:07:51 2025 -0700

    fix: default to less verbose logging + uv-venv log once per worker  (#141)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit bda6522
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 22:31:56 2025 -0700

    docs: run tests with --group test to avoid missing test deps (#188)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit c1fc972
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 20:43:51 2025 -0700

    ci: Update to include public/ folder for pages deployment (#182)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit e9812f1
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 14 20:05:46 2025 -0700

    fix: don't use cuda-graphs for vllm generation (#187)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit d7d4cd6
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 15:46:13 2025 -0700

    ci: labels for docs/L0/L1/L2 and run even if only doc test (#181)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0637511
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 15 05:24:07 2025 +0800

    feat: support arbitrary end_strings (#96)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit c99585c
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 14:44:43 2025 -0700

    fix: allow configuring ray ports in ray.sub in case conflict on cluster (#173)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit a5547f2
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 09:18:02 2025 -0700

    docs: Fix doc build warnings and add external CI config (#157)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 32953be
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Fri Apr 11 10:18:03 2025 -0700

    fix: always test vllm (#167)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit c00b8bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Thu Apr 10 22:38:40 2025 -0700

    test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>
terrykong pushed a commit that referenced this pull request May 1, 2025
Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Tech pubs updates to file

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

fix typo

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Incorporated Reviewer Comments in ReadMe

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to file

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech pups updates to resolve some threads

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech pubs updates to resolve some threads

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs minor edits to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Date:   Fri Apr 18 17:06:54 2025 -0700

    feat: Add total logging of generations in training (#172)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit ce2d121
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Sat Apr 19 00:22:11 2025 +0800

    fix: fix chat_template in eval (#210)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit f8b6ba9
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 12:52:19 2025 -0700

    fix: grpo func test 10 step -> 3 step to speed up CI (#209)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 4a6f62b
Author: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Date:   Thu Apr 17 11:06:05 2025 -0700

    feat: Add FSDP2, DTensor SP/TP, activation checkpointing support (#131)

    Signed-off-by: Gerald Shen <geshen@nvidia.com>

commit 78a9834
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 10:03:34 2025 -0700

    fix: ci uses umask (#211)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 5ff10f6
Author: alexchiu <qiuzhaopeng@foxmail.com>
Date:   Thu Apr 17 08:38:45 2025 +0800

    fix: prevent division by zero in ClippedPGLossFn calculation (#166)

    Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 6db2f7a
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Wed Apr 16 15:53:12 2025 -0700

    feat: Fix CPU offloading + add options for FSDP offload and activation ckpting (#123)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit 62ac8d2
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 15:38:53 2025 -0500

    ci: Only include dependencies in test container (#203)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit b00fcc8
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 16 13:23:40 2025 -0700

    fix: chat template improvements (#148)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit df31f50
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 13:13:58 2025 -0500

    ci: Run tests only in merge queue or when labeled (#159)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>

commit e3af337
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 16 09:23:30 2025 -0700

    feat: Upgrade to vllm v1 runtime (#170)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
    Co-authored-by: Anna Shors <ashors@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit dd7c2d7
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 16:00:04 2025 -0700

    fix: unit test script halts on first failure (#189)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 92c3f1d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 15 15:42:01 2025 -0700

    feat: add a unique seed for each vllm llm engine (#171)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2ae8935
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 14:41:21 2025 -0700

    docs: remove backticks from uv.md title (#179)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 9ac4e62
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 12:37:35 2025 -0700

    fix: convert DCP to HF script works without ray cluster (#185)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8213014
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Tue Apr 15 13:55:54 2025 -0500

    docs: Correcting file names (#161)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 4db3167
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 11:07:51 2025 -0700

    fix: default to less verbose logging + uv-venv log once per worker (#141)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit bda6522
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 22:31:56 2025 -0700

    docs: run tests with --group test to avoid missing test deps (#188)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit c1fc972
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 20:43:51 2025 -0700

    ci: Update to include public/ folder for pages deployment (#182)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit e9812f1
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 14 20:05:46 2025 -0700

    fix: don't use cuda-graphs for vllm generation (#187)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit d7d4cd6
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 15:46:13 2025 -0700

    ci: labels for docs/L0/L1/L2 and run even if only doc test (#181)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0637511
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 15 05:24:07 2025 +0800

    feat: support arbitrary end_strings (#96)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit c99585c
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 14:44:43 2025 -0700

    fix: allow configuring ray ports in ray.sub in case conflict on cluster (#173)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit a5547f2
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 09:18:02 2025 -0700

    docs: Fix doc build warnings and add external CI config (#157)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 32953be
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Fri Apr 11 10:18:03 2025 -0700

    fix: always test vllm (#167)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit c00b8bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Thu Apr 10 22:38:40 2025 -0700

    test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Alex Qiu <alexq@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Alex Qiu <alexq@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>

Labels

CI:L0 Run doctests and unit tests

Development

Successfully merging this pull request may close these issues.

Streaming each weight tensor (gather each weight)

5 participants