[doc] Add a JaxTrainer template #59842
Conversation
Code Review
This pull request introduces a valuable JaxTrainer template for training a GPT-2 style model on GPUs and TPUs. The notebook and markdown file provide a comprehensive walkthrough. My review focuses on improving code correctness, clarity, and maintainability. I've identified a critical bug in the metric reporting logic that could cause training runs to hang, and I've provided a fix. Additionally, I've made several suggestions to enhance the example's robustness and readability, including correcting command-line syntax, simplifying data iteration, and removing redundant code.
doc/source/train/examples/jax/intro_to_jax_trainer/README.ipynb (review comments resolved)
JasonLi1909
left a comment
Awesome template! Left some comments; reminder to do the other steps (add to CI, compute configs, etc.). Would be good to get a pass from @angelinalg at some point. Thanks!
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Signed-off-by: Lehui Liu <lehui@anyscale.com>
## Description

1. This PR adds a workspace template that walks users through how to use Ray Train's [JaxTrainer](https://docs.ray.io/en/master/train/api/doc/ray.train.v2.jax.JaxTrainer.html).
2. The purpose of this template is to show how to use the JaxTrainer to train a GPT-2 style model on both GPU and TPU. At a high level, this template covers:
   * A hands-on example of training a GPT-2 model using Jax/Flax
   * Sample `ScalingConfig` for both GPU and TPU
   * Simple integration with Ray Data to read the pre-tokenized openwebtext dataset for training

Testing: tested in an Anyscale workspace.

## Additional information

1. https://console.anyscale-staging.com/cld_kvedZWag2qA8i5BjxUevf5i7/prj_g7p6lsu6r8g7garwbxifppyz23/workspaces/expwrk_bw8izpdi59293i5e73h6biwkak/train/train/46607266f3cb454aa9e7f7929b3aaae3/workers/fbe9a9b91fdf90e7476bb12a0d000000?workspace-tab=code&command-history-section=application_logs&file=%252Fmnt%252Fcluster_storage%252Fgpt2%252F0104_raydata%252Fcheckpoints%252F1x1&storage=cluster

---------

Signed-off-by: Lehui Liu <lehui@anyscale.com>
Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Signed-off-by: peterxcli <peterxcli@gmail.com>
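For context, a `JaxTrainer` invocation along the lines the template describes might look like the following sketch. It is illustrative only, not the template's actual code: `train_func` is a placeholder for the Jax/Flax training loop, the GPU `ScalingConfig` shown is one plausible configuration, and the exact fields for a TPU `ScalingConfig` may differ from what the template ships.

```python
from ray.train import ScalingConfig
from ray.train.v2.jax import JaxTrainer


def train_func(config):
    # Placeholder for the template's Jax/Flax GPT-2 training loop:
    # build the model, shard it across devices, iterate over the
    # worker's Ray Data shard, and report metrics and checkpoints.
    ...


# GPU example: one worker per GPU. A TPU run would use a TPU-specific
# ScalingConfig instead (see the template for the exact fields).
trainer = JaxTrainer(
    train_loop_per_worker=train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```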
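As background on the "GPT-2 style" training the description mentions: a causal language model is typically fed batches where the targets are the inputs shifted left by one token, so the model learns to predict the next token at every position. The helper below is a hypothetical illustration of that layout over a pre-tokenized stream (such as the openwebtext token ids the template reads with Ray Data); it is not code from the PR.

```python
# Hypothetical sketch (not from the PR): next-token data layout for a
# GPT-2 style causal language model.

def make_lm_examples(token_ids, seq_len):
    """Split a pre-tokenized stream into (inputs, targets) pairs.

    Each window needs seq_len + 1 tokens, because the targets are the
    inputs shifted left by one position.
    """
    examples = []
    for start in range(0, len(token_ids) - seq_len, seq_len):
        window = token_ids[start : start + seq_len + 1]
        inputs = window[:-1]   # tokens the model sees
        targets = window[1:]   # next token at each position
        examples.append((inputs, targets))
    return examples


tokens = list(range(10))  # stand-in for pre-tokenized openwebtext ids
examples = make_lm_examples(tokens, seq_len=4)
print(examples[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
```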