
[RL] Changes to enable compilation for trainer#2568

Open
Lucaskabela wants to merge 2 commits into pytorch:main from Lucaskabela:lucaskabela/enable_trainer_compile_03_10

Conversation

@Lucaskabela (Contributor) commented Mar 13, 2026

Summary

In this PR, we enable naive, JIT-style torch.compile for the RL policy trainer. This is the first step toward speeding up the trainer model. The changes are:

  1. Wiring through compilation config:
  • Added a TrainerCompileConfig dataclass with enable (bool) and backend (str, default "eager") fields
  • Added a compile field to PolicyTrainer.Config, supporting the eager and aot_eager backends
  • Added a _compile_model() method that calls .compile(backend=..., fullgraph=True) on each transformer
    layer -> this per-layer granularity is critical, as applying torch.compile() to the whole model results in logit changes
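The compile wiring described above can be sketched roughly as follows. Note this is a minimal illustration, not the PR's actual code: the name compile_model_layers and the exact field layout are assumptions, and nn.Module.compile is used as the in-place counterpart of torch.compile.

```python
from dataclasses import dataclass


@dataclass
class TrainerCompileConfig:
    """Sketch of the config fields described in the PR."""
    enable: bool = False
    backend: str = "eager"


def compile_model_layers(layers, config):
    """Compile each transformer layer in place.

    Compiling per layer (rather than the whole model) matches the
    PR's approach; fullgraph=True makes any untraceable construct
    fail loudly instead of silently falling back with graph breaks.
    """
    if not config.enable:
        return
    for layer in layers:
        # For a torch.nn.Module, .compile(...) is the in-place
        # equivalent of module = torch.compile(module, ...).
        layer.compile(backend=config.backend, fullgraph=True)
```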
  2. config_registry.py — Enable compile by default in configs
  • Both rl_grpo_qwen3_0_6b and rl_grpo_qwen3_debug configs now set
    compile=TrainerCompileConfig(enable=True). Default backend is 'eager'
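In the registry, enabling compile for a config then looks roughly like this (a hypothetical fragment mirroring the description above, with the other Config fields elided):

```python
# hypothetical config_registry.py entry; other Config fields elided
rl_grpo_qwen3_0_6b = PolicyTrainer.Config(
    ...,
    compile=TrainerCompileConfig(enable=True),  # backend defaults to "eager"
)
```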
  3. vllm_compat/models/attention.py — Make flash-attention compile-compatible
  • Moved the FlashAttnWithBackward autograd function out of the forward() method (nested classes
    can't be traced by the compiler) into a module-level FlashAttnVarlenFunction
  • Registered the flash-attention forward as a torch.library.custom_op (rl::flash_attn_varlen_fwd)
    with a fake implementation, so AOT Autograd can trace through it with FakeTensors
  • Simplified the call site in VLLMCompatibleFlashAttention.forward() to use the new function

Test Plan

python torchtitan/experiments/rl/unified/simple_grpo_sum_digits.py --module rl.unified --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B

This produces the same losses as on main; the cumulative timing is now:

Main

[2026-03-13 12:13:11] INFO simple_grpo_sum_digits.py:401: [actor=<root>] Cumulative Timing | Generator: 22.6s | Optimizer: 0.1s | Trainer: 148.6s | WeightSync: 119.4s | Total: 290.7s

Changes

[2026-03-13 12:03:26] INFO simple_grpo_sum_digits.py:401: [actor=<root>] Cumulative Timing | Generator: 22.1s | Optimizer: 0.1s | Trainer: 103.2s | WeightSync: 119.2s | Total: 244.6s

So we save ~46s of runtime (trainer: 148.6s → 103.2s).

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 13, 2026
@Lucaskabela Lucaskabela marked this pull request as ready for review March 13, 2026 19:19
if we go with pytorch varlen, do we still need to worry about this file? cc @wwwjn

@Lucaskabela Lucaskabela requested a review from wwwjn March 13, 2026 20:39
@Lucaskabela Lucaskabela force-pushed the lucaskabela/enable_trainer_compile_03_10 branch from 8a62589 to be75f04 Compare March 13, 2026 22:14
@Lucaskabela Lucaskabela force-pushed the lucaskabela/enable_trainer_compile_03_10 branch from be75f04 to 520d314 Compare March 13, 2026 23:50
@Lucaskabela Lucaskabela requested a review from tianyu-l March 14, 2026 00:28