[do not review] Add SFT experiment#2556

Draft
joecummings wants to merge 2 commits intopytorch:mainfrom
joecummings:sft-experiment

Conversation

@joecummings
Member

No description provided.

Adds an SFT dataloader and config under torchtitan/experiments/sft/ that
reuses the existing Trainer without modification. Key features:

- Incremental prefix re-tokenization for correct label masking at BPE
  boundaries (matches torchtune's approach)
- Greedy sequence packing with EOS-based document boundaries for flex/varlen
  attention backends
- Config validation for attention backend compatibility and validation hang
  prevention
- Epoch shuffling with deterministic seeds for checkpoint reproducibility
- GSM8K dataset config with Qwen3 reasoning trace support

Includes 13 unit tests and a 2-GPU integration test.
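A minimal sketch of the incremental prefix re-tokenization idea from the bullet list above (the function name, the single-string tokenizer interface, and the turn format are all illustrative assumptions, not the PR's actual code):

```python
def mask_labels(tokenizer, turns, ignore_idx=-100):
    """Tokenize a conversation turn by turn, masking non-assistant tokens.

    Re-tokenizing the growing prefix (rather than each turn in isolation)
    keeps label boundaries aligned even when a BPE merge spans the join
    between two turns.
    """
    tokens, labels = [], []
    prefix = ""
    for role, text in turns:
        prefix += text
        # Tokenize the full prefix so far; the newly produced tokens are
        # whatever extends the previous tokenization.
        full = tokenizer(prefix)
        new = full[len(tokens):]
        tokens = full
        if role == "assistant":
            labels.extend(new)                       # train on these
        else:
            labels.extend([ignore_idx] * len(new))   # mask prompt tokens
    return tokens, labels
```

With a real BPE tokenizer, `full[len(tokens):]` can differ from tokenizing the new turn alone, which is exactly why the prefix is re-tokenized each time.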
meta-cla bot added the CLA Signed label Mar 11, 2026
Contributor

does it make sense to put in torchtitan/hf_datasets



@dataclass(kw_only=True, slots=True)
class SFTTrainerConfig(Trainer.Config):
Contributor

I'm interested in exploring the feasibility of landing this in core (outside experiments/), possibly by consolidating the trainer / dataloader and their configs.

…idation

When packing multiple documents into one sequence, RoPE positions were
not reset per document, causing later documents to receive wrong
positional embeddings (positions continued from the previous document
instead of restarting at 0). This fixes pytorch#2559.

Changes:
- Yield per-document position tensors from packed sequences that reset
  to 0 at each document boundary, flowing through extra_inputs to
  Decoder.forward(positions=...)
- Validate attn_mask_type='block_causal' when pack_sequences=True to
  prevent cross-document attention leakage
- Simplify _tokenize_sample to single-turn with explicit validation
- Extract _flush_pack_buffer helper and use slice assignment for masking
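The position-reset behavior described in this commit message can be sketched roughly as follows (a toy helper assuming packed documents are described by their lengths; not the PR's implementation):

```python
import torch

def packed_positions(doc_lengths: list[int]) -> torch.Tensor:
    """Per-document RoPE positions for a packed sequence.

    Positions restart at 0 at each document boundary, so a document
    packed after others sees positions 0..len-1 instead of continuing
    from the previous document's last position.
    """
    return torch.cat([torch.arange(n) for n in doc_lengths])

# Packing documents of lengths 3 and 2 into one sequence:
# packed_positions([3, 2]) -> tensor([0, 1, 2, 0, 1])
```

A tensor like this would then be passed alongside the packed tokens (here, per the commit message, via `extra_inputs` into `Decoder.forward(positions=...)`) so rotary embeddings are computed per document rather than per packed sequence.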

Labels

ciflow/8gpu, CLA Signed

2 participants