[do not review] Add SFT experiment #2556
Draft
joecummings wants to merge 2 commits into pytorch:main from
Conversation
Adds an SFT dataloader and config under torchtitan/experiments/sft/ that reuses the existing Trainer without modification. Key features:

- Incremental prefix re-tokenization for correct label masking at BPE boundaries (matches torchtune's approach)
- Greedy sequence packing with EOS-based document boundaries for flex/varlen attention backends
- Config validation for attention backend compatibility and validation hang prevention
- Epoch shuffling with deterministic seeds for checkpoint reproducibility
- GSM8K dataset config with Qwen3 reasoning trace support

Includes 13 unit tests and a 2-GPU integration test.
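The greedy packing described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name `pack_sequences` and its parameters are assumptions, and it assumes the EOS token both terminates each document and pads the tail of a pack.

```python
def pack_sequences(docs, max_seq_len, eos_id):
    """Greedily pack tokenized documents into fixed-length sequences.

    Each document is terminated with `eos_id`, so attention backends that
    split on EOS (flex / varlen) can recover document boundaries.
    Hypothetical sketch; names and padding convention are assumptions.
    """
    buffer = []
    for tokens in docs:
        # Append EOS and truncate documents longer than one pack.
        doc = (tokens + [eos_id])[:max_seq_len]
        if len(buffer) + len(doc) > max_seq_len:
            # Flush the current pack, padded to max_seq_len with EOS.
            yield buffer + [eos_id] * (max_seq_len - len(buffer))
            buffer = []
        buffer.extend(doc)
    if buffer:
        yield buffer + [eos_id] * (max_seq_len - len(buffer))
```

Greedy packing is simple and streaming-friendly: it never reorders documents, at the cost of some padding waste compared to bin-packing heuristics.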
tianyu-l (Contributor) reviewed Mar 12, 2026
Does it make sense to put this in torchtitan/hf_datasets?
    @dataclass(kw_only=True, slots=True)
    class SFTTrainerConfig(Trainer.Config):
Contributor
I'm interested in exploring the feasibility of landing this in core (outside experiments/), possibly by consolidating the trainer / dataloader and their configs.
…idation

When packing multiple documents into one sequence, RoPE positions were not reset per document, causing later documents to receive wrong positional embeddings (positions continued from the previous document instead of restarting at 0). This fixes pytorch#2559.

Changes:
- Yield per-document position tensors from packed sequences that reset to 0 at each document boundary, flowing through extra_inputs to Decoder.forward(positions=...)
- Validate attn_mask_type='block_causal' when pack_sequences=True to prevent cross-document attention leakage
- Simplify _tokenize_sample to single-turn with explicit validation
- Extract _flush_pack_buffer helper and use slice assignment for masking
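The per-document position reset in the first change can be illustrated with a minimal sketch. This is not the PR's implementation; the function name `packed_positions` is hypothetical, and it assumes EOS tokens mark document boundaries within the packed sequence.

```python
import torch

def packed_positions(input_ids: torch.Tensor, eos_id: int) -> torch.Tensor:
    """Build position ids for a packed 1-D sequence of token ids.

    Positions restart at 0 immediately after each EOS token, so RoPE
    treats every packed document as if it started the sequence.
    Hypothetical sketch; boundary convention is an assumption.
    """
    positions = torch.zeros_like(input_ids)
    pos = 0
    for i, tok in enumerate(input_ids.tolist()):
        positions[i] = pos
        # Reset after a document boundary; otherwise keep counting.
        pos = 0 if tok == eos_id else pos + 1
    return positions
```

For a packed sequence like `[1, 2, EOS, 3, 4, 5, EOS]`, the second document's positions restart at 0 rather than continuing from the first, which is exactly the bug this commit fixes.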