
Refactor trainer init#43807

Merged
SunMarc merged 16 commits into main from refactor-trainer
Feb 10, 2026

Conversation

@SunMarc
Member

@SunMarc SunMarc commented Feb 6, 2026

What does this PR do?

This PR simplifies Trainer __init__:

  • Quantization validation extracted
  • PEFT unwrapping deduplicated
  • Liger Kernel extracted — apply_liger_kernel
  • Label smoother simplified
  • Validations grouped — _validate_args() method consolidates three scattered validation blocks (arg checks, optimizer checks, dataset checks) into one place
  • and more
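For illustration, the "validations grouped" change could be sketched as below. This is a simplified stand-in, not the actual transformers API: TrainingArgs, the field names, and the specific checks are hypothetical, and only the shape of the refactor (three scattered validation blocks collapsed into one _validate_args() called from __init__) mirrors the PR.

```python
from dataclasses import dataclass


@dataclass
class TrainingArgs:
    # illustrative fields, not the real TrainingArguments
    per_device_train_batch_size: int = 8
    eval_strategy: str = "no"
    optim: str = "adamw_torch"


class Trainer:
    SUPPORTED_OPTIMS = {"adamw_torch", "sgd"}

    def __init__(self, args: TrainingArgs, train_dataset=None):
        self.args = args
        self.train_dataset = train_dataset
        # one call site instead of three scattered validation blocks
        self._validate_args()

    def _validate_args(self):
        # 1. argument checks
        if self.args.per_device_train_batch_size <= 0:
            raise ValueError("per_device_train_batch_size must be positive")
        # 2. optimizer checks
        if self.args.optim not in self.SUPPORTED_OPTIMS:
            raise ValueError(f"unsupported optimizer: {self.args.optim}")
        # 3. dataset checks (illustrative only)
        if self.args.eval_strategy != "no" and self.train_dataset is None:
            raise ValueError("evaluation requested but no dataset provided")


trainer = Trainer(TrainingArgs())  # default args pass all three checks
```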

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Copy link
Member

/trl-ci

Comment on lines +45 to +53
from liger_kernel.transformers import _apply_liger_kernel_to_instance

kernel_config = args.liger_kernel_config if args.liger_kernel_config is not None else {}
base_model = unwrap_peft_model(model)

if isinstance(base_model, PreTrainedModel):
    _apply_liger_kernel_to_instance(model=base_model, **kernel_config)
else:
    logger.warning("The model is not an instance of PreTrainedModel. No liger kernels will be applied.")
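A helper like the unwrap_peft_model used in the snippet above can be sketched as follows. The real code would check for peft.PeftModel; here a duck-typed stand-in (PeftWrapper, Base, both hypothetical) keeps the sketch runnable without peft installed, relying only on the fact that PEFT wrappers expose the underlying model via get_base_model().

```python
def unwrap_peft_model(model):
    """Return the base model if `model` is a PEFT wrapper, else `model` itself."""
    # PEFT wrappers expose the wrapped model via get_base_model()
    if hasattr(model, "get_base_model"):
        return model.get_base_model()
    return model


class Base:
    """Stand-in for a plain (non-PEFT) model."""


class PeftWrapper:
    """Stand-in for a PEFT wrapper around a base model."""

    def __init__(self, base):
        self._base = base

    def get_base_model(self):
        return self._base


base = Base()
assert unwrap_peft_model(PeftWrapper(base)) is base  # wrapper is unwrapped
assert unwrap_peft_model(base) is base               # plain model passes through
```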
Member Author

simpler now

@qgallouedec
Member

/trl-ci

The PAT is not yet approved for this bot, so I ran it manually here: https://github.com/huggingface/trl/actions/runs/21760373311

Comment on lines 476 to 485
@@ -546,18 +484,20 @@ def __init__(
):
self.place_model_on_device = False
Collaborator

this pattern feels a bit off. args.place_model_on_device is a @property on args that is computed from not is_sagemaker_mp_enabled(). A developer reading this might assume they have some control over this arg and would want to set this state manually. I'm pretty sure there are a whole host of edge cases, as there are also about 5 or so cases above that we currently account for, so making this settable from the TrainingArguments would be a nice-to-have.

Member Author

Indeed, this is a bit off. Let me fix this! Note that for now, I'm only making simple changes to improve readability. But I will soon start cleaning up the things that feel a bit off in Trainer =)

Collaborator

@winglian winglian left a comment

Generally, I love seeing this method become more compact 🤗. Added some additional thoughts, but they can be addressed separately from this PR as well

Member

@qgallouedec qgallouedec left a comment

lgtm, just a question and suggestions

SunMarc and others added 5 commits February 6, 2026 18:44
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Comment on lines +476 to +479
# Postpone switching model to cuda when MP, DeepSpeed, full bf16/fp16 eval, or FSDP
if args.place_model_on_device is not None:
    self.place_model_on_device = args.place_model_on_device
elif (
Member Author

@SunMarc SunMarc commented Feb 6, 2026

lmk if this is better @winglian. The changes should be BC as the user can still overwrite place_model_on_device as a property
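The tri-state pattern discussed here can be sketched as follows. Args, Trainer, the using_deepspeed flag, and is_sagemaker_mp_enabled are simplified stand-ins for illustration: None means "let the Trainer decide", while an explicit True/False from the user wins.

```python
from dataclasses import dataclass
from typing import Optional


def is_sagemaker_mp_enabled() -> bool:
    # stand-in for the real environment check
    return False


@dataclass
class Args:
    # None => Trainer auto-detects; True/False => explicit user override
    place_model_on_device: Optional[bool] = None


class Trainer:
    def __init__(self, args: Args, using_deepspeed: bool = False):
        if args.place_model_on_device is not None:
            # the user explicitly set it; honor the override
            self.place_model_on_device = args.place_model_on_device
        elif using_deepspeed or is_sagemaker_mp_enabled():
            # frameworks that manage device placement themselves
            self.place_model_on_device = False
        else:
            self.place_model_on_device = True


assert Trainer(Args()).place_model_on_device is True
assert Trainer(Args(), using_deepspeed=True).place_model_on_device is False
assert Trainer(Args(place_model_on_device=False)).place_model_on_device is False
```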

@qgallouedec
Member

(just a test, now that the PAT is approved)
/trl-ci

@SunMarc SunMarc requested a review from winglian February 9, 2026 16:03
@SunMarc
Member Author

SunMarc commented Feb 10, 2026

@bot /style

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

Style fix bot fixed some files and pushed the changes.

@SunMarc SunMarc merged commit e12aa2d into main Feb 10, 2026
26 checks passed
@SunMarc SunMarc deleted the refactor-trainer branch February 10, 2026 15:00
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
* refactor trainer init

* update init

* simplify liger

* udapte

* better

* comments

* do_train not reliable at all

* This should make more sense now

* Apply suggestion from @qgallouedec

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Apply suggestion from @qgallouedec

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Apply suggestion from @qgallouedec

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Apply suggestion from @qgallouedec

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* move sagemaker mixed precision to validation

* Apply repo consistency fixes

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
4 participants