🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide by Cyrilvallez · Pull Request #40760 · huggingface/transformers

Cyrilvallez · 2025-09-09T08:32:43Z

What does this PR do?

Apart from the obvious removal of TF/JAX support, I believe the following should be the only potentially breaking changes for torch-only code:

- pipelines no longer take a `framework` argument
- ONNX config methods no longer take a `framework` argument

This may break existing torch code where users pass framework="pt" explicitly, but it is a necessary change. It makes no sense to keep those arguments, as torch is now the only framework that works for those objects. It would be odd to keep them only for backward compatibility (BC), since we are breaking support anyway.
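For callers that still carry a `framework` entry in their keyword arguments, one possible migration is to strip it before building a pipeline. The sketch below is only an illustration under that assumption: `drop_framework_kwarg` is a hypothetical helper name and is not part of transformers.

```python
def drop_framework_kwarg(kwargs):
    """Return a copy of `kwargs` without the removed `framework` argument.

    Hypothetical helper for caller-side migration; not part of transformers.
    """
    cleaned = dict(kwargs)  # never mutate the caller's dictionary
    framework = cleaned.pop("framework", None)
    # Only torch remains, so any other explicit value cannot be honoured.
    if framework not in (None, "pt"):
        raise ValueError(f"Unsupported framework {framework!r}: only torch is available")
    return cleaned


legacy_kwargs = {"task": "text-generation", "framework": "pt"}
migrated_kwargs = drop_framework_kwarg(legacy_kwargs)
print(migrated_kwargs)
```

The cleaned dictionary could then be forwarded to the pipeline factory in place of the original one, so that an explicit `framework="pt"` no longer reaches it.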

Note: I did not remove traces of TensorFlow/JAX from the docs .md (markdown) files for now, as this PR is already enormous. It's a very tedious task, and moreover a lot of the docs are written in alphabets that I cannot read at all. This will be done in a subsequent PR, hopefully with the help of AI (should be a perfect fit for that).

HuggingFaceDocBuilderDev · 2025-09-10T22:44:58Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

what a cleanup!
Be careful about the conversion scripts, we keep the ones that go from original -> torch

setup.py

src/transformers/commands/train.py

src/transformers/configuration_utils.py

src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py

src/transformers/models/align/convert_align_tf_to_hf.py

src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py

src/transformers/pipelines/base.py

ArthurZucker · 2025-09-18T09:41:39Z

src/transformers/pipelines/text_generation.py

@@ -181,7 +174,7 @@ def _sanitize_parameters(
            preprocess_params["prefix"] = prefix
        if prefix:
            prefix_inputs = self.tokenizer(
-                prefix, padding=False, add_special_tokens=add_special_tokens, return_tensors=self.framework
+                prefix, padding=False, add_special_tokens=add_special_tokens, return_tensors="pt"


I understand why you wanted it to return pt by default. We could also have a "bool" return_tensors=True to return pt tensors.

In all the pipelines, self.framework would be set to pt for torch models during init. So I simply removed self.framework and made it explicit!

No no, I mean: because you now have to say "pt" manually, I understand why you told me you want the tokenizer to default to returning tensors, haha.
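The boolean `return_tensors` idea floated in this thread could be sketched as a small normalizing step in front of the tokenizer call. This is purely illustrative: `normalize_return_tensors` is a hypothetical name, and the thread does not say that such a helper or a boolean flag exists in transformers.

```python
from typing import Optional, Union


def normalize_return_tensors(return_tensors: Union[bool, str, None]) -> Optional[str]:
    """Map a hypothetical boolean flag onto the string form used in the diff above.

    True  -> "pt" (torch tensors, the only remaining framework)
    False -> None (plain Python lists)
    Strings and None are passed through unchanged.
    """
    if return_tensors is True:
        return "pt"
    if return_tensors is False:
        return None
    return return_tensors


print(normalize_return_tensors(True))
```

With a helper like this, a pipeline could accept `return_tensors=True` from the caller and still hand the explicit `"pt"` string to the tokenizer.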

…#40760) * setup * start the purge * continue the purge * more and more * more * continue the quest: remove loading tf/jax checkpoints * style * fix configs * oups forgot conflict * continue * still grinding * always more * in tje zone * never stop * should fix doc * fic * fix * fix * fix tests * still tests * fix non-deterministic * style * remove last rebase issues * onnx configs * still on the grind * always more references * nearly the end * could it really be the end? * small fix * add converters back * post rebase * latest qwen * add back all converters * explicitly add functions in converters * re-add

j6n5nwwmx9-cpu · 2026-01-27T13:47:25Z

### Ticket

N/A

### Problem description

Uplift the transformers library from `4.57.1` to `5.2.0` to broaden model support and enable new models such as GLM-5 to run on our stack. Transformers 5.x is a major version with several breaking changes that required fixes across both tt-xla and tt-forge-models.

### What's changed

#### Transformers 5.x breaking changes and how we addressed them

**Flax/JAX backend removed (transformers 5.0, [PR #40760](huggingface/transformers#40760))**

All `FlaxXxx` model classes were removed from the library. As a result:

- All JAX tests backed by `FlaxPreTrainedModel` are now marked `NOT_SUPPORTED_SKIP` (82 test entries updated in `test_config_inference_single_device.yaml`). Affected model families: albert, bart, beit, bert/masked_lm, longt5, mt5, t5, regnet, resnet, vit, dinov2, bloom, clip, distilbert, electra, gpt_j, gpt_neo, gpt_sw3, mistral, opt, roberta, roformer, squeezebert, wav2vec2, whisper, xglm, xlm_roberta, marian_mt, mbart50, bigbird, pegasus, vision_text_dual_encoder
- Removed `FlaxPreTrainedModel` from the `Model` type alias in `types.py` and from `isinstance` checks and parameter handling in `jax_model_tester.py` and `dynamic_jax_model_tester.py`
- Four mamba tensor-parallel test entries removed from `test_config_inference_tensor_parallel.yaml` (Flax mamba model class was removed)
- EasyDel-based JAX models (falcon, phi1, phi1_5, phi2, phi3, gpt2, qwen 2.5/coder/3, llama, whisper) remain functional and are pinned to `transformers==4.57.1` via per-model `requirements.txt` in tt-forge-models, since EasyDel itself requires the older transformers API

**Legacy cache format removed (transformers 5.0–5.2, [PR #41378](huggingface/transformers#41378), [PR #43168](huggingface/transformers#43168))**

`to_legacy_cache()`, `from_legacy_cache()`, `get_usable_length()`, and all deprecated `Cache` subclasses were removed. Changes made:

- Updated `kimi_k2/modeling_deepseek.py`: replaced `DynamicCache.from_legacy_cache()` with a manual layer-by-layer construction, replaced `to_legacy_cache()` with a manual tuple, and replaced `get_usable_length()` with `get_seq_length()`
- Updated `kimi_k2/test_kimi_k2.py`: replaced tuple-indexed shard spec keys (`args[3][0][0]`) with the new layer attribute API (`args[3].layers[0].compressed_kv`), and added `lazy_initialization()` calls for `StaticCache` layers

**Unified attention interface (transformers 5.x)**

Attention modules no longer return `attn_weights` when using the unified SDPA/flash/eager dispatch path, and require `_attn_implementation` to be set explicitly on the config. Updated Gemma and Mistral attention tests to:

- Set `config._attn_implementation = "sdpa"` before constructing attention modules
- Drop `attn_weights` from the return value of the inner attention call

**`XXXFeatureExtractor` classes removed (transformers 5.0, [PR #41174](huggingface/transformers#41174))**

All legacy vision `FeatureExtractor` classes were replaced by `ImageProcessor` equivalents. Updated in tt-forge-models:

- `detr`: `DetrFeatureExtractor` → `DetrImageProcessor`
- `maskformer`: `MaskFormerFeatureExtractor` → `MaskFormerImageProcessor`
- `yolos_small`: `YolosFeatureExtractor` → `YolosImageProcessor`

**`encode_plus()` / `batch_encode_plus()` removed in favour of `__call__()` (transformers 5.0)**

The legacy tokenizer encoding methods were formally removed. Changes made:

- tt-forge-models (`huggyllama`, `mistral`, `roberta`): `tokenizer.encode_plus(...)` → `tokenizer(...)`
- `examples/pytorch/sdxl-pipeline.py`: `tokenizer.batch_encode_plus(...)` → `tokenizer(...)`
- `tests/torch/models/llama3/test_llama_step_n300.py`: `tokenizer.encode_plus(...)` → `tokenizer._encode_plus(...)` (private method still present in 5.x as the internal implementation; should ideally be `tokenizer(...)`)
- `tests/torch/quality/image_gen/sdxl/pipeline.py`: replaced the private `tokenizer._encode_plus(...)` call (which broke in 5.x for list inputs with `padding="max_length"`) with the public `tokenizer(...)` interface with explicit `padding="max_length"`, `truncation=True`, and `return_tensors="pt"`. The old code produced mismatched sequence lengths for conditioned vs unconditioned tokens causing a `torch.cat` shape mismatch error.

**`trust_remote_code` no longer needed for phi3 (transformers 5.x)**

The phi3 model was upstreamed into the official transformers library and `trust_remote_code=True` is now unnecessary. Removed from `AutoTokenizer.from_pretrained`, `AutoConfig.from_pretrained`, and `model_kwargs` in the phi3 loader.

**`torch.fx` support dropped (transformers 5.0, [PR #41683](huggingface/transformers#41683))**

`is_torch_fx_available()`, `is_torch_greater_or_equal_than_1_13`, and all `torch.fx` tracing guards were removed. Updated:

- `deepseek_r1` (deepseekv2) loader in tt-forge-models
- `kimi_k2/modeling_deepseek.py`: removed `is_torch_fx_available` import and the `_prepare_4d_causal_attention_mask` FX wrap block; replaced `rope_scaling["type"]` dict access with `.get()` to guard against missing keys in newer config formats

**VLM sub-module path changed (transformers 5.x, [PR #42156](huggingface/transformers#42156))**

Vision-language models no longer expose `model.language_model` directly at the top level; it is now accessed via `model.model.language_model`. Updated `mistral/pixtral` loader to add `_get_language_model()` and `_get_vision_tower()` helpers that handle both paths when building shard specs.

**`AutoProcessor` with `trust_remote_code` removed for custom processors (transformers 5.x)**

`AutoProcessor.from_pretrained(trust_remote_code=True)` no longer works for models with custom processing classes not registered in the transformers auto-mapping. Updated `openvla_oft` to explicitly instantiate `PrismaticImageProcessor` and `PrismaticProcessor` from the local `openvla/pytorch/src/` source.

**`tie_weights()` signature changed (transformers 5.x)**

`PreTrainedModel.tie_weights()` now passes through `**kwargs`. Updated the `tie_weights` override in `openvla/pytorch/src/modeling_prismatic.py` to accept and forward `**kwargs` to avoid a `TypeError` on model init.

**`XLMRobertaSdpaSelfAttention` removed (transformers 5.x)**

The separate SDPA attention class was consolidated into the unified attention dispatch. Rewrote `XLMRobertaSelfAttentionWithAdapters` in `sentencizer/pytorch/src/adapter_utils.py` to conform to the new `forward()` signature using `eager_attention_forward` from transformers.

**`HfFolder.get_token()` removed (huggingface_hub)**

`HfFolder` was removed in recent `huggingface_hub` versions. Updated `sentencizer/pytorch/src/utils.py` to use `HfApi().token` instead.

**mamba2 JAX loader removed**

`mamba2/causal_lm/jax` was removed as it was non-functional and incompatible with the pinned EasyDel version used by other JAX models.

#### tt-xla infrastructure changes

- **`transformers` removed from `_JAX_PURGE_SKIP`** (`tests/runner/requirements.py`): `transformers` was previously excluded from the `sys.modules` purge that `RequirementsManager` performs after a per-model pip install. This meant that when an EasyDel model installed `transformers==4.57.1`, the venv's 5.2.0 stayed cached in memory and the newly installed version was never visible to imports. Removing `transformers` from the skip list (keeping only `flax`, which has genuine module-level imports in JAX infra) ensures the installed version is correctly used. All JAX infra files were audited to confirm none hold module-level `transformers` references.
- **Sparse MLP router output fix** (`python_package/tt_torch/sparse_mlp.py`): `GptOssTopKRouter` was updated to return a 3-tuple `(router_logits, router_scores, router_indices)` instead of 2. Updated all three MoE dispatch paths (`SparseMLP`, `A2aSparseMLP`, `A2aSparseStackedMlp`) to unpack accordingly and simplified the weighted-sum logic to use the compact scores tensor directly, removing a workaround that used `torch.gather` / one-hot einsum.
- **Performance benchmark matrix** (`.github/workflows/perf-bench-matrix.json`): Updated all PyTorch benchmark entries from `transformers==4.57.1` to `transformers==5.2.0`. The `resnet_jax` and `bge_m3_encode` entries are intentionally kept at `transformers==4.57.1`: `FlaxResNetForImageClassification` was removed in 5.x, and `FlagEmbedding` (used by bge_m3) is not yet compatible with 5.x.
- **LLM benchmark version check** (`tests/benchmark/benchmarks/llm_benchmark.py`): Updated `check_transformers_version()` to require exactly `5.2.0` instead of `<= 4.57.1`. Also removed the now-unnecessary `check_transformers_version()` guard from `examples/pytorch/llama.py`.
- **Resnet codegen examples skipped** (`tests/examples/test_examples.py`): Added XFAIL entries for `jax/codegen/cpp/resnet.py` and `jax/codegen/python/resnet.py` since `FlaxResNetModel` was removed in transformers 5.x.
- **`surya-ocr` unpinned** (`venv/requirements-dev.txt`): Removed the `surya-ocr==0.17.0` version pin.

#### tt-forge models PR:

tenstorrent/tt-forge-models#529

### CI tests for reference:

- Manual Release test: https://github.com/tenstorrent/tt-xla/actions/runs/23179435697
- Manual Manylinux release test: https://github.com/tenstorrent/tt-xla/actions/runs/23179426382

### Checklist

- [x] Fix `gpt_oss` failure
- [x] Fix JAX-only CI workflows

---------

Co-authored-by: Vladimir Zeljkovic <vzeljkovic@tenstorrent.com>
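The dual-path lookup described for the `mistral/pixtral` loader in the commit message above (handling both `model.language_model` and `model.model.language_model`) might look roughly like the following. This is a minimal sketch, not the actual tt-forge-models code: the function body is an assumption, and `SimpleNamespace` objects stand in for real models so that it runs without loading a checkpoint.

```python
from types import SimpleNamespace


def get_language_model(model):
    """Return the language-model sub-module from either attribute layout."""
    # Newer layout described above: model.model.language_model
    inner = getattr(model, "model", None)
    if inner is not None and hasattr(inner, "language_model"):
        return inner.language_model
    # Older layout: model.language_model at the top level
    if hasattr(model, "language_model"):
        return model.language_model
    raise AttributeError("no language_model sub-module found on this model")


# Stand-in objects (assumptions for illustration only).
old_style = SimpleNamespace(language_model="lm-old")
new_style = SimpleNamespace(model=SimpleNamespace(language_model="lm-new"))

print(get_language_model(old_style), get_language_model(new_style))
```

Checking the nested path first means the helper prefers the newer layout when both attributes happen to exist, which is one reasonable choice; the original loader may order the checks differently.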

Cyrilvallez force-pushed the the-great-cleaning branch from 475c567 to 9190046 Compare September 10, 2025 13:28

This was referenced Sep 11, 2025

Fix typing #40788

Merged

RUFF fix on CI scripts #40805

Merged

Cyrilvallez force-pushed the the-great-cleaning branch from f71cedb to 1ecdc29 Compare September 11, 2025 13:06

Rocketknight1 mentioned this pull request Sep 12, 2025

Simplify unnecessary Optional typing #40839

Merged

Cyrilvallez force-pushed the the-great-cleaning branch from 8706485 to b00b07e Compare September 15, 2025 10:53

This was referenced Sep 15, 2025

Fix typoes in src and tests #40845

Merged

Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling #39485

Merged

Cyrilvallez force-pushed the the-great-cleaning branch from 7120b37 to 4069e64 Compare September 17, 2025 09:11

Cyrilvallez changed the title ~~Fully remove Tensorflow and Jax support library-wide~~ 🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide Sep 17, 2025

This was referenced Sep 17, 2025

Fix outdated torch version check #40925

Merged

Remove unused arguments #40916

Merged

Remove repeated import #40937

Merged

ArthurZucker approved these changes Sep 18, 2025

ArthurZucker added the for_v5? label Sep 18, 2025

Cyrilvallez added 14 commits September 18, 2025 15:29

setup

4db6194

start the purge

2b4bf6c

continue the purge

176fed4

more and more

5f3bc50

more

e58825c

continue the quest: remove loading tf/jax checkpoints

2d7b5af

style

5c92286

fix configs

0354420

oups forgot conflict

ca16569

continue

896e965

still grinding

f7239b8

always more

b68ff88

in tje zone

14daddd

never stop

e970b64

Cyrilvallez added 12 commits September 18, 2025 15:30

fix

d58b679

fix tests

15e9e8d

still tests

de27613

fix non-deterministic

6e434e0

style

dc65cae

remove last rebase issues

6c231c6

onnx configs

3bf3a97

still on the grind

0e9fd50

always more references

ad8cfec

nearly the end

72b2a28

could it really be the end?

7c7d176

small fix

9a41b8d

Cyrilvallez force-pushed the the-great-cleaning branch from 4069e64 to 9a41b8d Compare September 18, 2025 13:31

Cyrilvallez added 6 commits September 18, 2025 15:38

add converters back

a75f707

post rebase

0b00857

latest qwen

d6692aa

add back all converters

ae33acf

explicitly add functions in converters

9b801ec

re-add

df215f9

Cyrilvallez merged commit 4df2529 into main Sep 18, 2025
21 of 24 checks passed

Cyrilvallez deleted the the-great-cleaning branch September 18, 2025 16:27

albertvillanova mentioned this pull request Oct 2, 2025

Remove custome_container for building the docs huggingface/trl#4198

Merged

LysandreJik mentioned this pull request Oct 9, 2025

Welcome v5 #40822

Closed

molbap mentioned this pull request Oct 9, 2025

[WIP] Add DINO DETR Model to HuggingFace Transformers #36711

Open

ssaliceTT mentioned this pull request Feb 26, 2026

Transformers v5.2.0 Uplift tenstorrent/tt-xla#3371

Merged

2 tasks

lk-chen mentioned this pull request Mar 5, 2026

[Cleanup] cleanup from transformers import modeling_flax_utils vllm-project/tpu-inference#1873

Merged

Cyrilvallez merged 35 commits into main from the-great-cleaning


4 participants
