[cache] Remove all deprecated classes by Cyrilvallez · Pull Request #43168 · huggingface/transformers

Cyrilvallez · 2026-01-08T11:40:42Z

What does this PR do?

github-actions · 2026-01-08T11:41:48Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: ministral, moshi

HuggingFaceDocBuilderDev · 2026-01-08T11:52:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-01-08T11:55:28Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43168&sha=18b4ad

ArthurZucker

let's keep an alias or no?

stevhliu · 2026-01-08T20:09:13Z

i think we'll also need to update the docs here and here which mention the OffloadedCache?

Cyrilvallez · 2026-01-09T09:44:13Z

I think we really want to move away from the many many cache classes we had before, so let's not keep any alias!

Cyrilvallez · 2026-01-09T09:47:24Z

@stevhliu, we only remove the `OffloadedCache` class; `cache_implementation="offloaded"` and `cache_implementation="offloaded_static"` are still valid! Only the classes are removed, the logic behind those options stays the same!
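As a rough illustration of the comment above, callers that used to instantiate an offloaded cache class can pass the string option instead. The helper name `offloaded_generate_kwargs` is hypothetical and only builds keyword arguments; the two option strings are the ones quoted in the comment.

```python
def offloaded_generate_kwargs(static: bool = False) -> dict:
    """Build generation kwargs selecting an offloaded KV cache by name.

    Hypothetical helper (not part of transformers): it only returns the
    string option, so no cache class has to be imported or instantiated.
    """
    implementation = "offloaded_static" if static else "offloaded"
    return {"cache_implementation": implementation}


dynamic_kwargs = offloaded_generate_kwargs()
static_kwargs = offloaded_generate_kwargs(static=True)

# Assumed usage, given an already loaded `model` and tokenized `inputs`:
#     output_ids = model.generate(**inputs, **dynamic_kwargs)
```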

`HybridCache` has been deprecated, and is now removed, see huggingface/transformers#43168, causing

```
trl/trainer/dpo_trainer.py:500: in __init__
    super().__init__(
.venv/lib/python3.12/site-packages/transformers/trainer.py:479: in __init__
    from liger_kernel.transformers import _apply_liger_kernel_to_instance
.venv/lib/python3.12/site-packages/liger_kernel/transformers/__init__.py:140: in __getattr__
    module = importlib.import_module("liger_kernel.transformers.monkey_patch")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/__w/_tool/Python/3.12.12/x64/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/liger_kernel/transformers/monkey_patch.py:21: in <module>
    from liger_kernel.transformers.model.gemma2 import lce_forward as gemma2_lce_forward
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    import logging
    from typing import Optional
    from typing import Tuple
    from typing import Union
    import torch
    from torch.nn import CrossEntropyLoss
>   from transformers.cache_utils import HybridCache
E   ImportError: cannot import name 'HybridCache' from 'transformers.cache_utils' (/__w/trl/trl/.venv/lib/python3.12/site-packages/transformers/cache_utils.py)

.venv/lib/python3.12/site-packages/liger_kernel/transformers/model/gemma2.py:10: ImportError
```

This PR fixes this issue
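For downstream code that must import cleanly on both sides of this removal, one option is to resolve the name at import time and fall back to a class that still exists. This is a minimal sketch, not the fix applied in the linked PR: `import_first_available` is a hypothetical helper, and the demonstration uses the standard library so it runs without transformers installed.

```python
import importlib


def import_first_available(module_name: str, candidate_names: list):
    """Return the first attribute of `module_name` that exists.

    Hypothetical helper: candidates are tried in order, so a removed name
    can be listed first and its replacement second.
    """
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        f"none of {candidate_names!r} could be imported from {module_name!r}"
    )


# Standard-library demonstration: the first name does not exist, so the
# helper falls back to the second one.
resolved = import_first_available("collections", ["NoSuchName", "OrderedDict"])

# Assumed usage for the failure in the traceback above:
#     CacheType = import_first_available(
#         "transformers.cache_utils", ["HybridCache", "Cache"]
#     )
```

A fallback like this only helps for type annotations and `isinstance` checks; code that relied on behaviour specific to the removed class still needs a real migration.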

### Ticket

N/A

### Problem description

Uplift the transformers library from `4.57.1` to `5.2.0` to broaden model support and enable new models such as GLM-5 to run on our stack. Transformers 5.x is a major version with several breaking changes that required fixes across both tt-xla and tt-forge-models.

### What's changed

#### Transformers 5.x breaking changes and how we addressed them

**Flax/JAX backend removed (transformers 5.0, [PR #40760](huggingface/transformers#40760))**

All `FlaxXxx` model classes were removed from the library. As a result:

- All JAX tests backed by `FlaxPreTrainedModel` are now marked `NOT_SUPPORTED_SKIP` (82 test entries updated in `test_config_inference_single_device.yaml`). Affected model families: albert, bart, beit, bert/masked_lm, longt5, mt5, t5, regnet, resnet, vit, dinov2, bloom, clip, distilbert, electra, gpt_j, gpt_neo, gpt_sw3, mistral, opt, roberta, roformer, squeezebert, wav2vec2, whisper, xglm, xlm_roberta, marian_mt, mbart50, bigbird, pegasus, vision_text_dual_encoder
- Removed `FlaxPreTrainedModel` from the `Model` type alias in `types.py` and from `isinstance` checks and parameter handling in `jax_model_tester.py` and `dynamic_jax_model_tester.py`
- Four mamba tensor-parallel test entries removed from `test_config_inference_tensor_parallel.yaml` (Flax mamba model class was removed)
- EasyDel-based JAX models (falcon, phi1, phi1_5, phi2, phi3, gpt2, qwen 2.5/coder/3, llama, whisper) remain functional and are pinned to `transformers==4.57.1` via per-model `requirements.txt` in tt-forge-models, since EasyDel itself requires the older transformers API

**Legacy cache format removed (transformers 5.0–5.2, [PR #41378](huggingface/transformers#41378), [PR #43168](huggingface/transformers#43168))**

`to_legacy_cache()`, `from_legacy_cache()`, `get_usable_length()`, and all deprecated `Cache` subclasses were removed.

Changes made:

- Updated `kimi_k2/modeling_deepseek.py`: replaced `DynamicCache.from_legacy_cache()` with a manual layer-by-layer construction, replaced `to_legacy_cache()` with a manual tuple, and replaced `get_usable_length()` with `get_seq_length()`
- Updated `kimi_k2/test_kimi_k2.py`: replaced tuple-indexed shard spec keys (`args[3][0][0]`) with the new layer attribute API (`args[3].layers[0].compressed_kv`), and added `lazy_initialization()` calls for `StaticCache` layers

**Unified attention interface (transformers 5.x)**

Attention modules no longer return `attn_weights` when using the unified SDPA/flash/eager dispatch path, and require `_attn_implementation` to be set explicitly on the config. Updated Gemma and Mistral attention tests to:

- Set `config._attn_implementation = "sdpa"` before constructing attention modules
- Drop `attn_weights` from the return value of the inner attention call

**`XXXFeatureExtractor` classes removed (transformers 5.0, [PR #41174](huggingface/transformers#41174))**

All legacy vision `FeatureExtractor` classes were replaced by `ImageProcessor` equivalents. Updated in tt-forge-models:

- `detr`: `DetrFeatureExtractor` → `DetrImageProcessor`
- `maskformer`: `MaskFormerFeatureExtractor` → `MaskFormerImageProcessor`
- `yolos_small`: `YolosFeatureExtractor` → `YolosImageProcessor`

**`encode_plus()` / `batch_encode_plus()` removed in favour of `__call__()` (transformers 5.0)**

The legacy tokenizer encoding methods were formally removed.

Changes made:

- tt-forge-models (`huggyllama`, `mistral`, `roberta`): `tokenizer.encode_plus(...)` → `tokenizer(...)`
- `examples/pytorch/sdxl-pipeline.py`: `tokenizer.batch_encode_plus(...)` → `tokenizer(...)`
- `tests/torch/models/llama3/test_llama_step_n300.py`: `tokenizer.encode_plus(...)` → `tokenizer._encode_plus(...)` (private method still present in 5.x as the internal implementation; should ideally be `tokenizer(...)`)
- `tests/torch/quality/image_gen/sdxl/pipeline.py`: replaced the private `tokenizer._encode_plus(...)` call (which broke in 5.x for list inputs with `padding="max_length"`) with the public `tokenizer(...)` interface with explicit `padding="max_length"`, `truncation=True`, and `return_tensors="pt"`. The old code produced mismatched sequence lengths for conditioned vs unconditioned tokens causing a `torch.cat` shape mismatch error.

**`trust_remote_code` no longer needed for phi3 (transformers 5.x)**

The phi3 model was upstreamed into the official transformers library and `trust_remote_code=True` is now unnecessary. Removed from `AutoTokenizer.from_pretrained`, `AutoConfig.from_pretrained`, and `model_kwargs` in the phi3 loader.

**`torch.fx` support dropped (transformers 5.0, [PR #41683](huggingface/transformers#41683))**

`is_torch_fx_available()`, `is_torch_greater_or_equal_than_1_13`, and all `torch.fx` tracing guards were removed. Updated:

- `deepseek_r1` (deepseekv2) loader in tt-forge-models
- `kimi_k2/modeling_deepseek.py`: removed `is_torch_fx_available` import and the `_prepare_4d_causal_attention_mask` FX wrap block; replaced `rope_scaling["type"]` dict access with `.get()` to guard against missing keys in newer config formats

**VLM sub-module path changed (transformers 5.x, [PR #42156](huggingface/transformers#42156))**

Vision-language models no longer expose `model.language_model` directly at the top level; it is now accessed via `model.model.language_model`. Updated `mistral/pixtral` loader to add `_get_language_model()` and `_get_vision_tower()` helpers that handle both paths when building shard specs.

**`AutoProcessor` with `trust_remote_code` removed for custom processors (transformers 5.x)**

`AutoProcessor.from_pretrained(trust_remote_code=True)` no longer works for models with custom processing classes not registered in the transformers auto-mapping. Updated `openvla_oft` to explicitly instantiate `PrismaticImageProcessor` and `PrismaticProcessor` from the local `openvla/pytorch/src/` source.

**`tie_weights()` signature changed (transformers 5.x)**

`PreTrainedModel.tie_weights()` now passes through `**kwargs`. Updated the `tie_weights` override in `openvla/pytorch/src/modeling_prismatic.py` to accept and forward `**kwargs` to avoid a `TypeError` on model init.

**`XLMRobertaSdpaSelfAttention` removed (transformers 5.x)**

The separate SDPA attention class was consolidated into the unified attention dispatch. Rewrote `XLMRobertaSelfAttentionWithAdapters` in `sentencizer/pytorch/src/adapter_utils.py` to conform to the new `forward()` signature using `eager_attention_forward` from transformers.

**`HfFolder.get_token()` removed (huggingface_hub)**

`HfFolder` was removed in recent `huggingface_hub` versions. Updated `sentencizer/pytorch/src/utils.py` to use `HfApi().token` instead.

**mamba2 JAX loader removed**

`mamba2/causal_lm/jax` was removed as it was non-functional and incompatible with the pinned EasyDel version used by other JAX models.

#### tt-xla infrastructure changes

- **`transformers` removed from `_JAX_PURGE_SKIP`** (`tests/runner/requirements.py`): `transformers` was previously excluded from the `sys.modules` purge that `RequirementsManager` performs after a per-model pip install. This meant that when an EasyDel model installed `transformers==4.57.1`, the venv's 5.2.0 stayed cached in memory and the newly installed version was never visible to imports. Removing `transformers` from the skip list (keeping only `flax`, which has genuine module-level imports in JAX infra) ensures the installed version is correctly used. All JAX infra files were audited to confirm none hold module-level `transformers` references.
- **Sparse MLP router output fix** (`python_package/tt_torch/sparse_mlp.py`): `GptOssTopKRouter` was updated to return a 3-tuple `(router_logits, router_scores, router_indices)` instead of 2. Updated all three MoE dispatch paths (`SparseMLP`, `A2aSparseMLP`, `A2aSparseStackedMlp`) to unpack accordingly and simplified the weighted-sum logic to use the compact scores tensor directly, removing a workaround that used `torch.gather` / one-hot einsum.
- **Performance benchmark matrix** (`.github/workflows/perf-bench-matrix.json`): Updated all PyTorch benchmark entries from `transformers==4.57.1` to `transformers==5.2.0`. The `resnet_jax` and `bge_m3_encode` entries are intentionally kept at `transformers==4.57.1`, since `FlaxResNetForImageClassification` was removed in 5.x, and `FlagEmbedding` (used by bge_m3) is not yet compatible with 5.x.
- **LLM benchmark version check** (`tests/benchmark/benchmarks/llm_benchmark.py`): Updated `check_transformers_version()` to require exactly `5.2.0` instead of `<= 4.57.1`. Also removed the now-unnecessary `check_transformers_version()` guard from `examples/pytorch/llama.py`.
- **Resnet codegen examples skipped** (`tests/examples/test_examples.py`): Added XFAIL entries for `jax/codegen/cpp/resnet.py` and `jax/codegen/python/resnet.py` since `FlaxResNetModel` was removed in transformers 5.x.
- **`surya-ocr` unpinned** (`venv/requirements-dev.txt`): Removed the `surya-ocr==0.17.0` version pin.
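The `_JAX_PURGE_SKIP` item above hinges on how cached modules interact with a fresh pip install: a package stays at its old version in memory until its entries are dropped from `sys.modules`. The sketch below shows that mechanism in isolation. `purge_modules` and the `fake_*` package names are hypothetical stand-ins, not the actual `RequirementsManager` code.

```python
import sys
import types


def purge_modules(package_names, skip=frozenset()):
    """Drop cached modules of the given top-level packages from sys.modules.

    Packages listed in `skip` stay cached. Returns the removed module names
    so the caller can log or assert on them.
    """
    removed = []
    for name in list(sys.modules):
        top_level = name.split(".", 1)[0]
        if top_level in package_names and top_level not in skip:
            del sys.modules[name]
            removed.append(name)
    return sorted(removed)


# Register placeholder modules so the demo does not touch real packages.
for fake in ("fake_transformers", "fake_transformers.cache_utils", "fake_flax"):
    sys.modules[fake] = types.ModuleType(fake)

# Purge everything except the package on the skip list.
removed = purge_modules({"fake_transformers", "fake_flax"}, skip={"fake_flax"})
```

After the purge, the next `import` of a removed package re-executes it from disk, which is what lets a freshly installed version become visible. Any module-level reference held elsewhere still points at the old module object, which is presumably why the description mentions auditing for such references.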
#### tt-forge models

PR: tenstorrent/tt-forge-models#529

### CI tests for reference:

- Manual Release test: https://github.com/tenstorrent/tt-xla/actions/runs/23179435697
- Manual Manylinux release test: https://github.com/tenstorrent/tt-xla/actions/runs/23179426382

### Checklist

- [x] Fix `gpt_oss` failure
- [x] Fix JAX-only CI workflows

---------

Co-authored-by: Vladimir Zeljkovic <vzeljkovic@tenstorrent.com>
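The VLM sub-module change described above (`model.language_model` moving to `model.model.language_model`) can be handled with a lookup that tries the new path first and falls back to the old one. This is only a sketch of the idea: the function below is a hypothetical stand-in, not the `_get_language_model()` helper from the pixtral loader, and the models are plain `SimpleNamespace` objects rather than real transformers models.

```python
from types import SimpleNamespace


def get_language_model(model):
    """Return the language-model sub-module from either layout.

    Tries the nested `model.model.language_model` path first, then the
    older top-level `model.language_model` path.
    """
    inner = getattr(model, "model", None)
    if inner is not None and hasattr(inner, "language_model"):
        return inner.language_model
    if hasattr(model, "language_model"):
        return model.language_model
    raise AttributeError("no language_model found on the model or on model.model")


# Placeholder objects mimicking the two attribute layouts.
nested_layout = SimpleNamespace(model=SimpleNamespace(language_model="nested-lm"))
top_level_layout = SimpleNamespace(language_model="top-level-lm")

nested_result = get_language_model(nested_layout)
top_level_result = get_language_model(top_level_layout)
```

Checking the nested path first means a model that happens to expose both attributes resolves to the newer layout.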

remove deprecated classes

d793505

Merge branch 'main' into remove-cache

18b4ad7

ArthurZucker approved these changes Jan 8, 2026

Cyrilvallez merged commit cb39692 into main Jan 9, 2026
24 of 26 checks passed

Cyrilvallez deleted the remove-cache branch January 9, 2026 09:47

qgallouedec mentioned this pull request Jan 9, 2026

fix: replace HybridCache with Cache in gemma2 and gemma3 linkedin/Liger-Kernel#1002

Merged

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026

[cache] Remove all deprecated classes (huggingface#43168)

c4827c3

remove deprecated classes

ssaliceTT mentioned this pull request Feb 26, 2026

Transformers v5.2.0 Uplift tenstorrent/tt-xla#3371

Merged

Cyrilvallez merged 2 commits into main from remove-cache

4 participants
