[cache] Remove all deprecated classes #43168
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: ministral, moshi
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43168&sha=18b4ad |
ArthurZucker left a comment:
let's keep an alias or no?
I think we really want to move away from the many many cache classes we had before, so let's not keep any alias!
@stevhliu, we only remove the
`HybridCache` has been deprecated, and is now removed (see huggingface/transformers#43168), causing:

```
trl/trainer/dpo_trainer.py:500: in __init__
    super().__init__(
.venv/lib/python3.12/site-packages/transformers/trainer.py:479: in __init__
    from liger_kernel.transformers import _apply_liger_kernel_to_instance
.venv/lib/python3.12/site-packages/liger_kernel/transformers/__init__.py:140: in __getattr__
    module = importlib.import_module("liger_kernel.transformers.monkey_patch")
/__w/_tool/Python/3.12.12/x64/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
.venv/lib/python3.12/site-packages/liger_kernel/transformers/monkey_patch.py:21: in <module>
    from liger_kernel.transformers.model.gemma2 import lce_forward as gemma2_lce_forward
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
import logging
from typing import Optional
from typing import Tuple
from typing import Union
import torch
from torch.nn import CrossEntropyLoss
>   from transformers.cache_utils import HybridCache
E   ImportError: cannot import name 'HybridCache' from 'transformers.cache_utils' (/__w/trl/trl/.venv/lib/python3.12/site-packages/transformers/cache_utils.py)
.venv/lib/python3.12/site-packages/liger_kernel/transformers/model/gemma2.py:10: ImportError
```

This PR fixes this issue.
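One way a downstream library can stay importable across both transformers 4.x and 5.x is a guarded import. This is a hedged sketch of that pattern, not liger-kernel's actual fix; the helper name is illustrative:

```python
# Guarded import: HybridCache exists in transformers < 5.x but was removed
# in 5.x (huggingface/transformers#43168). ImportError also covers the case
# where transformers is not installed at all (ModuleNotFoundError subclasses it).
try:
    from transformers.cache_utils import HybridCache  # transformers < 5.x
except ImportError:
    HybridCache = None  # removed in transformers 5.x

def supports_hybrid_cache(obj) -> bool:
    # True only when the deprecated class is still available and obj is one.
    return HybridCache is not None and isinstance(obj, HybridCache)
```

Code that previously type-checked against `HybridCache` can then branch on `supports_hybrid_cache(...)` instead of importing the class unconditionally.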
remove deprecated classes
### Ticket

N/A

### Problem description

Uplift the transformers library from `4.57.1` to `5.2.0` to broaden model support and enable new models such as GLM-5 to run on our stack. Transformers 5.x is a major version with several breaking changes that required fixes across both tt-xla and tt-forge-models.

### What's changed

#### Transformers 5.x breaking changes and how we addressed them

**Flax/JAX backend removed (transformers 5.0, huggingface/transformers#40760)**

All `FlaxXxx` model classes were removed from the library. As a result:

- All JAX tests backed by `FlaxPreTrainedModel` are now marked `NOT_SUPPORTED_SKIP` (82 test entries updated in `test_config_inference_single_device.yaml`). Affected model families: albert, bart, beit, bert/masked_lm, longt5, mt5, t5, regnet, resnet, vit, dinov2, bloom, clip, distilbert, electra, gpt_j, gpt_neo, gpt_sw3, mistral, opt, roberta, roformer, squeezebert, wav2vec2, whisper, xglm, xlm_roberta, marian_mt, mbart50, bigbird, pegasus, vision_text_dual_encoder
- Removed `FlaxPreTrainedModel` from the `Model` type alias in `types.py` and from `isinstance` checks and parameter handling in `jax_model_tester.py` and `dynamic_jax_model_tester.py`
- Four mamba tensor-parallel test entries removed from `test_config_inference_tensor_parallel.yaml` (the Flax mamba model class was removed)
- EasyDel-based JAX models (falcon, phi1, phi1_5, phi2, phi3, gpt2, qwen 2.5/coder/3, llama, whisper) remain functional and are pinned to `transformers==4.57.1` via per-model `requirements.txt` in tt-forge-models, since EasyDel itself requires the older transformers API

**Legacy cache format removed (transformers 5.0–5.2, huggingface/transformers#41378, huggingface/transformers#43168)**

`to_legacy_cache()`, `from_legacy_cache()`, `get_usable_length()`, and all deprecated `Cache` subclasses were removed.
Changes made:

- Updated `kimi_k2/modeling_deepseek.py`: replaced `DynamicCache.from_legacy_cache()` with a manual layer-by-layer construction, replaced `to_legacy_cache()` with a manual tuple, and replaced `get_usable_length()` with `get_seq_length()`
- Updated `kimi_k2/test_kimi_k2.py`: replaced tuple-indexed shard spec keys (`args[3][0][0]`) with the new layer attribute API (`args[3].layers[0].compressed_kv`), and added `lazy_initialization()` calls for `StaticCache` layers

**Unified attention interface (transformers 5.x)**

Attention modules no longer return `attn_weights` when using the unified SDPA/flash/eager dispatch path, and require `_attn_implementation` to be set explicitly on the config. Updated Gemma and Mistral attention tests to:

- Set `config._attn_implementation = "sdpa"` before constructing attention modules
- Drop `attn_weights` from the return value of the inner attention call

**`XXXFeatureExtractor` classes removed (transformers 5.0, huggingface/transformers#41174)**

All legacy vision `FeatureExtractor` classes were replaced by `ImageProcessor` equivalents. Updated in tt-forge-models:

- `detr`: `DetrFeatureExtractor` → `DetrImageProcessor`
- `maskformer`: `MaskFormerFeatureExtractor` → `MaskFormerImageProcessor`
- `yolos_small`: `YolosFeatureExtractor` → `YolosImageProcessor`

**`encode_plus()` / `batch_encode_plus()` removed in favour of `__call__()` (transformers 5.0)**

The legacy tokenizer encoding methods were formally removed.
Changes made:

- tt-forge-models (`huggyllama`, `mistral`, `roberta`): `tokenizer.encode_plus(...)` → `tokenizer(...)`
- `examples/pytorch/sdxl-pipeline.py`: `tokenizer.batch_encode_plus(...)` → `tokenizer(...)`
- `tests/torch/models/llama3/test_llama_step_n300.py`: `tokenizer.encode_plus(...)` → `tokenizer._encode_plus(...)` (the private method is still present in 5.x as the internal implementation; this should ideally become `tokenizer(...)`)
- `tests/torch/quality/image_gen/sdxl/pipeline.py`: replaced the private `tokenizer._encode_plus(...)` call (which broke in 5.x for list inputs with `padding="max_length"`) with the public `tokenizer(...)` interface, passing explicit `padding="max_length"`, `truncation=True`, and `return_tensors="pt"`. The old code produced mismatched sequence lengths for conditioned vs. unconditioned tokens, causing a `torch.cat` shape mismatch error.

**`trust_remote_code` no longer needed for phi3 (transformers 5.x)**

The phi3 model was upstreamed into the official transformers library, so `trust_remote_code=True` is now unnecessary. Removed it from `AutoTokenizer.from_pretrained`, `AutoConfig.from_pretrained`, and `model_kwargs` in the phi3 loader.

**`torch.fx` support dropped (transformers 5.0, huggingface/transformers#41683)**

`is_torch_fx_available()`, `is_torch_greater_or_equal_than_1_13`, and all `torch.fx` tracing guards were removed. Updated:

- `deepseek_r1` (deepseekv2) loader in tt-forge-models
- `kimi_k2/modeling_deepseek.py`: removed the `is_torch_fx_available` import and the `_prepare_4d_causal_attention_mask` FX wrap block; replaced `rope_scaling["type"]` dict access with `.get()` to guard against missing keys in newer config formats

**VLM sub-module path changed (transformers 5.x, huggingface/transformers#42156)**

Vision-language models no longer expose `model.language_model` directly at the top level; it is now accessed via `model.model.language_model`.
Updated the `mistral/pixtral` loader to add `_get_language_model()` and `_get_vision_tower()` helpers that handle both paths when building shard specs.

**`AutoProcessor` with `trust_remote_code` removed for custom processors (transformers 5.x)**

`AutoProcessor.from_pretrained(trust_remote_code=True)` no longer works for models with custom processing classes not registered in the transformers auto-mapping. Updated `openvla_oft` to explicitly instantiate `PrismaticImageProcessor` and `PrismaticProcessor` from the local `openvla/pytorch/src/` source.

**`tie_weights()` signature changed (transformers 5.x)**

`PreTrainedModel.tie_weights()` now passes through `**kwargs`. Updated the `tie_weights` override in `openvla/pytorch/src/modeling_prismatic.py` to accept and forward `**kwargs` to avoid a `TypeError` on model init.

**`XLMRobertaSdpaSelfAttention` removed (transformers 5.x)**

The separate SDPA attention class was consolidated into the unified attention dispatch. Rewrote `XLMRobertaSelfAttentionWithAdapters` in `sentencizer/pytorch/src/adapter_utils.py` to conform to the new `forward()` signature using `eager_attention_forward` from transformers.

**`HfFolder.get_token()` removed (huggingface_hub)**

`HfFolder` was removed in recent `huggingface_hub` versions. Updated `sentencizer/pytorch/src/utils.py` to use `HfApi().token` instead.

**mamba2 JAX loader removed**

`mamba2/causal_lm/jax` was removed, as it was non-functional and incompatible with the pinned EasyDel version used by other JAX models.

#### tt-xla infrastructure changes

- **`transformers` removed from `_JAX_PURGE_SKIP`** (`tests/runner/requirements.py`): `transformers` was previously excluded from the `sys.modules` purge that `RequirementsManager` performs after a per-model pip install. This meant that when an EasyDel model installed `transformers==4.57.1`, the venv's 5.2.0 stayed cached in memory and the newly installed version was never visible to imports.
  Removing `transformers` from the skip list (keeping only `flax`, which has genuine module-level imports in JAX infra) ensures the installed version is correctly used. All JAX infra files were audited to confirm none hold module-level `transformers` references.
- **Sparse MLP router output fix** (`python_package/tt_torch/sparse_mlp.py`): `GptOssTopKRouter` was updated to return a 3-tuple `(router_logits, router_scores, router_indices)` instead of 2. Updated all three MoE dispatch paths (`SparseMLP`, `A2aSparseMLP`, `A2aSparseStackedMlp`) to unpack accordingly, and simplified the weighted-sum logic to use the compact scores tensor directly, removing a workaround that used `torch.gather` / one-hot einsum.
- **Performance benchmark matrix** (`.github/workflows/perf-bench-matrix.json`): updated all PyTorch benchmark entries from `transformers==4.57.1` to `transformers==5.2.0`. The `resnet_jax` and `bge_m3_encode` entries are intentionally kept at `transformers==4.57.1`: `FlaxResNetForImageClassification` was removed in 5.x, and `FlagEmbedding` (used by bge_m3) is not yet compatible with 5.x.
- **LLM benchmark version check** (`tests/benchmark/benchmarks/llm_benchmark.py`): updated `check_transformers_version()` to require exactly `5.2.0` instead of `<= 4.57.1`. Also removed the now-unnecessary `check_transformers_version()` guard from `examples/pytorch/llama.py`.
- **Resnet codegen examples skipped** (`tests/examples/test_examples.py`): added XFAIL entries for `jax/codegen/cpp/resnet.py` and `jax/codegen/python/resnet.py`, since `FlaxResNetModel` was removed in transformers 5.x.
- **`surya-ocr` unpinned** (`venv/requirements-dev.txt`): removed the `surya-ocr==0.17.0` version pin.
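The stricter benchmark version guard could look something like this; the function name comes from the PR text, but the body is an assumed sketch, not the repository's actual implementation:

```python
# Require exactly transformers 5.2.0 (previously the check was <= 4.57.1).
def check_transformers_version(installed: str, required: str = "5.2.0") -> None:
    if installed != required:
        raise RuntimeError(
            f"transformers=={required} is required, found {installed}"
        )
```

An exact pin like this is deliberate for benchmarks: performance numbers are only comparable when every run uses the same library version.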
#### tt-forge models PR

tenstorrent/tt-forge-models#529

### CI tests for reference

- Manual Release test: https://github.com/tenstorrent/tt-xla/actions/runs/23179435697
- Manual Manylinux release test: https://github.com/tenstorrent/tt-xla/actions/runs/23179426382

### Checklist

- [x] Fix `gpt_oss` failure
- [x] Fix JAX-only CI workflows

---------

Co-authored-by: Vladimir Zeljkovic <vzeljkovic@tenstorrent.com>
What does this PR do?