28 changes: 14 additions & 14 deletions docs/source/en/_toctree.yml
@@ -437,6 +437,8 @@
title: DeBERTa
- local: model_doc/deberta-v2
title: DeBERTa-v2
- local: model_doc/deepseek_v2
title: DeepSeek-V2
- local: model_doc/deepseek_v3
title: DeepSeek-V3
- local: model_doc/dialogpt
@@ -761,12 +763,6 @@
title: D-FINE
- local: model_doc/dab-detr
title: DAB-DETR
- local: model_doc/deepseek_v2
title: DeepSeek-V2
- local: model_doc/deepseek_vl
title: DeepseekVL
- local: model_doc/deepseek_vl_hybrid
title: DeepseekVLHybrid
- local: model_doc/deformable_detr
title: Deformable DETR
- local: model_doc/deit
@@ -849,10 +845,16 @@
title: RT-DETR
- local: model_doc/rt_detr_v2
title: RT-DETRv2
- local: model_doc/sam2
title: SAM2
- local: model_doc/segformer
title: SegFormer
- local: model_doc/seggpt
title: SegGpt
- local: model_doc/sam
title: Segment Anything
- local: model_doc/sam_hq
title: Segment Anything High Quality
- local: model_doc/superglue
title: SuperGlue
- local: model_doc/superpoint
@@ -975,6 +977,8 @@
title: XLSR-Wav2Vec2
title: Audio models
- sections:
- local: model_doc/sam2_video
title: SAM2 Video
- local: model_doc/timesformer
title: TimeSformer
- local: model_doc/vjepa2
@@ -1019,6 +1023,10 @@
title: ColQwen2
- local: model_doc/data2vec
title: Data2Vec
- local: model_doc/deepseek_vl
title: DeepseekVL
- local: model_doc/deepseek_vl_hybrid
title: DeepseekVLHybrid
- local: model_doc/deplot
title: DePlot
- local: model_doc/donut
@@ -1137,14 +1145,6 @@
title: Qwen3VL
- local: model_doc/qwen3_vl_moe
title: Qwen3VLMoe
- local: model_doc/sam2
title: SAM2
- local: model_doc/sam2_video
title: SAM2 Video
- local: model_doc/sam
title: Segment Anything
- local: model_doc/sam_hq
title: Segment Anything High Quality
- local: model_doc/shieldgemma2
title: ShieldGemma2
- local: model_doc/siglip
1 change: 1 addition & 0 deletions docs/source/en/model_doc/bert-generation.md
@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on 2019-07-29 and added to Hugging Face Transformers on 2020-11-16.*

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
2 changes: 1 addition & 1 deletion docs/source/en/model_doc/flex_olmo.md
@@ -16,7 +16,7 @@ limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
-*This model was released on 2025-07-09 and added to Hugging Face Transformers on 2025-09-15.*
+*This model was released on 2025-07-09 and added to Hugging Face Transformers on 2025-09-18.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
1 change: 1 addition & 0 deletions docs/source/en/model_doc/hunyuan_v1_dense.md
@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-08-22.*

# HunYuanDenseV1

1 change: 1 addition & 0 deletions docs/source/en/model_doc/hunyuan_v1_moe.md
@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-08-22.*

# HunYuanMoEV1

1 change: 1 addition & 0 deletions docs/source/en/model_doc/lfm2_vl.md
@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-09-18.*

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
5 changes: 2 additions & 3 deletions docs/source/en/model_doc/longcat_flash.md
@@ -16,8 +16,7 @@ limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
-*This model was released on 2025-09-01 and added to Hugging Face Transformers on 2025-09-15.*
+*This model was released on 2025-09-01 and added to Hugging Face Transformers on 2025-09-17.*

# LongCatFlash

@@ -70,7 +69,7 @@ outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.batch_decode(outputs))
```

To run with TP, you will need torchrun:

```bash
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0 | 1 --rdzv-id <an_id> --rdzv-backend c10d --rdzv-endpoint $NODE_ID:$NODE_PORT --log-dir ./logs_longcat launch_longcat.py
1 change: 1 addition & 0 deletions docs/source/en/model_doc/ministral.md
@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-09-11.*

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
11 changes: 6 additions & 5 deletions docs/source/en/model_doc/olmo3.md
@@ -16,7 +16,8 @@ limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
-*This model was released on {release_date} and added to Hugging Face Transformers on 2025-09-08.*
+*This model was released on {release_date} and added to Hugging Face Transformers on 2025-09-16.*

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
@@ -46,7 +47,7 @@ pipe = pipeline(
dtype=torch.bfloat16,
device=0,
)

result = pipe("Plants create energy through a process known as")
print(result)
```
@@ -119,11 +120,11 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))

## Notes

- Load specific intermediate checkpoints by adding the `revision` parameter to [`~PreTrainedModel.from_pretrained`].

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("allenai/TBA", revision="stage1-step140000-tokens294B")
```

@@ -144,4 +145,4 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
## Olmo3PreTrainedModel

[[autodoc]] Olmo3PreTrainedModel
- forward
1 change: 1 addition & 0 deletions docs/source/en/model_doc/ovis2.md
@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on 2024-05-31 and added to Hugging Face Transformers on 2025-08-18.*

# Ovis2

12 changes: 7 additions & 5 deletions docs/source/en/model_doc/qwen3_next.md
@@ -13,18 +13,20 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-09-10.*

## Overview

The Qwen3-Next series represents our next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency.
The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:
- **Hybrid Attention**: Replaces standard attention with the combination of **Gated DeltaNet** and **Gated Attention**, enabling efficient context modeling.
- **High-Sparsity MoE**: Achieves an extremely low activation ratio of 1:50 in MoE layers, drastically reducing FLOPs per token while preserving model capacity.
- **Multi-Token Prediction (MTP)**: Boosts pretraining performance and accelerates inference.
- **Other Optimizations**: Includes techniques such as **zero-centered and weight-decayed layernorm**, **Gated Attention**, and other stabilizing enhancements for robust training.

Built on this architecture, we trained and open-sourced Qwen3-Next-80B-A3B — 80B total parameters, only 3B active — achieving extreme sparsity and efficiency.

Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks while requiring **less than 1/10 of the training cost**.
Moreover, it delivers over **10x higher inference throughput** than Qwen3-32B when handling contexts longer than 32K tokens.

For more details, please see the [Qwen3-Next blog post](https://qwenlm.github.io/blog/qwen3_next/).
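As a back-of-the-envelope check of the sparsity figures quoted above, the stated parameter counts imply the following active fractions. This is illustrative arithmetic using only the numbers in this overview, not any Qwen3-Next API:

```python
# Figures taken from the overview above (assumptions, not measured values).
total_params = 80e9   # total parameters: 80B
active_params = 3e9   # parameters active per token: 3B

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.2%}")  # 3.75%

moe_activation_ratio = 1 / 50  # stated MoE expert activation ratio
print(f"MoE expert activation: {moe_activation_ratio:.0%}")  # 2%
```

The whole-model fraction (3.75%) is higher than the 1:50 MoE ratio because dense components (embeddings, attention, norms) are always active.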
@@ -60,7 +62,7 @@ generated_ids = model.generate(
**model_inputs,
max_new_tokens=512
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

2 changes: 1 addition & 1 deletion docs/source/en/model_doc/qwen3_vl.md
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
-*This model was released on None and added to Hugging Face Transformers on 2025-08-16.*
+*This model was released on None and added to Hugging Face Transformers on 2025-09-15.*

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
2 changes: 1 addition & 1 deletion docs/source/en/model_doc/qwen3_vl_moe.md
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.

-->
-*This model was released on None and added to Hugging Face Transformers on 2025-08-17.*
+*This model was released on None and added to Hugging Face Transformers on 2025-09-15.*

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
33 changes: 18 additions & 15 deletions docs/source/en/model_doc/seed_oss.md
@@ -1,17 +1,20 @@
<!--
# Copyright 2025 Bytedance-Seed Ltd and the HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
<!--
Copyright 2025 Bytedance-Seed Ltd and the HuggingFace Inc. team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-08-22.*

# SeedOss

@@ -54,4 +57,4 @@ To be released with the official model launch.
## SeedOssForQuestionAnswering

[[autodoc]] SeedOssForQuestionAnswering
- forward
3 changes: 2 additions & 1 deletion docs/source/en/model_doc/vaultgemma.md
@@ -16,6 +16,7 @@ limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-09-12.*

# VaultGemma

@@ -30,7 +31,7 @@ sequence length.
VaultGemma was trained from scratch with sequence-level differential privacy (DP). Its training data includes the same
mixture as the [Gemma 2 models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315),
consisting of a number of documents of varying lengths. Additionally, it is trained using
-[DP stochastic gradient descent (DP-SGD)](https://arxiv.org/abs/1607.00133) and provides a
+[DP stochastic gradient descent (DP-SGD)](https://huggingface.co/papers/1607.00133) and provides a
(ε ≤ 2.0, δ ≤ 1.1e-10)-sequence-level DP guarantee, where a sequence consists of 1024 consecutive tokens extracted from
heterogeneous data sources. Specifically, the privacy unit of the guarantee is for the sequences after sampling and
packing of the mixture.
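The DP-SGD recipe referenced above can be sketched as per-example gradient clipping followed by calibrated Gaussian noise. This is a generic illustration of the algorithm (function and parameter names are our own), not VaultGemma's actual training code:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """One DP-SGD gradient step (Abadi et al., 2016): clip each
    per-example gradient to an L2 norm of at most `clip_norm`, average,
    then add Gaussian noise scaled to the clipping bound."""
    rng = np.random.default_rng(seed)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down only gradients whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=avg.shape)
    return avg + noise

# Hypothetical per-example gradients for a 2-parameter model.
grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.5)
print(update.shape)  # (2,)
```

The clipping bound is what makes the noise calibration meaningful: it caps any single sequence's influence on the update, which is the mechanism behind the sequence-level (ε, δ) guarantee described above.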