
[model] support Qwen3.5 all series models#10237

Merged
hiyouga merged 5 commits into hiyouga:main from isLinXu:qwen3_5 on Mar 3, 2026

Conversation

Contributor

@isLinXu isLinXu commented Mar 2, 2026

What does this PR do?

Add support for the full Qwen3.5 model series (Base and Thinking variants) in src/llamafactory/extras/constants.py.

Fixes # (issue)

[Image: archon-qwen3_5_moe architecture diagram]

Summary

This PR registers the complete Qwen3.5 model family, which was released by the Qwen Team on 2026-02-15. Qwen3.5 is a native multimodal model series built on a hybrid architecture (Gated DeltaNet linear attention + Gated Attention + Sparse MoE), supporting 262K context natively and 201 languages.

Models Added

🚀 Qwen3.5 Series: The Complete Model Matrix

1. Dense Models

The Dense series provides consistent performance across general-purpose tasks and is the foundation for standard fine-tuning.

| Model Name | Parameters | Type | Positioning & Use Case | HF Path |
| --- | --- | --- | --- | --- |
| Qwen3.5-0.8B | 0.9B | Dense | Ultra-lightweight: edge devices & mobile SDKs | Qwen/Qwen3.5-0.8B |
| Qwen3.5-2B | 2.1B | Dense | Lightweight: mobile apps, IoT & low-latency chat | Qwen/Qwen3.5-2B |
| Qwen3.5-4B | 5.2B | Dense | Multimodal-ready: ideal for lightweight agents | Qwen/Qwen3.5-4B |
| Qwen3.5-9B | 10.2B | Dense | Efficiency king: best ROI for developers/local use | Qwen/Qwen3.5-9B |
| Qwen3.5-27B | 28B | Dense | The solid performer: balanced power for production | Qwen/Qwen3.5-27B |

2. Sparse MoE Models (Mixture of Experts)

These models utilize a "High Capacity, Low Activation" architecture, offering flagship-level intelligence with significantly faster inference speeds.

| Model Name | Total / Active | Type | Positioning & Use Case | HF Path |
| --- | --- | --- | --- | --- |
| Qwen3.5-35B-A3B | 36B / 3B | MoE | Pocket rocket: 3B-tier speed with 36B-tier logic | Qwen/Qwen3.5-35B-A3B |
| Qwen3.5-122B-A10B | 125B / 10B | MoE | Mid-tier pro: the workhorse for high-end agents | Qwen/Qwen3.5-122B-A10B |
| Qwen3.5-397B-A17B | 403B / 17B | MoE | Flagship behemoth: SOTA reasoning & long context | Qwen/Qwen3.5-397B-A17B |

💡 Key Takeaways for Developers

  • The "Thinking" Evolution: Unlike previous versions, the post-trained models are branded as Thinking versions. These are specifically optimized for chain-of-thought reasoning, mathematics, and complex coding tasks.
  • Base vs. Thinking Availability:
    • Models like 9B and 35B-A3B are "Full-stack," meaning both Base and Thinking versions are available for deep fine-tuning.
    • Large-scale models (27B, 122B, 397B) are currently only available in Thinking versions, making them perfect for out-of-the-box deployment or instruction-based fine-tuning.
  • Deployment Strategy:
    • If you have limited VRAM but need high intelligence, the 35B-A3B (MoE) is your best bet as it only "feels" like a 3B model during inference.
    • For RAG (Retrieval-Augmented Generation) applications, the 9B Dense model remains the most cost-effective choice.

Naming Convention

  • -Base: Raw pre-trained model weights. No instruction following / thinking behavior. Suitable as fine-tuning starting points.
  • -Thinking: Post-trained models (SFT + large-scale RL). Operate in thinking mode by default, generating <think>...</think> reasoning chains before final responses. Note: unlike Qwen3, Qwen3.5 does not support the /think and /nothink soft switch; thinking is controlled via chat_template_kwargs: {"enable_thinking": false}.
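As a minimal, self-contained sketch of the `enable_thinking` behavior described above: in practice you would pass the flag through `tokenizer.apply_chat_template(...)` (or `chat_template_kwargs` in serving frameworks), but the stand-in below illustrates the templating convention without downloading any model. The prompt format mirrors the Qwen-style ChatML template; the exact template shipped with Qwen3.5 may differ.

```python
# Illustrative stand-in for a Qwen3.5-style chat template that honors an
# `enable_thinking` flag. When thinking is disabled, the template pre-fills
# an empty <think>...</think> block so the model skips reasoning output
# (this is the convention Qwen3 uses; Qwen3.5 is assumed to match).
def apply_chat_template(messages, enable_thinking=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        parts.append("<think>\n\n</think>\n\n")  # empty reasoning block
    return "".join(parts)

msgs = [{"role": "user", "content": "What is 17 * 23?"}]
prompt = apply_chat_template(msgs, enable_thinking=False)  # contains "<think>"
```

With a real tokenizer the equivalent call would be `tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True, enable_thinking=False)`.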

Template

All models use template="qwen3_5" with multimodal=True, as all Qwen3.5 models are natively multimodal (early-fusion vision-language).

Key Architecture Notes

  • Hybrid Attention: Repeating pattern of 3 × Gated DeltaNet (linear attention) + 1 × Gated Attention (full attention) per block unit, enabling O(n) complexity for most layers.
  • Sparse MoE FFN: 256 experts total, 8 routed + 1 shared activated per token (for MoE variants).
  • Vocabulary: 248,320 tokens (expanded from 150K in Qwen2.5 to support 201 languages).
  • Context: 262,144 tokens natively; extensible to 1,010,000 via YaRN.
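The repeating 3 + 1 attention pattern above can be sketched as a simple layer-index mapping (illustrative only; this is not the actual Qwen3.5 config schema, and the type names are placeholders):

```python
# Hypothetical sketch of the hybrid layout: within each 4-layer unit, the
# first three layers use linear Gated DeltaNet and the fourth uses full
# Gated Attention, so only ~1/4 of layers pay quadratic attention cost.
def attention_type(layer_idx: int) -> str:
    return "gated_attention" if layer_idx % 4 == 3 else "gated_deltanet"

layout = [attention_type(i) for i in range(8)]
# layers 3 and 7 are full attention; the rest are linear
```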

Files Changed

  • src/llamafactory/extras/constants.py: Added register_model_group for the full Qwen3.5 series.
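The registration pattern can be sketched with a self-contained mock (the real `register_model_group` in `constants.py` also records ModelScope paths and other metadata; the model names and HF paths below are illustrative examples, not the exact entries added by this PR):

```python
# Minimal mock of the register_model_group pattern used in
# src/llamafactory/extras/constants.py (simplified; signature is assumed).
SUPPORTED_MODELS: dict = {}
DEFAULT_TEMPLATE: dict = {}
MULTIMODAL_SUPPORTED_MODELS: set = set()

def register_model_group(models, template=None, multimodal=False):
    for name, hf_path in models.items():
        SUPPORTED_MODELS[name] = hf_path
        if template is not None:
            DEFAULT_TEMPLATE[name] = template  # e.g. "qwen3_5"
        if multimodal:
            MULTIMODAL_SUPPORTED_MODELS.add(name)

# Example registration mirroring this PR's shape (paths illustrative).
register_model_group(
    models={
        "Qwen3.5-9B-Base": "Qwen/Qwen3.5-9B-Base",
        "Qwen3.5-9B-Thinking": "Qwen/Qwen3.5-9B-Thinking",
        "Qwen3.5-35B-A3B-Thinking": "Qwen/Qwen3.5-35B-A3B-Thinking",
    },
    template="qwen3_5",
    multimodal=True,
)
```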


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances model compatibility by integrating the full Qwen3.5 model family. It introduces various sizes and types, including dense and sparse Mixture of Experts (MoE) models, along with their respective Base and Thinking variants. This expansion lets users leverage Qwen3.5's advanced multimodal and multilingual capabilities, offering a wider range of options for fine-tuning and deployment across diverse applications.

Highlights

  • Expanded Qwen3.5 Model Support: Added registration for several new Qwen3.5 models, including 0.8B, 2B, 4B, and 9B variants, covering both Base and Thinking versions.
  • Consolidated Qwen3.5 Model Group: Re-organized and re-registered existing Qwen3.5 models (27B-Thinking, 35B-A3B-Base, 35B-A3B-Thinking) alongside the new additions within the register_model_group function.
Changelog
  • src/llamafactory/extras/constants.py
    • Registered new Qwen3.5 models: Qwen3.5-0.8B-Base, Qwen3.5-0.8B-Thinking, Qwen3.5-2B-Base, Qwen3.5-2B-Base-Thinking, Qwen3.5-4B-Base, Qwen3.5-4B-Base-Thinking, Qwen3.5-9B-Base, Qwen3.5-9B-Thinking.
    • Re-registered existing Qwen3.5 models: Qwen3.5-27B-Thinking, Qwen3.5-35B-A3B-Base, Qwen3.5-35B-A3B-Thinking, ensuring their inclusion in the updated model group.
Activity
  • No specific activity (comments, reviews, or progress updates) was found in the provided context for this pull request.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Qwen3.5 model series. I've found a minor naming inconsistency in the newly added model definitions and suggested a change to improve maintainability by aligning the naming with the established convention in the repository.

Owner

@hiyouga hiyouga left a comment


LGTM

@hiyouga hiyouga merged commit c0245c4 into hiyouga:main Mar 3, 2026
17 checks passed
@hiyouga hiyouga added the "solved" label (This problem has been already solved) on Mar 3, 2026