
[model] support Qwen3.5 all series models#10237

Merged
hiyouga merged 5 commits into hiyouga:main from isLinXu:qwen3_5 on Mar 3, 2026

Conversation

Contributor

@isLinXu isLinXu commented Mar 2, 2026

What does this PR do?

Add support for the full Qwen3.5 model series (Base and Thinking variants) in src/llamafactory/extras/constants.py.

Fixes # (issue)

[Image: archon-qwen3_5_moe architecture diagram]

Summary

This PR registers the complete Qwen3.5 model family, which was released by the Qwen Team on 2026-02-15. Qwen3.5 is a native multimodal model series built on a hybrid architecture (Gated DeltaNet linear attention + Gated Attention + Sparse MoE), supporting 262K context natively and 201 languages.

Models Added

🚀 Qwen3.5 Series: The Complete Model Matrix

1. Dense Models

The Dense series provides consistent performance across general-purpose tasks and is the foundation for standard fine-tuning.

| Model Name | Parameters | Type | Positioning & Use Case | HF Path |
| --- | --- | --- | --- | --- |
| Qwen3.5-0.8B | 0.9B | Dense | Ultra-lightweight: edge devices & mobile SDKs | Qwen/Qwen3.5-0.8B |
| Qwen3.5-2B | 2.1B | Dense | Lightweight: mobile apps, IoT & low-latency chat | Qwen/Qwen3.5-2B |
| Qwen3.5-4B | 5.2B | Dense | Multimodal-ready: ideal for lightweight agents | Qwen/Qwen3.5-4B |
| Qwen3.5-9B | 10.2B | Dense | Efficiency king: best ROI for developers/local use | Qwen/Qwen3.5-9B |
| Qwen3.5-27B | 28B | Dense | The solid performer: balanced power for production | Qwen/Qwen3.5-27B |

2. Sparse MoE Models (Mixture of Experts)

These models utilize a "High Capacity, Low Activation" architecture, offering flagship-level intelligence with significantly faster inference speeds.

| Model Name | Total / Active | Type | Positioning & Use Case | HF Path |
| --- | --- | --- | --- | --- |
| Qwen3.5-35B-A3B | 36B / 3B | MoE | Pocket rocket: 3B-tier speed with 36B-tier logic | Qwen/Qwen3.5-35B-A3B |
| Qwen3.5-122B-A10B | 125B / 10B | MoE | Mid-tier pro: the workhorse for high-end agents | Qwen/Qwen3.5-122B-A10B |
| Qwen3.5-397B-A17B | 403B / 17B | MoE | Flagship behemoth: SOTA reasoning & long context | Qwen/Qwen3.5-397B-A17B |

💡 Key Takeaways for Developers

  • The "Thinking" Evolution: Unlike previous versions, the post-trained models are branded as Thinking versions. These are specifically optimized for chain-of-thought reasoning, mathematics, and complex coding tasks.
  • Base vs. Thinking Availability:
    • Models like 9B and 35B-A3B are "Full-stack," meaning both Base and Thinking versions are available for deep fine-tuning.
    • Large-scale models (27B, 122B, 397B) are currently only available in Thinking versions, making them perfect for out-of-the-box deployment or instruction-based fine-tuning.
  • Deployment Strategy:
    • If you have limited VRAM but need high intelligence, the 35B-A3B (MoE) is your best bet as it only "feels" like a 3B model during inference.
    • For RAG (Retrieval-Augmented Generation) applications, the 9B Dense model remains the most cost-effective choice.

Naming Convention

  • -Base: Raw pre-trained model weights. No instruction following / thinking behavior. Suitable as fine-tuning starting points.
  • -Thinking: Post-trained models (SFT + large-scale RL). Operate in thinking mode by default, generating <think>...</think> reasoning chains before final responses. Note: unlike Qwen3, Qwen3.5 does not support the /think and /nothink soft switch; thinking is controlled via chat_template_kwargs: {"enable_thinking": false}.
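As a minimal, self-contained sketch of the `enable_thinking` behavior described above: in practice you would pass the flag through `tokenizer.apply_chat_template(...)` (or `chat_template_kwargs` in serving frameworks), but the stand-in below illustrates the templating convention without downloading any model. The prompt format mirrors the Qwen-style ChatML template; the exact template shipped with Qwen3.5 may differ.

```python
# Illustrative stand-in for a Qwen3.5-style chat template that honors an
# `enable_thinking` flag. When thinking is disabled, the template pre-fills
# an empty <think>...</think> block so the model skips reasoning output
# (this is the convention Qwen3 uses; Qwen3.5 is assumed to match).
def apply_chat_template(messages, enable_thinking=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        parts.append("<think>\n\n</think>\n\n")  # empty reasoning block
    return "".join(parts)

msgs = [{"role": "user", "content": "What is 17 * 23?"}]
prompt = apply_chat_template(msgs, enable_thinking=False)  # contains "<think>"
```

With a real tokenizer the equivalent call would be `tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True, enable_thinking=False)`.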

Template

All models use template="qwen3_5" with multimodal=True, as all Qwen3.5 models are natively multimodal (early-fusion vision-language).

Key Architecture Notes

  • Hybrid Attention: Repeating pattern of 3 × Gated DeltaNet (linear attention) + 1 × Gated Attention (full attention) per block unit, enabling O(n) complexity for most layers.
  • Sparse MoE FFN: 256 experts total, 8 routed + 1 shared activated per token (for MoE variants).
  • Vocabulary: 248,320 tokens (expanded from 150K in Qwen2.5 to support 201 languages).
  • Context: 262,144 tokens natively; extensible to 1,010,000 via YaRN.
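The repeating 3 + 1 attention pattern above can be sketched as a simple layer-index mapping (illustrative only; this is not the actual Qwen3.5 config schema, and the type names are placeholders):

```python
# Hypothetical sketch of the hybrid layout: within each 4-layer unit, the
# first three layers use linear Gated DeltaNet and the fourth uses full
# Gated Attention, so only ~1/4 of layers pay quadratic attention cost.
def attention_type(layer_idx: int) -> str:
    return "gated_attention" if layer_idx % 4 == 3 else "gated_deltanet"

layout = [attention_type(i) for i in range(8)]
# layers 3 and 7 are full attention; the rest are linear
```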

Files Changed

  • src/llamafactory/extras/constants.py: Added register_model_group for the full Qwen3.5 series.
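The registration pattern can be sketched with a self-contained mock (the real `register_model_group` in `constants.py` also records ModelScope paths and other metadata; the model names and HF paths below are illustrative examples, not the exact entries added by this PR):

```python
# Minimal mock of the register_model_group pattern used in
# src/llamafactory/extras/constants.py (simplified; signature is assumed).
SUPPORTED_MODELS: dict = {}
DEFAULT_TEMPLATE: dict = {}
MULTIMODAL_SUPPORTED_MODELS: set = set()

def register_model_group(models, template=None, multimodal=False):
    for name, hf_path in models.items():
        SUPPORTED_MODELS[name] = hf_path
        if template is not None:
            DEFAULT_TEMPLATE[name] = template  # e.g. "qwen3_5"
        if multimodal:
            MULTIMODAL_SUPPORTED_MODELS.add(name)

# Example registration mirroring this PR's shape (paths illustrative).
register_model_group(
    models={
        "Qwen3.5-9B-Base": "Qwen/Qwen3.5-9B-Base",
        "Qwen3.5-9B-Thinking": "Qwen/Qwen3.5-9B-Thinking",
        "Qwen3.5-35B-A3B-Thinking": "Qwen/Qwen3.5-35B-A3B-Thinking",
    },
    template="qwen3_5",
    multimodal=True,
)
```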


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances model compatibility by integrating the full Qwen3.5 model family. It introduces various sizes and types, including dense and sparse Mixture of Experts (MoE) models, along with their respective Base and Thinking variants. This expansion lets users leverage Qwen3.5's advanced multimodal and multilingual capabilities, offering a wider range of options for fine-tuning and deployment across diverse applications.

Highlights

  • Expanded Qwen3.5 Model Support: Added registration for several new Qwen3.5 models, including 0.8B, 2B, 4B, and 9B variants, covering both Base and Thinking versions.
  • Consolidated Qwen3.5 Model Group: Re-organized and re-registered existing Qwen3.5 models (27B-Thinking, 35B-A3B-Base, 35B-A3B-Thinking) alongside the new additions within the register_model_group function.
Changelog
  • src/llamafactory/extras/constants.py
    • Registered new Qwen3.5 models: Qwen3.5-0.8B-Base, Qwen3.5-0.8B-Thinking, Qwen3.5-2B-Base, Qwen3.5-2B-Base-Thinking, Qwen3.5-4B-Base, Qwen3.5-4B-Base-Thinking, Qwen3.5-9B-Base, Qwen3.5-9B-Thinking.
    • Re-registered existing Qwen3.5 models: Qwen3.5-27B-Thinking, Qwen3.5-35B-A3B-Base, Qwen3.5-35B-A3B-Thinking, ensuring their inclusion in the updated model group.
Activity
  • No specific activity (comments, reviews, or progress updates) was found in the provided context for this pull request.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Qwen3.5 model series. I've found a minor naming inconsistency in the newly added model definitions and suggested a change to improve maintainability by aligning the naming with the established convention in the repository.

Owner

@hiyouga hiyouga left a comment


LGTM

@hiyouga hiyouga merged commit c0245c4 into hiyouga:main Mar 3, 2026
17 checks passed
@hiyouga hiyouga added the "solved" label (This problem has been already solved) on Mar 3, 2026