**Qwen3.5** is a family of open-source multimodal large language models developed by the Qwen team at Alibaba Cloud. The family debuted on February 16, 2026 with Qwen3.5-397B-A17B and was followed on February 25, 2026 by the Qwen3.5 Medium Model series, which comprises open-source models under the Apache 2.0 license (Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B, plus a base variant) alongside the proprietary Qwen3.5-Flash available via API.[^1][^2]

The series features a hybrid architecture combining Gated Delta Networks for linear attention with a sparse Mixture-of-Experts (MoE) design, enabling efficient inference where only a fraction of parameters are activated per forward pass—for instance, 17 billion active out of 397 billion total in the flagship model.[^3]
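
The routing mechanism behind this sparsity can be sketched in a few lines. The snippet below is an illustrative top-k MoE layer in PyTorch; the expert count, hidden sizes, and `k` are placeholder values, not Qwen3.5's published configuration, and the Gated Delta attention component is not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sparse MoE layer: a router scores every expert per token,
# but only the top-k experts actually execute for that token.
class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                  # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(self.k, dim=-1)          # top-k experts per token
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over the k picked
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                sel = idx[:, slot] == e                    # tokens routing this slot to expert e
                if sel.any():
                    out[sel] += weights[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out                                         # only k of n_experts ran per token
```

Per token, only `k` of the `n_experts` feed-forward blocks execute, which is how a 397B-parameter model can run with roughly 17B active parameters per forward pass.
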
Qwen3.5 incorporates native multimodal fusion through early text-vision integration trained on trillions of multimodal tokens, supporting advanced capabilities in reasoning, coding, visual understanding, and agentic tasks across 201 languages and dialects.[^1][^4]
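
As a rough illustration of early text-vision fusion, the sketch below projects vision patch features into the text embedding space and concatenates both into one sequence before the transformer stack. The dimensions, projection design, and token ordering are assumptions for illustration, not Qwen3.5's documented architecture.

```python
import torch
import torch.nn as nn

# Early-fusion sketch: vision patches are mapped into the text embedding
# space and joined with text tokens into a single sequence, so both
# modalities are processed together from the first transformer layer.
class EarlyFusion(nn.Module):
    def __init__(self, d_vision=1024, d_model=4096, vocab_size=150000):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.vision_proj = nn.Linear(d_vision, d_model)  # patches -> token space

    def forward(self, text_ids, patch_feats):
        txt = self.text_embed(text_ids)        # (B, T_text, d_model)
        vis = self.vision_proj(patch_feats)    # (B, T_patches, d_model)
        return torch.cat([vis, txt], dim=1)    # one fused sequence

fusion = EarlyFusion()
ids = torch.randint(0, 150000, (1, 16))
patches = torch.randn(1, 64, 1024)
seq = fusion(ids, patches)                     # (1, 80, 4096) fed to the transformer
```
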
The models achieve high-throughput inference with minimal latency and cost overhead, thanks to their efficient hybrid architecture, near-lossless quantization for local deployment, and next-generation training infrastructure that attains near-100% multimodal training efficiency compared to text-only baselines.[^4]
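
To make the quantization claim concrete, here is a sketch of symmetric per-channel int8 weight quantization, one common scheme for near-lossless local deployment; the source does not specify which method Qwen3.5 actually ships with.

```python
import torch

# Symmetric per-channel int8 quantization: one scale per output channel,
# so the round-trip error stays small relative to the weight magnitudes.
def quantize_per_channel(w: torch.Tensor):
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = (w / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(4096, 4096)
q, s = quantize_per_channel(w)
err = (w - dequantize(q, s)).abs().max().item()
print(f"max round-trip error: {err:.5f}")  # tiny relative to typical weight scale
```

Int8 storage also cuts the memory footprint of the weights to a quarter of float32 (half of bfloat16), which is what makes local deployment of large checkpoints practical.
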
They demonstrate substantial improvements over prior Qwen series models, including Qwen3 and Qwen3-VL, across diverse benchmarks in reasoning (e.g., MMLU-Pro, GPQA Diamond), coding (e.g., SWE-bench Verified, LiveCodeBench), visual tasks (e.g., MMMU, MathVision), and agentic evaluations (e.g., BFCL-V4, TAU2-Bench), often beating models such as GPT-5-mini and Claude Sonnet 4.5 on key third-party benchmarks while maintaining robust real-world adaptability through scalable reinforcement learning across million-agent environments.[^2][^3]

Qwen3.5 supports context lengths of 1 million tokens or more, depending on configuration, and includes built-in tool-calling and agentic functionality, making it suitable for deployments ranging from enterprise workflows to inclusive global AI systems.[^4]
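
A minimal tool-calling sketch, assuming the model is served behind an OpenAI-compatible endpoint (as is common for open-weight Qwen releases via servers such as vLLM); the base URL, served model name, and tool schema below are placeholder assumptions, not official Qwen3.5 values.

```python
from openai import OpenAI

# Placeholder endpoint and model name; substitute your own deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool, not a real API
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3.5-35B-A3B",  # assumed served model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
# If the model decides to invoke the tool, the call arrives as structured JSON:
print(resp.choices[0].message.tool_calls)
```
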
Overview
---