**Qwen3.5** is a family of open-source multimodal large language models developed by the Qwen team at Alibaba Cloud. The family debuted on February 16, 2026 with Qwen3.5-397B-A17B and was followed on February 25, 2026 by the Qwen3.5 Medium Model series, which comprises open-source models under the Apache 2.0 license (Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B, plus a base variant) alongside the proprietary Qwen3.5-Flash available via API.[^1][^2]

The series features a hybrid architecture combining Gated Delta Networks for linear attention with a sparse Mixture-of-Experts (MoE) design, enabling efficient inference where only a fraction of parameters are activated per forward pass—for instance, 17 billion active out of 397 billion total in the flagship model.[^3]
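
The routing mechanism behind this sparsity can be sketched in a few lines. The snippet below is an illustrative top-k MoE layer in PyTorch; the expert count, hidden sizes, and `k` are placeholder values, not Qwen3.5's published configuration, and the Gated Delta attention component is not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sparse MoE layer: a router scores every expert per token,
# but only the top-k experts actually execute for that token.
class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                  # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(self.k, dim=-1)          # top-k experts per token
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over the k picked
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                sel = idx[:, slot] == e                    # tokens routing this slot to expert e
                if sel.any():
                    out[sel] += weights[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out                                         # only k of n_experts ran per token
```

Per token, only `k` of the `n_experts` feed-forward blocks execute, which is how a 397B-parameter model can run with roughly 17B active parameters per forward pass.
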
Qwen3.5 incorporates native multimodal fusion through early text-vision integration trained on trillions of multimodal tokens, supporting advanced capabilities in reasoning, coding, visual understanding, and agentic tasks across 201 languages and dialects.[^1][^4]
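
As a rough illustration of early text-vision fusion, the sketch below projects vision patch features into the text embedding space and concatenates both into one sequence before the transformer stack. The dimensions, projection design, and token ordering are assumptions for illustration, not Qwen3.5's documented architecture.

```python
import torch
import torch.nn as nn

# Early-fusion sketch: vision patches are mapped into the text embedding
# space and joined with text tokens into a single sequence, so both
# modalities are processed together from the first transformer layer.
class EarlyFusion(nn.Module):
    def __init__(self, d_vision=1024, d_model=4096, vocab_size=150000):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.vision_proj = nn.Linear(d_vision, d_model)  # patches -> token space

    def forward(self, text_ids, patch_feats):
        txt = self.text_embed(text_ids)        # (B, T_text, d_model)
        vis = self.vision_proj(patch_feats)    # (B, T_patches, d_model)
        return torch.cat([vis, txt], dim=1)    # one fused sequence

fusion = EarlyFusion()
ids = torch.randint(0, 150000, (1, 16))
patches = torch.randn(1, 64, 1024)
seq = fusion(ids, patches)                     # (1, 80, 4096) fed to the transformer
```
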
The models achieve high-throughput inference with minimal latency and cost overhead, thanks to their efficient hybrid architecture, near-lossless quantization for local deployment, and next-generation training infrastructure that attains near-100% multimodal training efficiency compared to text-only baselines.[^4]
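
To make the quantization claim concrete, here is a sketch of symmetric per-channel int8 weight quantization, one common scheme for near-lossless local deployment; the source does not specify which method Qwen3.5 actually ships with.

```python
import torch

# Symmetric per-channel int8 quantization: one scale per output channel,
# so the round-trip error stays small relative to the weight magnitudes.
def quantize_per_channel(w: torch.Tensor):
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = (w / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(4096, 4096)
q, s = quantize_per_channel(w)
err = (w - dequantize(q, s)).abs().max().item()
print(f"max round-trip error: {err:.5f}")  # tiny relative to typical weight scale
```

Int8 storage also cuts the memory footprint of the weights to a quarter of float32 (half of bfloat16), which is what makes local deployment of large checkpoints practical.
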
They demonstrate substantial improvements over prior Qwen series models, including Qwen3 and Qwen3-VL, across diverse benchmarks in reasoning (e.g., MMLU-Pro, GPQA Diamond), coding (e.g., SWE-bench Verified, LiveCodeBench), visual tasks (e.g., MMMU, MathVision), and agentic evaluations (e.g., BFCL-V4, TAU2-Bench), often beating models such as GPT-5-mini and Claude Sonnet 4.5 on key third-party benchmarks while maintaining robust real-world adaptability through scalable reinforcement learning across million-agent environments.[^2][^3]

Qwen3.5 supports context lengths of 1 million tokens or more, depending on configuration, and includes built-in tool-calling and agentic functionality, making it suitable for deployments ranging from enterprise workflows to inclusive global AI systems.[^4]
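
A minimal tool-calling sketch, assuming the model is served behind an OpenAI-compatible endpoint (as is common for open-weight Qwen releases via servers such as vLLM); the base URL, served model name, and tool schema below are placeholder assumptions, not official Qwen3.5 values.

```python
from openai import OpenAI

# Placeholder endpoint and model name; substitute your own deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool, not a real API
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3.5-35B-A3B",  # assumed served model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
# If the model decides to invoke the tool, the call arrives as structured JSON:
print(resp.choices[0].message.tool_calls)
```
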
Overview
---