
Qwen3.6: Open Model from China Codes Better Than Claude Opus 4.7 — and You Can Run It on a Home Computer

An open AI model from China that codes better than Claude Opus 4.7, and one you can run on your home computer. On April 15, 2026, Alibaba released Qwen3.6-35B-A3B under the Apache 2.0 license, and the developer community is buzzing: the model scored 73.4% on SWE-bench Verified, outclassing Gemma 4 and approaching the top proprietary models, all with just three billion active parameters.

The Magic of Architecture: 35 Billion Parameters, You Only Pay for Three

The designation 35B-A3B describes a key trick: the model has a total of 35 billion parameters, but in each inference step, it activates only approximately 3 billion of them. This is a sparse Mixture of Experts (MoE) architecture — the model consists of hundreds of specialized "expert" sub-networks and selects only the relevant ones for each token.

The result is remarkable: computational costs correspond to a model with three billion parameters, but the capacity of learned knowledge is 35 billion. A review on BuildFastWithAI describes this as an efficiency ratio of 12:1 — for the price of a small model, you get the performance of a large one.
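The routing trick can be sketched in a few lines of plain Python. This is a toy illustration with made-up dimensions, not Qwen3.6's real configuration: a small router scores every expert for each token, and only the top-k experts actually run.

```python
import math
import random

random.seed(0)

# Toy dimensions, purely illustrative (not Qwen3.6's real configuration).
d_model = 8        # hidden size
n_experts = 16     # total expert sub-networks stored in memory
top_k = 2          # experts actually run per token

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

router = rand_matrix(n_experts, d_model)                  # scores every expert
experts = [rand_matrix(d_model, d_model) for _ in range(n_experts)]

def moe_forward(token):
    """Route one token through only its top-k experts."""
    scores = matvec(router, token)                        # one score per expert
    chosen = sorted(range(n_experts), key=scores.__getitem__)[-top_k:]
    w = [math.exp(scores[i]) for i in chosen]
    gates = [wi / sum(w) for wi in w]                     # softmax over chosen
    out = [0.0] * d_model
    for g, i in zip(gates, chosen):                       # the other 14 stay idle
        for j, y in enumerate(matvec(experts[i], token)):
            out[j] += g * y
    return out

token = [random.gauss(0, 1) for _ in range(d_model)]
out = moe_forward(token)
print(len(out), f"active fraction: {top_k / n_experts:.3f}")
```

All 16 expert weight matrices sit in memory, but each forward pass touches only 2 of them, which is exactly why the compute cost tracks the active parameters, not the total.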

Practically, this means you can run Qwen3.6-35B-A3B on a MacBook with 64 GB RAM or on an RTX 4090 graphics card with 24 GB VRAM. On an RTX 4090, using Unsloth's optimized GGUF build, it reaches speeds of over 120 tokens per second: a comfortable conversational pace for everyday use and for demanding agentic tasks.
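A quick sanity check on the memory math, assuming a 4-bit GGUF quantization and a rough runtime-overhead figure (both are illustrative assumptions, not specs of the actual Unsloth build):

```python
# Back-of-the-envelope memory math for why a 35B MoE fits on one GPU.
TOTAL_PARAMS = 35e9
BITS_PER_PARAM = 4            # e.g. a Q4-class GGUF quantization (assumption)
OVERHEAD_GB = 3               # KV cache, activations, buffers (rough guess)

weights_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")
print(f"total:   ~{weights_gb + OVERHEAD_GB:.1f} GB (vs. 24 GB of VRAM)")
```

At 4 bits per parameter the weights alone come to about 17.5 GB, which leaves headroom on a 24 GB card; at 8 bits they would not fit, which is why the quantized GGUF builds matter for single-GPU use.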

Benchmarks: Where Qwen3.6 Shines and Where It Still Has Room to Grow

The results on SWE-bench Verified — the most important metric for the ability to autonomously fix real bugs on GitHub — are impressive:

  • Qwen3.6-35B-A3B: 73.4%
  • Gemma 4-31B (Google): 52.0%
  • Claude Sonnet 4.5 (Anthropic): comparable results on pure coding benchmarks

On MCPMark, a benchmark measuring the ability to use tools (tool use), which is the foundation of agentic systems, Qwen3.6 achieved 37.0%, while Gemma 4-31B managed only 18.1%: more than double. For developers building AI agents that call APIs, query databases, or control a browser, this is a key indicator.
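To make concrete what a tool-use benchmark exercises, here is a minimal dispatch sketch. The tool name, its schema, and the JSON shape are invented for illustration; they are not part of any Qwen or MCP API. What MCPMark-style benchmarks score is whether the model reliably emits well-formed calls like the one at the bottom.

```python
import json

# Hypothetical tool registry -- names and signatures are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Execute one tool call emitted by the model as a JSON payload."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]          # look up the requested tool
    return fn(**call["arguments"])    # invoke it with the model's arguments

# A model scoring well on tool use must emit payloads in exactly this shape:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Prague"}}')
print(result)
```

An agent loop then feeds `result` back to the model as a tool message and repeats; a model that garbles the JSON or hallucinates argument names breaks this loop, which is what the low Gemma score reflects.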

On Terminal-Bench 2.0, which tests the ability to control a terminal and execute commands, the model achieved a score of 51.5. Independent researcher Simon Willison, who tested the model locally, wrote that Qwen3.6 on some creative tasks even surpassed Claude Opus 4.7.

Thinking Like Claude or o1: Thinking Mode and Context Preservation

Like the latest models from Anthropic or OpenAI, Qwen3.6 supports two operating modes:

  • Thinking mode — the model first thinks through the problem step-by-step (Chain-of-Thought), then responds. Suitable for mathematics, coding, and complex agentic tasks.
  • Non-thinking mode — a direct, quick answer without visible reasoning. Suitable for conversation, summarization, or simple queries.

Switching is done with a single API parameter, with no need to download a different model. A unique feature is Thinking Preservation: the model retains its reasoning context between conversational turns. In multi-step agentic workflows, where one agent calls another and results are passed on, this is a significant advantage over models that discard their reasoning after each response.
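A hedged sketch of what the mode switch might look like against an OpenAI-compatible server. The `enable_thinking` flag and the `chat_template_kwargs` wrapper follow conventions from earlier Qwen3 releases served via vLLM; the final Qwen3.6 API may name them differently, and the model id below is invented.

```python
# Sketch only: parameter names follow earlier Qwen3 serving conventions
# and are an assumption for Qwen3.6.
def build_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "qwen3.6-35b-a3b",   # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

slow = build_request("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_request("Summarize this paragraph in one line.", thinking=False)
print(slow["chat_template_kwargs"], fast["chat_template_kwargs"])
```

The point is the single boolean: one deployed model serves both the slow, chain-of-thought requests and the fast, direct ones.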

Context of 262 Thousand Tokens — and Over a Million with Extension

Qwen3.6-35B-A3B natively works with a context of 262,144 tokens — which corresponds to approximately 200 thousand words or the entire codebase of a medium-sized project in a single prompt. Using the YaRN scaling technique, the context can be extended to over a million tokens.
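A rough way to check whether your codebase fits the window, using the common heuristic of about 4 characters per token (an approximation for English text and source code, not the model's real tokenizer; the file extensions are just examples):

```python
import os

CONTEXT_LIMIT = 262_144            # Qwen3.6's native window
CHARS_PER_TOKEN = 4                # rough heuristic, not the real tokenizer

def estimate_tokens(path: str) -> int:
    """Crude token estimate: total source size divided by ~4 chars/token."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith((".py", ".md", ".ts")):   # example extensions
                total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes // CHARS_PER_TOKEN

# e.g. an ~800 KB codebase works out to ~200k tokens:
estimate = 800_000 // CHARS_PER_TOKEN
print(estimate, estimate <= CONTEXT_LIMIT)
```

By this heuristic a project of under roughly 1 MB of source text fits in the native window in one shot; larger repositories would need the YaRN-extended window or retrieval.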

For comparison: GPT-4o handles 128,000 tokens, and even the latest Claude Opus 4.7 works with a 200,000 token window in its basic configuration. Qwen3.6 thus surpasses even the proprietary top in this regard.

Run in Five Minutes: Ollama, vLLM, or GGUF

Alibaba released the model on Hugging Face under the Apache 2.0 license — meaning free commercial use without fees. There are three deployment methods:

  • Ollama: the simplest way — just run the command ollama run qwen3.6 and the model works locally
  • vLLM: production deployment with queue management, tool use, and an OpenAI-compatible API
  • Unsloth GGUF: quantized versions optimized for Apple Silicon and single-GPU machines
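Once a local server is up, any OpenAI-compatible client can talk to it. A minimal standard-library sketch follows; the endpoint mirrors Ollama's current OpenAI-compatible API, and the model tag "qwen3.6" is an assumption about how the release will be named in the Ollama registry.

```python
import json
from urllib.request import Request, urlopen

# Assumptions: Ollama's OpenAI-compatible endpoint on its default port,
# and a registry tag "qwen3.6" matching the article's `ollama run` command.
API_URL = "http://localhost:11434/v1/chat/completions"

def make_request(prompt: str) -> Request:
    """Build (but do not send) a chat-completion request."""
    body = json.dumps({
        "model": "qwen3.6",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return Request(API_URL, data=body,
                   headers={"Content-Type": "application/json"})

req = make_request("Explain Mixture of Experts in two sentences.")
print(req.full_url)
# With `ollama run qwen3.6` serving, the actual call would be:
#   with urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint speaks the OpenAI wire format, existing SDKs and agent frameworks can be pointed at localhost with no code changes beyond the base URL.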

A cloud API via Alibaba Cloud Bailian is in preparation. For Czech and Slovak companies looking for a powerful model without dependence on American clouds or without the need to share data with a third party, this is an interesting alternative — the model can be operated completely on-premise.

Czech Language: What Qwen3.6 Can and Cannot Do

The Qwen3 series (on which Qwen3.6 is built) declares support for over 100 languages, including Czech. In practice, Czech support in Chinese models is usually weaker than in models trained primarily on English data — Qwen3.6 is no exception. For technical tasks in English (coding, code analysis, agentic pipelines), the model excels. For writing Czech texts or understanding Czech documentation, the performance is good, but not at the level of Claude or GPT-4o.

Czech developers will appreciate the model primarily as a cheap local coding engine for CI/CD pipelines, code review agents, or automation — deployment on their own hardware without cloud costs and GDPR risks associated with sending code to third parties.

Alibaba vs. OpenAI, Anthropic, and Google: The Battle for Open Source

The release of Qwen3.6 comes at a time when open models are reaching a performance level that, just a year ago, belonged exclusively to proprietary systems. Alibaba has succeeded in what Meta is also striving for with Llama: to show that open source AI is not a compromise.

While Google's Gemma 4 focuses on multimodality and integration into the Google ecosystem, Qwen3.6 bet on performance in agentic coding and tool use — areas where developers feel the most pain. The result? On SWE-bench, Qwen3.6 surpasses Gemma by 21 percentage points. That's not a small difference.

Among proprietary models, Claude Sonnet 4.5 and GPT-4o currently dominate, but Qwen3.6 is catching up and even surpasses them on some specific benchmarks, and it does so for free, locally, and without token fees.

Can I use Qwen3.6-35B-A3B commercially for free?

Yes. The model is released under the Apache 2.0 license, which permits commercial use without fees and without the need to disclose source code. The only condition is to maintain attribution of the model's origin in the documentation.

What hardware do I need to run Qwen3.6 locally?

Minimum requirements are 64 GB RAM (e.g., a MacBook Pro M3 Max) or an NVIDIA RTX 4090 graphics card with 24 GB VRAM. On an RTX 4090 with the Unsloth GGUF build, the model generates over 120 tokens per second. For production deployment, multiple GPUs are recommended for parallel request processing.

How well does Qwen3.6 handle Czech?

The model declares support for over 100 languages, including Czech. It is strongest in technical English-language tasks — coding, analysis, and agentic workflows. For Czech texts and documents, it performs well, but top models like Claude or GPT-4o still lead in the linguistic nuances of Czech.