MiniMax M3: Chinese open-weight model beats GPT-5.5 and Gemini 3.1 Pro at 5–10% of the cost

June 1, 2026 Daniel Cesak

Chinese AI labs continue to challenge the dominance of Western giants. Startup MiniMax today released the M3 model, the first open-weight system that combines a million-token context, native multimodality, and autonomous agent capabilities in a single package. On several key benchmarks it surpasses GPT-5.5 and Gemini 3.1 Pro, at 5–10% of their cost.

What is MiniMax M3 and why you should care

Chinese startup MiniMax, which until now stood somewhat in the shadow of DeepSeek or Alibaba, today officially launched its most ambitious language model yet. It's called MiniMax M3 and it's the world's first open-weight model that simultaneously offers three capabilities previously reserved for closed commercial systems: a million-token context window, native multimodality (text, images, and video), and performance at the level of top-tier models in coding and agent tasks.

Unlike GPT-5.5 from OpenAI or Claude Opus from Anthropic, which operate exclusively as closed APIs, MiniMax promises to release the complete model weights on HuggingFace and GitHub within 10 days. For companies that need to run AI on their own infrastructure — whether for data security reasons or to maintain independence from an external provider — this is a significant development.

Benchmarks: Where M3 beats the giants (and where it doesn't)

Numbers from both official tests and independent measurements speak clearly. On SWE-Bench Pro, which measures a model's ability to independently solve real-world software tasks, M3 achieved a score of 59.0%. That's higher than GPT-5.5 (54.8% according to official measurements) and Gemini 3.1 Pro (47.1%). However, it still trails the current leader — Anthropic's Claude Opus 4.8, released last week, reaches 69.2%.

On Terminal-Bench 2.1, which evaluates model performance in the command line, M3 with a score of 66.0% is practically on par with the previous generation Opus 4.7 (66.1%), but again lags behind Opus 4.8 (74.6%). On BrowseComp — a test of autonomous web search and information processing — M3 scored 83.5%, surpassing Claude Opus 4.7 (79.3%).

An interesting comparison is with DeepSeek-V4 Pro Max, another Chinese open-weight model. M3 narrowly beats it on SWE-Bench Pro (59.0% vs 55.4%), slightly trails on Terminal-Bench (66.0% vs 67.9%), and the two effectively tie on BrowseComp (83.5% vs 83.4%).

In multimodal tests, M3 excels: on OmniDocBench it surpasses Gemini 3.1 Pro, and on SVG-Bench (vector graphics generation) it even beats Claude Opus 4.7. In Claw-Eval, a complex autonomous agent test, it achieved the highest score among all tested models.

MSA: How MiniMax tamed quadratic complexity

The key to M3's efficiency is a new attention architecture called MSA (MiniMax Sparse Attention). The classic Transformer mechanism suffers from computation that grows quadratically with input length — double the context and the computation quadruples. This is why models with long context are so expensive.
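To make the scaling concrete, here is a minimal sketch that counts the query-key scores full attention has to compute for a given sequence length. It ignores heads, layers, and hidden size, which only add constant factors, and the function name is purely illustrative.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Number of query-key scores that full (non-causal) attention computes."""
    return seq_len * seq_len


for n in (250_000, 500_000, 1_000_000):
    print(f"{n:>9,} tokens -> {full_attention_pairs(n):.2e} score pairs")

# Doubling the context quadruples the work.
ratio = full_attention_pairs(1_000_000) / full_attention_pairs(500_000)
print(ratio)  # 4.0
```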

MSA solves this problem elegantly. Instead of the model "rereading the entire library" at each step, as classic full attention does, MSA works like an intelligent indexing system. It divides the context into blocks ahead of time and, for each query, looks only at the relevant ones. Each block is read exactly once and memory access is contiguous, which means much better hardware utilization.
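The article gives no implementation details for MSA beyond the block idea, so the following is only a generic block-sparse attention sketch and not MiniMax's actual method. It assumes mean-pooled key vectors as block summaries and a simple top-k block selection; every name in it (block_summaries, sparse_attention, top_k) is hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_summaries(keys, block_size):
    """Mean key vector per contiguous block, built once before any query."""
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        width = len(block[0])
        summaries.append([sum(k[d] for k in block) / len(block) for d in range(width)])
    return summaries


def sparse_attention(query, keys, values, summaries, block_size, top_k):
    """Attend only to the top_k blocks whose summary best matches the query."""
    ranked = sorted(range(len(summaries)),
                    key=lambda b: dot(query, summaries[b]), reverse=True)
    chosen = sorted(ranked[:top_k])  # memory order, so each block is one contiguous read

    # Token positions from the selected blocks only.
    positions = [i for b in chosen
                 for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in positions])
    width = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, positions))
              for d in range(width)]
    return output, chosen


random.seed(0)
n_tokens, dim, block_size = 64, 8, 8
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
query = [random.gauss(0, 1) for _ in range(dim)]

summaries = block_summaries(keys, block_size)
out, chosen = sparse_attention(query, keys, values, summaries, block_size, top_k=2)
print("blocks read:", chosen, "of", len(summaries))
print("tokens scored:", len(chosen) * block_size, "instead of", n_tokens)
```

With top_k equal to the number of blocks, the sketch reduces to ordinary full attention, which is a convenient sanity check. The saving comes entirely from scoring a fixed number of blocks per query instead of the whole context.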

The result? At a context length of 1 million tokens, the per-token computational cost is just 1/20 of that of the previous-generation MiniMax model. This translates into a 9× speedup in the prefill phase and a 15× speedup in decoding. In internal tests, MSA ran more than 4× faster than alternative open-source solutions like Flash-Sparse-Attention.

Price: 5–10% of what OpenAI and Anthropic charge

But perhaps the most compelling argument is the pricing. During the introductory promotion (first week after launch), M3 costs $0.30 per million input tokens and $1.20 per million output tokens via API. After the promotion ends, the price will rise to $0.60/$2.40 per million input/output tokens, which is still far below the competition. For comparison (input/output price per million tokens):

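GPT-5.5 (OpenAI): $5.00/$30.00, i.e., about 12–25× more expensive on output tokens (against M3's regular and promotional price, respectively)
Claude Opus 4.8 (Anthropic): $5.00/$25.00, about 10–21× more expensive on output tokens
Gemini 3.1 Pro (Google): $2.00/$12.00, about 5–10× more expensive on output tokens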
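The multiples depend on whether you compare input or output prices and whether you use M3's promotional or regular rate. A small sketch, using only the prices quoted above (the variable and function names are illustrative), makes all four combinations explicit:

```python
# Prices in USD per million tokens as (input, output), taken from the article.
M3_PROMO = (0.30, 1.20)
M3_REGULAR = (0.60, 2.40)

COMPETITORS = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def price_ratios(competitor, m3):
    """Return (input_ratio, output_ratio): how many times pricier the competitor is."""
    return competitor[0] / m3[0], competitor[1] / m3[1]


for name, price in COMPETITORS.items():
    promo_in, promo_out = price_ratios(price, M3_PROMO)
    reg_in, reg_out = price_ratios(price, M3_REGULAR)
    print(f"{name}: promo {promo_in:.1f}x in / {promo_out:.1f}x out, "
          f"regular {reg_in:.1f}x in / {reg_out:.1f}x out")
```

On input tokens the gap is smaller than on output tokens for every competitor, so the real saving depends on how input-heavy or output-heavy a workload is.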

MiniMax is also launching a Token Plan subscription in three tiers: Plus at $20/month (approximately 460 CZK), Max at $50/month (1,150 CZK), and Ultra at $120/month (2,760 CZK). Converted to tokens, this is one of the highest quotas on the market: Plus offers roughly 1.7 billion tokens per month. All prices exclude VAT and are billed in US dollars; a standard payment gateway is available for Czech developers.

MiniMax Code: An agent that programs on its own

Along with the model, MiniMax is also launching its own coding environment MiniMax Code — a desktop and web application that turns M3 into an autonomous programming assistant. Its main weapon is Agent Team: a system that breaks large tasks into parallel workflows and deploys multiple agents simultaneously.

Particularly interesting is the Producer + Verifier mechanism. One agent generates code, the other tests it in real time and returns feedback. Thanks to this, the system can run autonomously for several days without human intervention, continuously fixing its own errors. MiniMax Code also supports computer use thanks to native multimodality — you can, for example, tell it via mobile: "Open the local ERP and upload invoices from this Excel spreadsheet," and the agent will execute it.
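How MiniMax Code implements this internally is not public, but the general producer/verifier pattern can be sketched in a few lines. In the sketch below, produce_and_verify, toy_producer, and toy_verifier are hypothetical names; a real system would call a model in the producer and run a real test suite in the verifier.

```python
from typing import Callable, Optional, Tuple

Producer = Callable[[str, Optional[str]], str]
Verifier = Callable[[str], Tuple[bool, str]]


def produce_and_verify(task: str, producer: Producer, verifier: Verifier,
                       max_rounds: int = 5) -> Tuple[Optional[str], int]:
    """Alternate generation and checking until the verifier accepts or rounds run out."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = producer(task, feedback)
        ok, feedback = verifier(candidate)
        if ok:
            return candidate, round_no
    return None, max_rounds


# Toy stand-ins so the loop can run without any model.
def toy_producer(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft
    return "def add(a, b):\n    return a + b\n"      # corrected after feedback


def toy_verifier(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(code, namespace)  # fine for this toy; never exec untrusted code unsandboxed
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "all checks passed"
    return False, f"add(2, 3) returned {result}, expected 5"


code, rounds = produce_and_verify("write add(a, b)", toy_producer, toy_verifier)
print(f"accepted after {rounds} round(s)")  # accepted after 2 round(s)
```

The max_rounds cap is a design choice of this sketch: without some budget, a loop like this can run indefinitely on a task the producer cannot solve.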

For developers who prefer their own tools, M3 is compatible with Claude Code, Cursor, Roo Code, and Cline via an API key (prefix sk-cp). It also supports a switchable "thinking mode" for complex tasks, or a fast mode for routine code completion.

12 hours of autonomous science and other feats

One of the most impressive tests MiniMax published was the independent reproduction of a scientific paper. M3 was tasked with reproducing the paper Learning Dynamics of LLM Finetuning, which received the Outstanding Paper award at the ICLR 2025 conference. Over nearly 12 hours of autonomous work, the model made 18 commits, generated 23 experimental graphs, and successfully replicated the key results — including the so-called "squeezing effect" in DPO experiments.

In an even more demanding test, M3 spent 24 hours optimizing a CUDA kernel for FP8 matrix multiplication on NVIDIA Hopper GPU architecture. After 147 attempts and nearly 2,000 tool calls, it increased hardware efficiency from 7.6% to 71.3% — a 9.4× speedup. What's remarkable: most other models (except Opus 4.7) gave up after the first 30 attempts. M3 persistently sought new paths even after hitting performance ceilings.

What it means for the Czech Republic and Europe

For Czech companies and developers, the key advantage is primarily the ability to run the model on their own infrastructure. Many European companies — from banks to healthcare facilities — are subject to strict data handling regulations. An open-weight model means that sensitive data never has to leave corporate servers.

The model is available via public API. MiniMax does not explicitly declare Czech support, but models trained on corpora of over 100 billion tokens typically cover dozens of languages, including Slavic ones, so Czech is likely to work. MiniMax Code is available for download on Windows, macOS, and Linux.

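From the perspective of the EU AI Act, open models can benefit from lighter obligations: providers of general-purpose models released under a free and open-source license are exempt from some documentation requirements, unless the model is classified as posing systemic risk. Whether M3 qualifies will depend on its final license. For companies deploying the model, the obligations depend mainly on the use case and its risk class rather than on whether the weights are open, so due diligence is still required.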

Open weights: Why it matters

The decision to release the model weights is a strategic move that sets MiniMax apart from OpenAI, Anthropic, and Google. For enterprise customers, this means three critical advantages:

Data sovereignty — the model runs locally, no data leaves the corporate network
Full customizability — companies can fine-tune the model on their own data and modify its architecture
Cost certainty: variable per-token API fees disappear, and operating costs are driven by hardware, electricity, and maintenance instead

The question remains under which specific license the weights will be released — whether it will be a permissive MIT/Apache 2.0 license or a more restrictive model. This will have a fundamental impact on commercial use possibilities. MiniMax currently promises to release the technical documentation and weights "within the next 10 days."

MiniMax M3 in the context of the AI race

The release of M3 comes at a time when Chinese AI labs are systematically lowering the price barrier to enter the world of top-tier language models. DeepSeek set the trend with the V4 model, Xiaomi surprised with aggressive MiMo pricing, Alibaba counters with Qwen. MiniMax M3 adds a unique combination to this mosaic: frontier performance with open weights at a fraction of the cost.

For developers, this means one thing: the era when you had to pay hefty monthly bills to access cutting-edge AI is slowly ending. M3 isn't the absolute benchmark winner, since Anthropic's Claude Opus 4.8 remains a step ahead in pure coding and agent tasks, but the price/performance ratio is so compelling that for most enterprise deployments it will be the deciding factor.

Initial reactions from the developer community are unequivocally positive. The creators of the Cline tool confirmed compatibility on day one and particularly praised the MSA architecture, which "slashes computational costs to 1/20 of the previous generation." Independent testers on X especially appreciate the model's ability to work autonomously for many hours without performance degradation.

Is MiniMax M3 available for free?

Yes and no. The API is paid — starting at $0.30 per million input tokens during the introductory promotion. The MiniMax Code subscription starts at $20/month. However, within 10 days MiniMax plans to release the open model weights, meaning you'll be able to download and run it on your own hardware for free — you only pay for electricity and compute power.

Does MiniMax M3 support Czech?

MiniMax does not officially declare explicit Czech language support, but M3 was trained on a corpus of over 100 billion tokens covering dozens of languages. Practical experience with similarly trained models shows that Czech should be functional — though not at the level of specialized multilingual models. The precise language capabilities in Czech will only become clear after the weights are released and tested by the community.

What hardware do I need to run M3 locally?

MiniMax has not yet published precise hardware requirements. Given that this is a model with a million-token context window, a powerful GPU server will be needed even with the efficient sparse-attention architecture; a rough estimate is at least 4× NVIDIA A100 (80 GB) or equivalent for full performance. A single high-end GPU might suffice for basic inference, but exact requirements will only be known once the technical documentation is published.
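Until the parameter count is published, the best anyone can do is back-of-the-envelope arithmetic. The sketch below shows how large a model's weights could be and still fit into the 4× 80 GB estimate at different weight precisions. The 20% memory reserve for KV cache, activations, and runtime overhead is an arbitrary assumption, and none of the figures describe M3 itself.

```python
GPU_MEMORY_GB = 80  # per-GPU memory from the 4x A100 (80 GB) estimate
NUM_GPUS = 4

# Bytes needed to store one parameter at common weight precisions.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def max_params_billions(total_gb: float, bytes_per_param: float,
                        reserve_fraction: float = 0.2) -> float:
    """Largest parameter count (in billions) whose weights fit in memory.

    reserve_fraction is a placeholder share of memory held back for the
    KV cache, activations, and runtime overhead; it is an assumption.
    """
    usable_bytes = total_gb * 1e9 * (1.0 - reserve_fraction)
    return usable_bytes / bytes_per_param / 1e9


total_gb = GPU_MEMORY_GB * NUM_GPUS
for precision, size in BYTES_PER_PARAM.items():
    limit = max_params_billions(total_gb, size)
    print(f"{precision} weights: up to ~{limit:.0f}B parameters in {total_gb} GB")
```

A million-token context makes the KV cache a major consumer of memory in its own right, so the real reserve may need to be considerably larger than the placeholder used here.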