
DeepSeek V4 Is Here: Open-Source 1.6-Trillion-Parameter Model Costs a Tenth of Claude's Price

On April 24, 2026, Chinese AI lab DeepSeek unveiled its strongest model to date. DeepSeek V4 arrives in two variants, the flagship Pro and the affordable Flash, with a million-token context window, an open-source MIT license, and API prices that undercut the competition. How does it differ from GPT-5.4 or Claude Opus, and why should Czech developers take notice?

Two models, one goal: performance at a fraction of the price

DeepSeek V4 does not arrive as a single model but as a family. V4-Pro is the flagship with 1.6 trillion total parameters, of which 49 billion are activated per token, making it the largest open-source model available today. V4-Flash is its more economical sibling: 284 billion parameters total, 13 billion active. Both share a 1-million-token context window and are built on the Mixture of Experts (MoE) architecture, which activates only a fraction of the weights for each token.
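The MoE idea behind both variants can be sketched in a few lines: a router scores all experts, but only the top-k highest-scoring ones are ever evaluated for a given token. The expert functions and scores below are toy illustrations, not DeepSeek's actual router.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_scores, k=2):
    """Route a token through only the top-k experts (toy sketch)."""
    # Indices of the k highest gate scores.
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    # Weighted sum of the selected experts' outputs; the rest never run.
    return sum(w * experts[i](token) for w, i in zip(weights, top))

# Toy experts: scalar functions standing in for feed-forward blocks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
scores = [0.1, 3.0, 2.0, -1.0]  # router logits for this token

out = moe_forward(5.0, experts, scores, k=2)
# Only experts 1 and 2 are evaluated; the other two cost nothing.
```

This is why V4-Pro can hold 1.6 trillion parameters while spending compute on only 49 billion per token: most experts sit idle for any given input.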

For Czech readers, the key point is that both models are released under the MIT license. This means free commercial and non-commercial use, plus the right to self-host and modify the models. For companies in the EU dealing with GDPR compliance and data sovereignty, this is an important advantage over closed models from OpenAI or Anthropic.

An architecture that saves memory and money

The main innovation of V4 is not just size, but efficiency. DeepSeek replaced the standard attention mechanism with a so-called Hybrid Attention Architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). In practice, this means that with a million-token context, V4-Pro needs only 27% of compute operations and 10% of KV cache memory compared to the previous V3.2.
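The scale of that saving is easy to sanity-check with back-of-envelope KV-cache arithmetic. The model dimensions below are illustrative assumptions, not V4's published configuration; the point is how fast an uncompressed cache grows at a million tokens, and what a 10% footprint means in practice.

```python
def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Uncompressed KV-cache size in GB: keys + values for every layer
    and every cached token. All dimensions are illustrative placeholders,
    not DeepSeek V4's published configuration."""
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len  # 2 = K and V
    return elems * bytes_per_elem / 1e9

# Hypothetical config at the full 1M-token context.
full = kv_cache_gb(seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
compressed = 0.10 * full  # the claimed 10% KV-cache footprint
print(f"uncompressed: {full:.0f} GB, compressed: {compressed:.0f} GB")
```

Under these assumptions the cache shrinks from hundreds of gigabytes to a few tens, which is the difference between needing a multi-GPU node just for the cache and fitting it alongside the weights.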

This is not a marginal improvement. A million tokens corresponds to roughly two novels — or an entire source code repository. Until now, such long contexts were economically demanding to deploy. Thanks to compression and selective attention, DeepSeek makes them available as a standard feature.

Beyond that, the team used the Muon optimizer instead of the common AdamW and introduced so-called Manifold-Constrained Hyper-Connections (mHC) for training stability at trillion-parameter scale. The model was pre-trained on 33 trillion tokens.

Benchmarks: best in code, slight gap in knowledge

DeepSeek published a detailed comparison with the best closed models. In programming, V4-Pro achieves top results:

  • LiveCodeBench: 93.5% — the highest score of all models (Claude Opus 4.6 has 88.8%, Gemini 3.1 Pro 91.7%)
  • Codeforces rating: 3206 — above GPT-5.4 (3168) and Gemini (3052)
  • SWE-bench Verified: 80.6% — nearly identical to Claude Opus 4.6 (80.8%)

In agentic tasks, where the model works independently in a terminal, V4-Pro scored 67.9% on Terminal-Bench 2.0 and surpassed Claude (65.4%). This is important for developers of autonomous agents.

Where it does lose ground is in general knowledge and expert multi-domain reasoning. On Humanity's Last Exam (HLE) it reached 37.7%, while Claude scored 40.0% and Gemini 3.1 Pro 44.4%. On SimpleQA-Verified, which checks the accuracy of factual answers, Gemini holds a significant lead. For purely programming use, however, this is largely irrelevant.

A price list that changes the rules of the game

The biggest impact may come from the price. DeepSeek repeats the recipe from last year's R1: top performance at a fraction of the competition's cost.

Model                 Input / 1M tokens    Output / 1M tokens
DeepSeek V4-Flash     0.14 USD             0.28 USD
DeepSeek V4-Pro       1.74 USD             3.48 USD
Claude Opus 4.6       15 USD               75 USD
GPT-5.4               ~15 USD              ~60 USD

V4-Pro is roughly 21× cheaper on output tokens than Claude Opus 4.6 at nearly identical performance in programming. V4-Flash costs a fraction of a cent per thousand output tokens. Converted at a rate of 23 CZK/USD, one million output tokens comes to roughly 80 CZK for V4-Pro and just under 6.50 CZK for V4-Flash.

DeepSeek also offers a discounted cache-hit price for repeated system prompts: for V4-Pro, just 0.145 USD per million input tokens. For agent pipelines with recurring tasks, this further reduces costs.
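A minimal cost calculator using the prices quoted in this article makes the gap concrete. The model keys and the 23 CZK/USD rate come from the article itself, not from any official SDK:

```python
CZK_PER_USD = 23.0  # the article's conversion rate

# USD per 1M tokens, from the pricing table above.
PRICES = {
    "v4-flash":        {"input": 0.14, "output": 0.28},
    "v4-pro":          {"input": 1.74, "output": 3.48, "input_cache_hit": 0.145},
    "claude-opus-4.6": {"input": 15.0, "output": 75.0},
}

def cost_usd(model, input_tokens, output_tokens, cache_hit_tokens=0):
    """Total request cost in USD; cached input tokens are billed at the
    cache-hit rate when the model offers one."""
    p = PRICES[model]
    fresh = input_tokens - cache_hit_tokens
    return (fresh * p["input"]
            + cache_hit_tokens * p.get("input_cache_hit", p["input"])
            + output_tokens * p["output"]) / 1e6

# 1M output tokens on V4-Pro: 3.48 USD, about 80 CZK as stated above.
pro_czk = cost_usd("v4-pro", 0, 1_000_000) * CZK_PER_USD

# Output-price ratio vs. Claude Opus 4.6 (the "21x cheaper" claim).
ratio = PRICES["claude-opus-4.6"]["output"] / PRICES["v4-pro"]["output"]
```

With 900k of 1M input tokens served from cache, V4-Pro input costs drop from 1.74 USD to about 0.30 USD, which is where agent pipelines with long recurring system prompts gain the most.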

Availability and what it means for Czech users

The models are available through three paths: the web interface chat.deepseek.com (Pro as Expert Mode, Flash as Instant Mode), API compatible with both the OpenAI and Anthropic formats, and as open weights on Hugging Face.
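Because the API follows the OpenAI format, a call looks like any other chat-completions request. The endpoint URL and model identifier below are assumptions for illustration only; check DeepSeek's documentation for the real values. This sketch builds the request but does not send it:

```python
import json
import urllib.request

API_BASE = "https://api.deepseek.com"  # assumed endpoint, verify in the docs
MODEL = "deepseek-v4-pro"              # hypothetical model identifier

def build_chat_request(api_key, messages, model=MODEL):
    """Assemble an OpenAI-compatible chat-completions request (not sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("sk-...", [{"role": "user", "content": "Ahoj!"}])
# To actually call the API: urllib.request.urlopen(req) with a valid key.
```

The same OpenAI-format payload also works through the official OpenAI client libraries by pointing their base URL at the DeepSeek endpoint, which is what "OpenAI-compatible" means in practice.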

For Czech developers and companies, the most interesting option is self-deployment thanks to the MIT license. V4-Flash is 160 GB, which after quantization is accessible even for more advanced local hardware. V4-Pro at 865 GB, on the other hand, is intended for cloud clusters.
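Whether V4-Flash fits on local hardware comes down to simple arithmetic: parameters times bits per weight. The estimate below ignores quantization overhead, activations, and the KV cache, so treat it as a lower bound:

```python
def quantized_gb(n_params, bits_per_weight):
    """Rough weight-memory estimate in GB: parameters x bits / 8.
    Ignores quantization scales, activations, and the KV cache."""
    return n_params * bits_per_weight / 8 / 1e9

flash_params = 284e9   # V4-Flash total parameters
pro_params = 1.6e12    # V4-Pro total parameters

for bits in (16, 8, 4):
    print(f"{bits}-bit Flash: ~{quantized_gb(flash_params, bits):,.0f} GB")
```

At 4 bits per weight, Flash lands around 142 GB, which explains why the 160 GB release is borderline for high-end workstations after quantization, while V4-Pro stays in cluster territory even at 4-bit (around 800 GB of weights alone).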

One risk must be mentioned, however: DeepSeek's API servers run in China. For companies processing personal data or other sensitive information, this may conflict with internal security policies or regulations. Self-hosting within the EU is therefore the only reasonable path for sensitive applications.

Interestingly, according to Reuters, DeepSeek V4 was trained on Huawei Ascend 950PR chips rather than the latest NVIDIA hardware. This confirms that top models can be developed outside the American chip maker's ecosystem.

What V4 still cannot do

The model is still in preview. DeepSeek admits that on the hardest tasks it remains behind closed models by roughly 3–6 months. It also lacks multimodal input — it cannot process images, audio, or video. For projects where analysis of screenshots or documents with charts is required, one must look elsewhere.

The older deepseek-chat and deepseek-reasoner models will be shut down on July 24, 2026, so existing users will have to migrate to V4.

Is DeepSeek V4 available in Czech?

The official web interface is in English and Chinese. The API, however, supports the Czech language as input and output at a level similar to other top models. For Czech companies, the biggest advantage is the ability to self-host weights under the MIT license within the EU.

What is the difference between V4-Pro and V4-Flash?

V4-Pro has 1.6 trillion parameters (49 billion active) and costs 3.48 USD per million output tokens. V4-Flash has 284 billion parameters (13 billion active) and costs 0.28 USD per million output tokens. Flash is ideal for high-frequency tasks, Pro for the most complex programming and agent workflows.

Can I run DeepSeek V4 locally in Europe?

Yes, both models are under the MIT license and their weights are freely available for download. V4-Flash (160 GB) is more accessible for local deployment after quantization. V4-Pro requires powerful server clusters. Local operation in the EU addresses concerns about data transfer to China.
