Meta released Llama 4: Record context 10 million tokens, multimodal capabilities — but Czech companies cannot use it

Meta has released its most ambitious open-weight models to date: Llama 4 Scout and Llama 4 Maverick. Scout leaps past everything on the market with a 10 million token context window, and both models process text and images natively. The catch? The license explicitly excludes companies and developers in the European Union, including the Czech Republic. Meta points to ongoing legal disputes over training data. For the Czech AI scene, this is a cold shower amid otherwise hot news.

Llama 4: Two Models, One Architecture, Record Numbers

On April 5, 2026, Meta released two new Llama 4 series models on its blog and on the Hugging Face platform: Scout and Maverick. Both are built on a Mixture of Experts (MoE) architecture: each model activates 17 billion parameters per token, while the total parameter count is far higher, because only a subset of the so-called experts fires on each forward pass.

Scout has 16 experts and a total of 109 billion parameters, while Maverick has 128 experts and approximately 400 billion parameters. Despite this, both models have similar computational demands during inference — a key advantage of MoE architectures, which is also gaining traction at Google (Gemini) and France's Mistral AI.
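The active-versus-total distinction above can be sketched in a few lines of Python: a learned router scores the experts for each token, and only the top-scoring expert(s) actually touch it. This is a toy illustration, not Meta's implementation; the dimensions are tiny and the names (`moe_forward`, `router_w`) are ours, but the 16-expert count mirrors Scout.

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=1):
    """Toy Mixture-of-Experts layer: route one token to its top-k experts.

    x:        (dim,) hidden state of a single token
    experts:  list of (dim, dim) weight matrices -- the "experts"
    router_w: (num_experts, dim) router that scores each expert
    Only the selected experts' weights process the token, which is why
    active parameters (17B in Llama 4) are far below the total (109B/400B).
    """
    scores = router_w @ x                       # one score per expert
    top = np.argsort(scores)[-top_k:]           # indices of the top-k experts
    weights = np.exp(scores[top])
    gates = weights / weights.sum()             # softmax gate over the winners
    y = sum(g * (experts[i] @ x) for g, i in zip(gates, top))
    return y, top

rng = np.random.default_rng(0)
dim, num_experts = 8, 16                        # Scout-like: 16 experts
experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
router_w = rng.standard_normal((num_experts, dim))
y, used = moe_forward(rng.standard_normal(dim), experts, router_w)
print(f"experts used: {used.tolist()} of {num_experts}")
```

Note that all 16 expert matrices must sit in memory even though only one does work per token, which is why total parameter count still determines the hardware needed.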

10 Million Tokens: A Context Window Without Competition

The biggest technical surprise is the Scout model's context window: 10 million tokens. For comparison, the competing Gemini 3.1 Flash Lite offers 1 million tokens, and Claude Opus 4.6 is in the same range. Scout leaps past the previous maximum by an order of magnitude.

In practice, this means entire books, large codebases, or hundreds of documents can fit into a single prompt without the model losing track. Meta combined several techniques to get there: interleaved layers without positional embeddings (NoPE) every fourth layer, chunked attention over blocks of roughly 8,000 tokens, and modified softmax scaling so the model does not spread its attention too thin on very long sequences.
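The chunked-attention idea can be illustrated with a toy mask: a token attends only to earlier tokens within its own fixed-size block, so per-token cost stays bounded by the chunk size rather than growing with the full sequence. This is our simplified sketch; in the real model these chunked layers are interleaved with global-attention layers, which the toy omits.

```python
import numpy as np

def chunked_attention_mask(seq_len, chunk):
    """Boolean causal mask restricted to fixed-size chunks.

    Token i may attend to token j only if j <= i (causal) and both
    indices fall inside the same `chunk`-sized block. Cost per token is
    then O(chunk) instead of O(seq_len), which is what makes contexts
    of millions of tokens affordable.
    """
    idx = np.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]
    same_chunk = (idx[:, None] // chunk) == (idx[None, :] // chunk)
    return causal & same_chunk

# Tiny demo: 8 tokens, chunks of 4. Token 5 sees tokens 4-5 (its own
# chunk) but never tokens 0-3 from the previous chunk.
m = chunked_attention_mask(8, chunk=4)
print(m.astype(int))
```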

Maverick offers "only" 1 million tokens — but even that is enough for working with extensive enterprise documents or the source code of an entire project.

Natively Multimodal: Text and Images from the Ground Up

Llama 4 Scout and Maverick are natively multimodal — they were not created by adding a visual module to a text model, but were trained on both text and images from the start. Both models accept text and image inputs; the output is currently text-based.

The training data comprises 40 trillion tokens across 200 languages. Czech is among them, though Meta does not specify what share of the material is Czech. Experience with previous Llama generations suggests that Czech is supported, but not at English-level quality.

Benchmarks: Where Llama 4 Stands

The results speak in favor of Maverick, which ranks in the top league of open-weight models based on benchmark results:

  • MMLU Pro: Maverick 80.5%, Scout 74.3%
  • GPQA Diamond (scientific reasoning): Maverick 69.8%
  • LMArena Elo score: Maverick 1,417, comparable to GPT-4o but below the latest GPT-5.4 or Claude Opus 4.6

Gemini 3.1 Pro leads GPQA Diamond with 94.3%, and Claude Opus 4.6 dominates SWE-bench Verified (software engineering) with 80.8%. Llama 4 is thus not the best model on the market, but as an open-weight model with no licensing fees, it offers exceptional price-performance.

Price and Availability: A Good Deal — For Those Who Are Allowed

The models are free to download from Hugging Face and llama.com. With 4- or 8-bit quantization, Scout can run on a single server GPU, which makes it an attractive option for technically capable companies that want a powerful in-house model without monthly fees.
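A back-of-the-envelope check of the single-GPU claim: weight memory is roughly parameters times bits per parameter. This counts weights only; the KV cache for long contexts and activation memory come on top, so treat the numbers as a lower bound.

```python
def weight_memory_gb(params_billion, bits):
    """Approximate memory for model weights alone, in GB.

    params_billion: total parameter count in billions (all experts must
    be resident, even though only 17B are active per token).
    bits: precision per parameter (16 = bf16, 8 or 4 = quantized).
    """
    return params_billion * 1e9 * bits / 8 / 1e9

# Scout: 109B total parameters
for bits in (16, 8, 4):
    print(f"Scout at {bits}-bit: ~{weight_memory_gb(109, bits):.1f} GB")
# At 4-bit, ~54.5 GB of weights fits on a single 80 GB data-center GPU;
# at 16-bit (~218 GB) it does not.
```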

Via third-party APIs, Maverick costs roughly 0.15 to 0.27 dollars per million tokens, about a third to a fifth of the price of GPT-5.4 Mini (0.75 USD per million tokens). Scout is even cheaper.

But here comes a crucial caveat.

EU Excluded: Czech Developers Are Not Allowed to Play

The Llama 4 Community License Agreement contains an explicit clause: "This license does not apply to entities domiciled or registered in a Member State of the European Union." This includes the Czech Republic, Slovakia, and the entire EU.

The prohibition applies to operators and developers — companies and individuals in the EU may not deploy, host, or build products based on the models. End-users who access Llama 4 via Meta AI or other third-party services outside the EU are not restricted by the license.

Why did Meta introduce this restriction? According to analysis by The Decoder and Computerworld, it's a combination of two factors: ongoing lawsuits concerning copyright for training data and concerns about the GDPR regulatory environment. Recent court documents revealed that Meta was aware of the presence of copyrighted content in its training datasets. The European market is stricter in this regard than the American one — and Meta preferred to play it safe.

The situation is not entirely new: a similar restriction already applied to some multimodal variants of Llama 3. Llama 4 extends it to the entire series.

What This Means for the Czech Scene

For Czech AI studios, developers, and companies who followed the release of Llama 4 with enthusiasm, this is a real obstacle. Deploying the model in one's own infrastructure — on a server in Prague, in a Czech cloud, or on a local computer — violates the license and exposes the operator to legal risk.

Alternatives exist: Gemma 4 from Google (Apache 2.0, no geographic restrictions) or Mistral Large 3 (available via Mistral's La Plateforme with EU data residency) offer performance comparable to Scout without the legal complications. For more demanding tasks, Maverick sits at roughly GPT-4o level, and that level of performance is attainable by other routes as well.

Meta has said it intends to open-source future versions of the Llama 4 models, though without a specific timeline. Whether those versions will also be available in the EU is not yet clear.

Llama 4 Behemoth: A Large Model on the Horizon

Scout and Maverick are not the end of the story. Meta has announced a third model in the series — Llama 4 Behemoth — with no specific release date yet. This is a much larger model, whose parameters Meta has not disclosed, but which should serve as a "teacher" for distilling future smaller models. Maverick, after all, has already been distilled from Behemoth.

As a Czech developer, can I try Llama 4 Scout locally without violating the license?

Formally no — the Llama 4 Community License Agreement prohibits entities domiciled in the EU from operating or hosting the models, regardless of where the server physically runs. The only exceptions are end-users accessing the model via an approved third-party service. For local experiments, we recommend Gemma 4 (Apache 2.0 license) or Mistral models.

What is the difference between Llama 4 Scout and Maverick — who is each for?

Scout is optimized for easy deployability (a single GPU) and an extremely long context (10 million tokens) — it is suitable for working with extensive documents, legal texts, or codebases. Maverick offers higher benchmark performance due to more experts and is suitable for more demanding reasoning tasks, but requires more computational power.

Why didn't Meta create a special license that meets GDPR requirements to include the EU?

The problem is not just GDPR, but especially ongoing lawsuits in both the US and Europe regarding copyright for training data. Adapting the license for the EU would expose Meta to further legal risks in a jurisdiction where penalties for copyright infringement are stricter. It is therefore more of a strategic legal decision than a technical inability.