Claude Opus 4.8 is here: Anthropic released a model that beats GPT-5.5 and for the first time can say "I'm not sure about that"

May 29, 2026 Daniel Cesak

Anthropic AI data center TPU compute infrastructure

Anthropic has released Claude Opus 4.8 — a new flagship model that the company itself describes as a "modest but tangible improvement." It beats OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro in most benchmarks. But the most significant change isn't just in the numbers — the new Opus has learned to admit its own uncertainty. And in the age of agentic AI, that's more important than it might seem at first glance.

Listen to this article:

Benchmarks: Opus 4.8 at the top

Anthropic published a complete system card with dozens of comparisons. In agentic coding on the SWE-Bench Pro test, Opus 4.8 achieved 69.2% — a jump from 64.3% for the previous Opus 4.7 and significantly ahead of 58.6% for GPT-5.5.

In multidisciplinary reasoning on the Humanity's Last Exam test, which contains expert questions across disciplines, Opus 4.8 scored 49.8% without tools and 57.9% with tools. Both results are currently the highest of all available models.

On the practical GDPval-AA benchmark, which tests real-world knowledge tasks, Opus 4.8 achieved 1 890 points at maximum effort level — 137 points more than Opus 4.7 and 121 points ahead of GPT-5.5. In head-to-head matchups with GPT-5.5, it wins roughly 67% of cases.

Honesty as a new metric

One of the most discussed features of the new model is improved honesty. Large language models tend to hallucinate and confidently assert things they've actually made up. According to Anthropic, Opus 4.8 is roughly four times less likely to let a code error slip through without comment.

"Early testers report that Opus 4.8 more often flags uncertainties in its work and less often asserts unsubstantiated conclusions," Anthropic writes in the official announcement. This is confirmed by independent testers — for example, investment analyst Michael Ran from Blackstone stated that "the biggest difference was Opus 4.8's tendency to proactively flag issues with analysis inputs and outputs that other models routinely overlooked."

For professional deployment — whether in law, finance, or healthcare — this is a fundamental shift. A model that admits when it's unsure is safer than a model that invents things.

Dynamic workflows: Hundreds of agents in a single session

Alongside the model itself, Anthropic introduced a dynamic workflows feature, which may be more important for developers than the model upgrade itself. Claude can now plan a task and then launch hundreds of parallel sub-agents in a single session.

In Claude Code, this means Opus 4.8 can handle full-codebase migrations across hundreds of thousands of lines — from initial planning through to merge. "Claude Code with Opus 4.8 now performs end-to-end codebase migration, using the existing test suite as the quality bar," Anthropic explains. The feature is available in Claude Code on Enterprise, Team, and Max plans.

Effort control: You decide how hard Claude thinks

A new feature on claude.ai and in the Cowork desktop app is the effort switcher next to the model selector. Users can choose from four levels:

Low — quick responses, lower token consumption
High (default) — best quality-to-speed ratio
Extra (labeled xhigh in Claude Code) — for difficult tasks
Max — maximum reasoning depth, highest token consumption

Anthropic simultaneously increased limits in Claude Code so that higher effort levels don't block work. In practice, this means you can "let" the model think longer on a complex research or development task, while saving time and limits on quick queries.

Prices stay the same — real costs are dropping

API prices remain unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. More interesting, however, is efficiency — according to Artificial Analysis, Opus 4.8 needs 15% fewer passes per task and 35% fewer output tokens than Opus 4.7 on the GDPval-AA benchmark.

Fast Mode, which runs at 2.5x speed, now costs $10 per million input tokens and $50 per million output tokens — a third of what it cost with previous models.

Availability in the Czech Republic and for Czech users

Claude Opus 4.8 has been available everywhere since May 28, 2026 — via claude.ai, Claude Code, Cowork, and through the API with the identifier claude-opus-4-8. Claude supports Czech — you can communicate with it in Czech and the model will respond in Czech, albeit with occasional minor inaccuracies typical for non-generative languages. For Czech developers and companies, this is a full-fledged alternative to GPT-5.5, especially for tasks requiring diligence and transparency.

For European companies, it's important that Anthropic offers regional compliance and models run on European infrastructure as well (for example, via Google Cloud Vertex AI in EU regions).

What's next: Mythos on the horizon

Anthropic also confirmed that Claude Mythos class models — significantly more powerful than Opus — should reach customers "in the coming weeks." Currently, Mythos Preview is available only to a limited number of organizations under Project Glasswing for cybersecurity purposes. Anthropic is working on safety measures that will enable broader distribution.

Is Claude Opus 4.8 better than GPT-5.5?

In most benchmarks, yes — Opus 4.8 beats GPT-5.5 in agentic coding (SWE-Bench Pro: 69.2% vs 58.6%), in multidisciplinary reasoning (Humanity's Last Exam), and in practical knowledge tasks (GDPval-AA). In head-to-head matchups, it wins roughly 67% of cases. GPT-5.5 remains stronger in some specific domains, particularly where OpenAI has invested in specialized training.

How much does using Claude Opus 4.8 via API cost?

Standard pricing is $5 per million input tokens and $25 per million output tokens — the same as Opus 4.7. Fast Mode costs $10/$50 per million tokens. In practice, however, Opus 4.8 may be cheaper than its predecessor, as analyses show it requires 15% fewer passes and 35% fewer output tokens for the same tasks.

Does Claude Opus 4.8 support Czech?

Yes, Claude understands Czech and can respond in Czech. The model is trained on multilingual data including Czech and Slovak. For professional use in Czech, however, it's advisable to verify the factual accuracy of outputs, particularly for specialized topics where the model may draw primarily from English-language sources.