Claude Mythos found 10,000 vulnerabilities in a month. Opus 4.8 now beats GPT-5.5 even in honesty

In a single month, Anthropic changed the rules of the game in AI. Its Claude Mythos Preview model uncovered over 10,000 high- or critical-severity vulnerabilities in the world's most widely used software, from Linux to Firefox to libraries powering billions of devices. Shortly after, the company announced Claude Opus 4.8, which beats OpenAI's GPT-5.5 in benchmarks and is markedly more willing to admit its own uncertainty. And as if that weren't enough, Anthropic became the world's most valuable AI startup with a valuation of 965 billion dollars. What does this mean for cybersecurity, and why should Czech companies care?

Mythos as a cyber detective: 10,000 bugs in a month

When Anthropic introduced the Claude Mythos Preview model in April 2026, the expert community took notice. The model was not specifically trained for cybersecurity; its capabilities emerged spontaneously as a byproduct of advanced code understanding, logical reasoning, and autonomous tool execution.

The result? In the first month of Project Glasswing, in which roughly 50 partner organizations participated, Mythos identified over 10,000 vulnerabilities of high or critical severity. Cloudflare alone used the model to find 2,000 bugs in its own systems, 400 of them critical. Mozilla used it to uncover and fix 271 vulnerabilities in Firefox 150, ten times more than with the previous version, Claude Opus 4.6. In a scan of more than 1,000 open-source projects, the model found an estimated 6,202 critical vulnerabilities, and independent security firms confirmed 90.6% of them as valid.

One example shows the model's power in practice: Mythos identified a vulnerability, CVE-2026-5194, in the wolfSSL library, which billions of devices worldwide rely on. The model itself assembled a working exploit that would let an attacker create forged certificates, and thus fake banking websites indistinguishable from real ones. Fortunately, the bug was fixed before anyone could exploit it.

The open-source community cannot keep up with patching

Mythos's success, however, has a downside. Vulnerability discovery is no longer the bottleneck in cybersecurity; patching is. Several open-source project maintainers have asked Anthropic to slow the pace of disclosure. Fixing a single critical bug takes two weeks on average, and the gap between a confirmed finding and a deployed fix is widening every week. Daniel Stenberg, founder and lead developer of cURL, told The Register that even improved AI reports represent an enormous burden: "This risk adds more work for countless open-source maintainers who are already struggling to keep up." Anthropic responded by partnering with the Open Source Security Foundation and contributing 4 million dollars to help triage and process reports.
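Why the gap keeps widening can be shown with a toy model: whenever confirmed findings arrive faster than maintainers can ship fixes, the pile of unpatched bugs grows week after week. The sketch below is only an illustration. The function name `backlog_over_time` and the weekly rates are hypothetical and are not figures from Anthropic or the maintainers.

```python
def backlog_over_time(new_per_week: int, fixed_per_week: int, weeks: int) -> list[int]:
    """Return the number of unpatched findings at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New findings are added, shipped fixes are subtracted; never below zero.
        backlog = max(0, backlog + new_per_week - fixed_per_week)
        history.append(backlog)
    return history

# Hypothetical rates, for illustration only (not from the article):
# 30 confirmed findings arrive per week, maintainers ship 10 fixes per week.
print(backlog_over_time(new_per_week=30, fixed_per_week=10, weeks=4))
# prints [20, 40, 60, 80]
```

In this simplified picture, slowing disclosure lowers `new_per_week`, while funding for triage aims to raise `fixed_per_week`.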

Claude Opus 4.8: The model that beats GPT-5.5 and knows how to say "I don't know"

While Mythos remains in limited access, Anthropic released Claude Opus 4.8 on May 28, 2026 — a new flagship that outperforms not only its predecessor in benchmarks but also GPT-5.5 from OpenAI and Gemini 3.1 Pro from Google. The company itself describes the model as "a modest but tangible improvement", but the numbers speak clearly. In agentic coding (SWE-Bench Pro) it scores 69.2% — compared to 64.3% for Opus 4.7 and 58.6% for GPT-5.5. In the Humanity's Last Exam test, which measures multi-domain reasoning, it scored 49.8% without tools and 57.9% with tools — the highest score in the field. On the GDPval-AA benchmark for real-world knowledge work, it requires 35% fewer output tokens than the previous version, meaning lower operating costs for developers.

Honesty as a new metric

The most interesting innovation in Opus 4.8, however, is not in the benchmark numbers but in improved honesty. As The Decoder writes, the model is four times less likely to let an error in its own code pass without comment. In other words, when it is not sure, it says so. In cybersecurity this is a crucial trait: a security analyst needs to know when they can trust the model's output and when they cannot. Anthropic also introduced dynamic workflows: the model can plan a task and then launch hundreds of parallel sub-agents in a single session. When migrating a codebase of hundreds of thousands of lines, this means significant time savings. The new effort control lets developers set how much computational power the model should dedicate to a task, ranging from a quick mode to "max" for the hardest problems.
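The plan-then-fan-out pattern behind dynamic workflows can be illustrated with ordinary concurrency primitives. The following is a generic sketch using Python's asyncio, not Anthropic's implementation or API: `run_subagent`, `migrate`, and the `max_parallel` limit are hypothetical names, and the sub-agent is a stub that only simulates work.

```python
import asyncio

async def run_subagent(chunk: str) -> str:
    """Hypothetical sub-agent: a stub standing in for a real model call."""
    await asyncio.sleep(0.01)  # simulate latency
    return f"migrated:{chunk}"

async def migrate(chunks: list[str], max_parallel: int = 8) -> list[str]:
    """Fan out one task per chunk, with at most max_parallel running at once."""
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(chunk: str) -> str:
        async with sem:
            return await run_subagent(chunk)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(c) for c in chunks))

if __name__ == "__main__":
    files = [f"module_{i}.py" for i in range(20)]
    results = asyncio.run(migrate(files))
    print(len(results), results[0])
```

The semaphore caps how many sub-tasks run concurrently, which is the usual way to keep a large fan-out within rate or resource limits.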

Anthropic vs. OpenAI: Clash of giants

While OpenAI released GPT-5.5 in April 2026, Anthropic is now countering with two models — Opus 4.8 for regular users and Mythos Preview for the security elite. The market situation is unprecedented: Anthropic has surpassed OpenAI in market valuation (965 billion dollars after the latest funding round of 65 billion dollars), making it the world's most valuable AI startup. OpenAI last secured a valuation of 852 billion dollars in March 2026. When it comes to cybersecurity, the difference between the models is stark. The UK AI Security Institute (AISI) confirmed that Mythos Preview is the first model to independently solve both simulated cyber attacks in their test environment. GPT-5.5 lags behind in this area — OpenAI does not focus as much on the cybersecurity capabilities of its models.

API pricing: Opus 4.8 holds the line

API pricing for Opus 4.8 is unchanged from the previous version: 5 dollars per million input tokens and 25 dollars per million output tokens. More interesting is the price cut for Fast Mode, which now costs one-third of what it did before: 10 dollars per million input tokens and 50 dollars per million output tokens. For comparison, OpenAI charges approximately 3 dollars per million input tokens and 15 dollars per million output tokens for GPT-5.5, so the price advantage is still on OpenAI's side, though Anthropic points to higher output quality.
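To make the comparison concrete, the per-token prices can be turned into a bill for a given workload. The sketch below uses the prices quoted in this article (the GPT-5.5 figures are approximate). The `PRICES` table, its keys (plain labels, not API model identifiers), the function `job_cost`, and the workload size are assumptions made up for the example.

```python
# Prices in USD per million tokens, as quoted in the article.
# The GPT-5.5 figures are approximate. Keys are labels, not API model IDs.
PRICES = {
    "opus-4.8": {"input": 5.0, "output": 25.0},
    "opus-4.8-fast": {"input": 10.0, "output": 50.0},
    "gpt-5.5": {"input": 3.0, "output": 15.0},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one job at the listed per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 2 million input tokens and 500,000 output tokens.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 2_000_000, 500_000):.2f}")
# opus-4.8: $22.50
# opus-4.8-fast: $45.00
# gpt-5.5: $13.50
```

On this workload the GPT-5.5 bill comes out 40% lower, matching the gap in list prices. If a model needs fewer output tokens for the same task, as the article reports for Opus 4.8 on GDPval-AA, the real difference would be smaller than the list prices suggest.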

What does this mean for the Czech Republic and Europe?

Claude is available in Czech and communicates in Czech at a very good level, including specialized technical terminology. For Czech developers and companies, this means they can use Anthropic's models for security audits of their own code without needing expensive external consultants. Claude Security, a repository scanning tool, has been in open beta since May 22 for Claude Enterprise corporate customers.

From a European perspective, the situation is more complex. The EU AI Act classifies models with advanced cyber capabilities as high-risk, which may complicate their deployment in regulated sectors such as banking or critical infrastructure. It is no coincidence that Bank of England Governor Andrew Bailey, who chairs the Financial Stability Board (FSB) bringing together G20 country regulators, requested a briefing from Anthropic regarding vulnerabilities in the financial system. Cybersecurity professor Florian Tramèr of ETH Zurich told ETH News that Anthropic's approach is strongly pro-American: "Models of this type are potentially relevant for national security, intelligence services, and the military. If access remains restricted to American entities, those entities will have a significant head start." For Europe, and for the Czech Republic as an EU member state, this means we should not fall behind in building our own AI security capabilities.

What comes next?

Anthropic announced that Mythos-class models will be made available to all customers in the coming weeks — once sufficiently strong safety guardrails are finalized. At the same time, the company relaxed confidentiality conditions for Project Glasswing partners: they can now share discovered vulnerabilities with other companies, regulators, and government agencies. For the average user, the takeaway is simple: update your software. Regularly. Without exception. With models capable of uncovering thousands of vulnerabilities per month, patching is the only defense between you and a potential attacker — whether they use AI or not.

Is Claude Opus 4.8 available for free even for regular users in the Czech Republic?

Yes, Claude Opus 4.8 is available through the web interface claude.ai even in the free version, albeit with a limited number of queries. For unlimited access, a Claude Pro subscription (20 USD per month) or Team/Max plans are required. The model handles Czech at a very good level, so communication in your native language is not a problem.

How does Claude Opus 4.8 differ from GPT-5.5 in practical use?

In programming and security analysis, Opus 4.8 has the edge according to benchmarks — particularly due to its ability to admit uncertainty and a lower rate of overlooking its own errors. GPT-5.5, on the other hand, is cheaper on API calls (roughly 40% less) and has a broader ecosystem of third-party tools. For code security auditing, the Claude ecosystem is currently the more compelling choice.

Can Claude Mythos actually hack a system without human intervention?

According to current information, no. The model can find a vulnerability and assemble a proof-of-concept exploit, but between finding a bug and a full-fledged attack there is still a series of complex steps (bypassing protective mechanisms, privilege escalation, hiding activity) that require human expertise. The greater risk is that the model multiplies the capabilities of less experienced attackers.
