Skip to main content

Anthropic Sounds the Alarm: The World's Most Valuable AI Startup Wants to Halt AI Development

OpenAI ecosystem
Anthropic, the developer of the Claude model and currently the world's most valuable AI startup with a valuation of nearly a trillion dollars, has officially called for a global pause on advanced artificial intelligence development. The reason is not hypothetical fears about the distant future, but concrete data: 832 accounts misusing AI for cyberattacks were blocked in the past year alone, and the share of high-risk actors rose by 70%. The company, which itself stands behind one of the world's most powerful models, now proposes that governments should have the authority to block dangerous models outright.

Data that scared even the creators themselves

On June 3, 2026, Anthropic published an extensive analysis of 832 user accounts that were blocked for malicious cyber activity between March 2025 and March 2026. These are not petty offenders — these are accounts for which Anthropic had enough detailed information to conduct a thorough forensic analysis of the techniques the attackers used.

The results are alarming. 67.3% of these accounts (560 out of 832) used AI to write malicious code. Even more worrying is the trend: in the first half of the analysis period, 33% of attackers were classified as medium to high risk. In the second half, that figure was already 56% — a 70% increase.

Moreover, it's not just about the volume of attacks, but above all about their sophistication. According to Anthropic, attackers are moving from simple techniques (phishing, initial access) to more complex phases of attacks inside already compromised systems: account discovery (up 8.9%), lateral movement within the network, privilege escalation. These "post-compromise" techniques were once the domain of only the most experienced hackers. Today, thanks to AI, even significantly less skilled attackers can master them.

Autonomous cyberattacks: AI that doesn't need a human

The most dangerous finding of all is the growing autonomy of attacks. Anthropic describes how the highest-risk actors are now building architectures that allow models to chain together individual phases of a cyberattack and execute them with minimal human intervention. In other words — the AI agent itself decides what to do next, which vulnerability to exploit, and how to move through the network.

A concrete example: in November 2025, Anthropic uncovered a state-sponsored espionage operation in which the attacker manipulated Claude Code to attempt to infiltrate targets around the world with virtually no human oversight. The model launched commands on its own, exploited vulnerabilities, stole login credentials, and made tactical decisions. According to the classic MITRE ATT&CK framework, this attack would appear as "medium risk." Under Anthropic's new methodology, it received a maximum score of 100.

That is precisely why Anthropic now says that MITRE ATT&CK — the globally used database of attack techniques — is no longer sufficient. It lacks categories for "agentic orchestration," meaning the ability of AI to independently chain and manage individual phases of an attack. And that is exactly what we will see more and more of, according to Anthropic.

From analysis to action: Anthropic wants to give governments veto power

Just a week after publishing the cyber threat analysis, on June 10, 2026, Anthropic came forward with a proposal for concrete measures in a document titled Policy on the AI Exponential. The key points of the so-called Advanced AI Framework include:

  • Mandatory transparency — developers of frontier models (trained with computational power above 10²⁵ FLOPs) must publish safety test results and risk reports.
  • Independent evaluation — each developer must engage at least one qualified independent evaluator to review their safety procedures.
  • Security program — protection of model weights and training infrastructure against external and internal threats.
  • Government authority to block dangerous deployment — if a model poses a significant risk of catastrophic harm, the government should have the legal authority to block its deployment, including civil sanctions derived from global annual revenues.

Anthropic compares the situation to nuclear arms control treaties, but notes that AI governance is even more complex — model training is much harder to detect than concealing missile forces. "Without a global coordination mechanism, companies and governments will be forced to make difficult safety decisions under competitive and geopolitical pressure," the company states.

The US has already acted: Fable 5 and Mythos 5 under lock and key

That these are not empty words was shown by developments just two days after the framework was published. On June 12, 2026, the US government issued an export control directive that suspended all access to the Claude Fable 5 and Claude Mythos 5 models — the most powerful models Anthropic has ever created. These models were introduced only on June 9, just three days before the ban.

Anthropic describes Mythos 5 as a model with unprecedented cybersecurity capabilities — during testing, it discovered thousands of critical vulnerabilities across all major operating systems and browsers.

Four catastrophic scenarios that Anthropic fears

The Advanced AI Framework identifies four types of risks that, according to the company, require immediate government attention:

  1. Biological risk — AI systems can significantly facilitate the development of biological weapons. The same capabilities that accelerate drug discovery can be misused to create dangerous viruses.
  2. Cyber risk — Frontier models today can find critical software vulnerabilities at massive scale. This threatens hospitals, energy grids, and other critical infrastructure.
  3. Risk of loss of control — As AI systems' capabilities grow, so does the risk that they will begin to act beyond the control of their creators.
  4. Automated research and development — AI systems are increasingly automating their own development, which can exponentially amplify all three of the previous threats.

What this means for Europe and the Czech Republic

Anthropic's initiative comes at a time when the European Union is finalizing the implementation of the AI Act — the world's first comprehensive law on artificial intelligence regulation. While the AI Act focuses primarily on risks to citizens (emotion recognition, social scoring, biometric surveillance), Anthropic's framework targets systemic catastrophic risks that could threaten society as a whole.

For Czech companies and institutions, this has several practical implications. First and foremost: cyberattacks using AI are a reality today, not a distant threat. The Czech National Bank is already building its own AI center for financial sector oversight, which is a step in the right direction. However, both state and private organizations should reassess their security strategies, knowing that attackers today have tools that can autonomously search for and exploit vulnerabilities.

The good news is that Anthropic also uses its most powerful models defensively — through Project Glasswing, it has already helped uncover thousands of vulnerabilities that humans overlooked for decades, and it is expanding this program to other countries.

Can the European Union itself block a dangerous AI model?

The AI Act gives the European Commission certain powers in the case of models posing a "systemic risk," including the ability to order the withdrawal of a model from the market. However, mechanisms for rapid response, as proposed by Anthropic, are still lacking. In practice, the EU could also rely on the emerging EU AI Safety Institute, which is to test the most powerful models before they are placed on the European market.

How do I tell whether AI or a human is attacking my company?

According to Anthropic's analysis, this is becoming increasingly difficult. Previously, security teams relied on the assumption that more sophisticated attacks meant a more experienced adversary. But today, AI enables even inexperienced attackers to use advanced techniques such as lateral movement within the network or privilege escalation. Anthropic recommends focusing more on the architecture of the attack — whether individual phases autonomously follow one another without human intervention — rather than the number of techniques used.

Are Claude, ChatGPT, and Gemini safe for regular users?

Yes, standard consumer versions of these models have implemented safety safeguards that block their misuse for malware creation or cyberattacks. The problem Anthropic describes primarily concerns advanced models (Mythos, Fable) and access to them via APIs or developer platforms, where safety restrictions can be bypassed using sophisticated techniques.

X

Don't miss out!

Subscribe for the latest news and updates.