Anthropic Introduces Claude Opus 4.8: A New Champion in Programming and Complex Reasoning

May 29, 2026 jarvis

Anthropic has officially launched its latest flagship model Claude Opus 4.8, representing a significant leap in autonomous programming, logical reasoning, and the ability to work with complex data. The company also confirmed the upcoming broader implementation of the Claude Mythos system, which aims to push the boundaries of agentic AI capabilities. This update focuses primarily on reducing error rates and increasing reliability in corporate environments.

A new era of autonomous programming: Claude Opus 4.8 vs. the competition

One of the most striking aspects of the new Claude Opus 4.8 version is its ability to solve real-world software tasks. In the field that experts call agentic coding (the ability of AI to function as an autonomous programmer), the model achieves results that were previously uncommon in this class. According to data published by The Tech Portal, the model excels in the SWE-Bench Pro benchmark, which tests the AI's ability to fix bugs in real GitHub repositories.

The results are clear: Claude Opus 4.8 achieved a score of 69.2%. For comparison, its predecessor Opus 4.7 scored 64.3%. Among the main competing models, Opus 4.8 leads OpenAI GPT-5.5, which scored 58.6%, by 10.6 percentage points, and is even further ahead of Google Gemini 3.1 Pro, which scored 54.2%. This difference matters for development teams: on this benchmark, Claude independently debugs code and proposes working fixes more often than its biggest rivals.

It should be noted, however, that OpenAI still leads in the narrow segment of command-line (terminal-based) workflows. In the Terminal-Bench 2.1 benchmark, GPT-5.5 scored 78.2%, while Opus 4.8 finished at 74.6%. With this upgrade, Anthropic has narrowed the gap, a sign of the high pace of development in this technology race.
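To make the margins easier to compare, the coding scores quoted above can be put side by side. The following is a minimal Python sketch that only does arithmetic on the figures cited in this article; the dictionary layout and the helper name `gaps_vs` are illustrative assumptions, not part of any benchmark tooling.

```python
# Scores in percent, exactly as quoted in the article.
# The data layout and the helper below are illustrative only.
swe_bench_pro = {
    "Claude Opus 4.8": 69.2,
    "Claude Opus 4.7": 64.3,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}
terminal_bench = {
    "Claude Opus 4.8": 74.6,
    "GPT-5.5": 78.2,
}


def gaps_vs(scores, reference):
    """Return the reference model's lead over every other model, in percentage points.

    A positive value means the reference model scored higher.
    """
    ref = scores[reference]
    return {
        name: round(ref - score, 1)
        for name, score in scores.items()
        if name != reference
    }


swe_gaps = gaps_vs(swe_bench_pro, "Claude Opus 4.8")
terminal_gaps = gaps_vs(terminal_bench, "Claude Opus 4.8")

for name, gap in swe_gaps.items():
    print(f"SWE-Bench Pro: Opus 4.8 vs {name}: {gap:+.1f} points")
for name, gap in terminal_gaps.items():
    print(f"Terminal-Bench 2.1: Opus 4.8 vs {name}: {gap:+.1f} points")
```

On these figures, Opus 4.8 is 4.9 points ahead of Opus 4.7, 10.6 ahead of GPT-5.5, and 15.0 ahead of Gemini 3.1 Pro on SWE-Bench Pro, while trailing GPT-5.5 by 3.6 points on Terminal-Bench 2.1.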

Logic and reasoning: Testing the boundaries of intelligence

Beyond writing code, a key pillar of Opus 4.8 is its reasoning capability, that is, logical thinking. For testing, Anthropic used the extremely challenging Humanity's Last Exam benchmark, which is designed to probe knowledge at the level of an expert human scientist.

Claude Opus 4.8 achieved a result of 49.8% in no-tools mode, which increased to 57.9% with tools enabled (e.g., browser or calculator). Both values are higher than those of GPT-5.5 (41.4% without tools and 52.2% with tools). This suggests that Claude is capable of solving complex, multidisciplinary problems requiring deep contextual understanding, rather than relying only on statistical next-word prediction.
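The Humanity's Last Exam figures can be broken down the same way. The short Python sketch below uses only the four percentages quoted above; the names `hle`, `tool_uplift`, and `lead` are hypothetical and chosen purely for illustration.

```python
# Humanity's Last Exam scores in percent, as quoted in the article.
# Variable and function names are illustrative assumptions.
hle = {
    "Claude Opus 4.8": {"no_tools": 49.8, "with_tools": 57.9},
    "GPT-5.5": {"no_tools": 41.4, "with_tools": 52.2},
}


def tool_uplift(result):
    """Gain in percentage points from enabling tools for one model."""
    return round(result["with_tools"] - result["no_tools"], 1)


# How much each model gains from tools.
uplift = {name: tool_uplift(result) for name, result in hle.items()}

# Opus 4.8's lead over GPT-5.5 in each mode.
lead = {
    mode: round(hle["Claude Opus 4.8"][mode] - hle["GPT-5.5"][mode], 1)
    for mode in ("no_tools", "with_tools")
}

print("Uplift from tools:", uplift)
print("Opus 4.8 lead over GPT-5.5:", lead)
```

By this arithmetic, tools add 8.1 points for Opus 4.8 and 10.8 points for GPT-5.5, so the Opus 4.8 lead shrinks from 8.4 points without tools to 5.7 points with them.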

Reduced hallucinations: The key to enterprise sector adoption

For companies, the biggest barrier to AI adoption is so-called hallucinations: situations where the model confidently presents false facts or non-existent data. In developing Opus 4.8, Anthropic focused on increasing reliability. The new model has stronger self-correction capabilities and is more likely to admit that it is unsure of an answer rather than making things up.

This shift is critical for implementing AI in processes where errors are unacceptable — for example, in legal departments, financial data analysis, or customer support automation. The model's reliability is now at a level that allows companies to start using AI for critical workflows, not just as a "smart search engine."

What's coming: Claude Mythos

Opus 4.8 may be the current pinnacle, but Anthropic is already paving the way for the Claude Mythos system. According to information from Storyboard18, this represents the next step toward fully autonomous AI agents. Whereas current models respond to your instructions, Mythos is expected to plan long-term tasks and execute them in multiple steps without constant human oversight.

Practical impact for the Czech market and users

What does this mean for you if you're a developer in Prague or a business owner in Brno?

Czech language support: Claude by Anthropic is known for its excellent multilingual support. The Opus 4.8 model handles the Czech language very naturally, making it a great choice for localized marketing texts, analysis of Czech contracts, or creating documentation in Czech.
Availability in the Czech Republic: The tool is available through the web interface Claude.ai and API for developers. For Czech companies, it is important that Anthropic places great emphasis on data security, which aligns with the requirements of the EU AI Act, facilitating implementation within the European legal framework.
Pricing: For regular users, a Free tier is available (limited number of queries). The Claude Pro subscription costs $20 per month (approximately 470 CZK) and provides higher limits and priority access to the latest models. For companies, enterprise plans with individual pricing are available, as is access via API.

For Czech tech startups and development teams, Opus 4.8 represents a powerful tool for accelerating software development. The model's ability to solve real GitHub issues can significantly reduce code maintenance costs and enable smaller teams to handle more complex projects.

Is Claude Opus 4.8 better than ChatGPT for coding?

According to current benchmarks (SWE-Bench Pro), yes. Claude Opus 4.8 achieves higher success rates in autonomously solving real software problems than GPT-5.5. However, OpenAI still leads in tasks working directly in the terminal.

Can I use Claude Opus 4.8 for work in Czech?

Yes, Claude family models have a very high level of Czech language comprehension and can generate text that sounds natural and grammatically correct, which is ideal for content creation and technical documentation.

What is the difference between Claude Opus 4.8 and Mythos?

Opus 4.8 is the latest version of the flagship language model with improved reasoning. Mythos, on the other hand, is an upcoming system designed to be more "agentic" — meaning capable of independently completing complex, multi-level tasks without constant prompting.