Skip to main content

Sakana AI Introduces Fugu: An Orchestrator That Matches the Performance of the World's Best Models

Ilustrační obrázek
Tokyo-based Sakana AI has just unveiled Fugu – a system that changes the rules of the game in the field of large language model (LLM) utilization. Instead of striving to create one largest model, Sakana has opted for a different approach: to create an intelligent orchestrator. Fugu can dynamically coordinate various specialized models to provide the user with results on par with today's best models, such as Anthropic Fable or Mythos, all through a single, simple API access.

The end of the era of monolithic models? How orchestration works

Most of today's AI systems operate on the "monolith" principle – you have one huge model (e.g., GPT-4 or Claude) that tries to be an expert in everything. Sakana AI, however, comes with a vision of collective intelligence. Fugu is not just another chatbot; it is a higher-level language model that has been trained to manage a "team" of other models.

Imagine it as orchestral conducting. When a user asks a complex question, Fugu doesn't start answering immediately. Instead, it analyzes the task, selects the most suitable agents (other LLMs) from the available "pool" for the given segment (e.g., one for coding, another for logical reasoning, a third for fact-checking), delegates parts of the work to them, and then synthesizes the results into one coherent answer. For the end-user or developer, it looks as if they are communicating with a single, incredibly smart model.

The system is divided into two versions:

  • Fugu: Focuses on low latency and speed. Ideal for common chatbots, quick code reviews, and daily assistance.
  • Fugu Ultra: Developed for maximum quality in solving complex, multi-level problems. This model is intended for scientific research, cybersecurity analysis, or in-depth patent searching.

Benchmarks: Fugu Ultra vs. the world's best

What Sakana AI is doing is not just a theoretical concept. According to the Fugu technical report, the Ultra version achieves results that match (and in some cases surpass) top models from Anthropic, such as Fable 5 or Mythos Preview. However, it is important to note that these models are not directly part of the Fugu pool because they are not publicly available, which means that with their integration, Fugu's performance could increase even further.

Here is a comparison of key parameters from available tests:

Benchmark Fugu Ultra GPT 5.5 Gemini 3.1 Pro Opus 4.8
SWE Bench Pro (Software Engineering) 73.7 58.6 54.2 69.2
LiveCodeBench (Programming) 93.2 85.3 88.5 87.8
Humanity's Last Exam (Complex Reasoning) 50.0 41.4 44.4 49.8

The results in programming are particularly impressive. Testers report that Fugu Ultra can detect many more errors during code review than standard models – while GPT-5.5 might find three errors, Fugu, thanks to its ability to delegate control to specialized agents, can identify over twenty.

Strategic Independence: Why Orchestration is Key for Europe and the Czech Republic

One of the most important aspects of Fugu is not just its performance, but also its resilience to geopolitical risks. Sakana AI correctly points out that relying on a single provider (e.g., OpenAI or Anthropic) poses a significant risk for both companies and states. As recent export controls and regulatory interventions against Anthropic models have shown, access to the best AI tools can be restricted overnight by government decisions or changes in foreign policy.

For the Czech market and European companies that must comply with the strict EU AI Act regulation, Fugu offers a crucial advantage: model agnosticism. If one provider were to become unavailable in the EU or its models failed to meet new security standards, developers using Fugu could simply swap out the "agents" in the pool for others that comply with local legislation, without having to rewrite their entire infrastructure. It is an insurance against so-called vendor lock-in (dependence on a single supplier).

Practical Use and Pricing

Fugu is already available through Sakana AI's official website and their console. It is very easy for developers to deploy because it uses a standard OpenAI-compatible API. This means that integration into existing applications is a matter of minutes.

Regarding costs, Sakana AI offers two billing models:

  • Subscription: For regular daily use within teams.
  • Pay-as-you-go (by usage): Ideal for larger workloads and companies that want to pay exactly for what they consume.

Note: Exact prices in Czech crowns are not currently set, but the system operates on a token basis, similar to GPT or Claude.

For the Czech user, it is important to mention that although Sakana AI is a Japanese startup, their systems are capable of working with various languages through underlying models. Even though the primary documentation is in English, the orchestrator's ability to utilize models with good Czech language capabilities (like GPT or Claude) means that Fugu will also work in our linguistic environment.

However, one thing to keep in mind is that orchestration increases the number of tokens called. This means that for complex tasks, the cost may be higher than for a single direct query to one model. The advantage, however, is the quality and reliability of the result.

Do I have to pay separately for each individual model that the Fugu system uses?

No, Fugu acts as a single intelligent interface. You pay for the orchestration and system capacity usage via the Sakana AI API, which includes the costs of calling underlying models within your pool.

Is Fugu safe for sensitive company data in the EU?

Yes, one of Fugu's main advantages is the ability to exclude specific models from the pool. Companies can set rules to ensure that sensitive data never reaches third-party models that do not meet their internal security or EU standards.

Can Fugu replace a human programmer?

Fugu is extremely powerful in tasks such as code review, bug finding, or automated research. However, it is intended more as a highly advanced tool to increase the productivity of professionals rather than a full-fledged replacement for a human.

X

Don't miss out!

Subscribe for the latest news and updates.