Most developers of large language models (LLMs) follow a similar pattern: they create an extremely capable model and subsequently attempt to limit its behavior. These limitations function as "plumbing" — external filters, lists of forbidden topics, and rejection patterns that are grafted onto the model's underlying architecture. The problem with this method, however, is that rules are finite, whereas human creativity and unexpected situations are infinite.
According to an analysis by eeselAI, this approach leads either to over-restriction (the model refuses even legitimate queries) or to under-restriction (the model allows harmful content because the situation was not on the forbidden list). Anthropic has chosen a different path: it wants Claude to understand why certain behaviors are inappropriate, not merely that they are forbidden.
From a list of rules to the "Model Spec"
The heart of this approach is a document known as the Model Spec (sometimes internally called the "soul document"). It is not a simple list of instructions, but a comprehensive description of the values, character traits, and decision-making frameworks that are meant to guide Claude models. Anthropic uses a process called Constitutional AI, in which the model critiques its own outputs against these principles.
This process is fascinating in its depth. Instead of a human annotator simply saying "this is wrong," the model is trained to argue on its own why its response should be revised to align with its "constitution." This lets the model handle contextual nuance far better. In medicine or cybersecurity, for example, it can distinguish legitimate research from an attempted attack, which is often an insurmountable problem for classic filters.
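To make the idea concrete, here is a minimal sketch of what such a critique-and-revise loop could look like, assuming the official anthropic Python SDK. The principles, prompts, and model id are illustrative placeholders, not Anthropic's actual training pipeline.

```python
# Minimal critique-and-revise sketch (not Anthropic's training code).
# Assumes: pip install anthropic, and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()

# Illustrative principles; a real "constitution" contains many more.
PRINCIPLES = [
    "Prefer responses that are helpful while avoiding harm.",
    "Prefer explaining a refusal over refusing flatly.",
]

def ask(prompt: str) -> str:
    """One model call; the model id is only an example, substitute a current one."""
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

def critique_and_revise(prompt: str) -> str:
    draft = ask(prompt)
    for principle in PRINCIPLES:
        # The model critiques its own draft against a principle...
        critique = ask(
            f"Principle: {principle}\n\nPrompt: {prompt}\n\nResponse: {draft}\n\n"
            "Briefly critique the response against the principle."
        )
        # ...and then rewrites the draft to address that critique.
        draft = ask(
            f"Original response: {draft}\n\nCritique: {critique}\n\n"
            "Rewrite the response so it addresses the critique. Return only the rewrite."
        )
    return draft
```

In Constitutional AI proper, pairs of original and revised responses are used as training data, so the finished model no longer needs this loop at inference time.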
Comparison: Claude vs. the competition
To understand the significance of this shift, it is necessary to compare Claude with its main competitors on the market:
- OpenAI (GPT-4o/GPT-5): OpenAI traditionally relies on massive RLHF (Reinforcement Learning from Human Feedback) and robust layers of filters. Their models are extremely capable and versatile, but users often encounter "over-caution," where the model refuses even innocent tasks due to overly strict rules.
- Google (Gemini): Gemini bets on deep integration into the ecosystem and multimodal capabilities. Safety mechanisms here are strong, but often function on the principle of pattern detection, which can lead to errors in complex contexts.
- Anthropic (Claude): Anthropic positions Claude as the safest and most "human" model thanks to the character training described above. In benchmarks for ethical reasoning and textual nuance it often outperforms the competition, although the self-critique process may require more compute.
Practical impact: What does this mean for you?
For the average user, this means that interaction with Claude feels more natural and less "robotic." The model does not just mechanically refuse queries; it can explain its limitations in context. For companies, this approach is key to reliability: if you integrate AI into customer support or internal systems, you do not want a model that blindly follows rules, but one that can act ethically even in situations the developers did not anticipate.
Availability in the Czech Republic and the EU: Claude is fully available to users in the Czech Republic via the web interface and mobile apps. Anthropic also takes regulation seriously, which is essential in Europe. Its "Safety by Design" focus aligns directly with the requirements of the EU AI Act, making Claude one of the more defensible choices for European companies that must comply with strict standards for high-risk AI systems.
Pricing policy
Claude offers several levels of access:
- Free tier: A limited number of messages at no cost, with access to the latest models.
- Claude Pro: Approximately 20 USD (approx. 470 CZK) per month, offers higher limits and priority access.
- API: Pay-per-token, ideal for developers and companies integrating Claude into their own applications.
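For orientation, here is a minimal sketch of what pay-per-token usage looks like with the anthropic Python SDK. The model id and the per-million-token prices are placeholders, so check Anthropic's current price list before relying on the numbers.

```python
# Minimal pay-per-token usage sketch with the anthropic Python SDK.
# Assumes: pip install anthropic, and ANTHROPIC_API_KEY set in the environment.
import anthropic

PRICE_PER_MTOK_INPUT = 3.00    # USD per million input tokens (placeholder)
PRICE_PER_MTOK_OUTPUT = 15.00  # USD per million output tokens (placeholder)

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model id, substitute a current one
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize the EU AI Act in three sentences."}],
)

# Every response reports the token usage you are actually billed for.
cost = (
    message.usage.input_tokens / 1_000_000 * PRICE_PER_MTOK_INPUT
    + message.usage.output_tokens / 1_000_000 * PRICE_PER_MTOK_OUTPUT
)
print(message.content[0].text)
print(f"Estimated cost: ${cost:.6f}")
```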
Does this model "character" make Claude too cautious or "preachy"?
This is a common criticism of models with a strong safety focus. Through principle-based training (Constitutional AI), however, Anthropic aims for the model to stay useful rather than merely "didactic." The goal is for the model to refuse genuinely harmful tasks while remaining constructive in safe contexts.
Does training on values affect the model's performance in mathematics or programming?
There is a risk of the so-called "safety tax," where an excessive emphasis on ethics reduces logical accuracy. Anthropic, however, claims that training the model to reason about principles can paradoxically improve results on some complex tasks, because the model understands the user's intent better.