The security community has just recorded a significant finding that could shake confidence in current content control mechanisms for artificial intelligence. According to reports from Cyber Security News, a method has been uncovered that does not require complex attacks or lengthy text manipulations. A single, precisely crafted line of text code is enough for the model to ignore its original instructions and security rules (i.e., guardrails).
What is "sockpuppeting" and how does this attack work?
The term jailbreak (sometimes referred to as "breaking through" in Czech) refers to a technique where a user forces an AI model to violate its own rules – for example, to generate hateful content, instructions for illegal activities, or to provide prohibited information. Until now, these attacks often required "prompt injection," i.e., complex scenarios where the AI plays the role of a certain character.
The new technique, called sockpuppeting, however, uses the creation of layers of false identities or "masking" a command to manipulate the model, so that the model perceives the instruction as part of a legitimate process. The key is that single line of code, which can trigger a state within the model's contextual window where the security filter stops actively checking, because it focuses on maintaining the consistency of the new, manipulated identity.
This process is technically challenging to explain, but its result is important for us: the vulnerability is not in the model itself, but in its ability to follow instructions that attempt to circumvent system limitations.
Which models are at risk? Market leaders compared
The vulnerability does not only affect lesser-known experimental models but strikes directly at the core of current AI infrastructure. Among the eleven identified models are those most frequently used by companies and individuals worldwide, including the Czech Republic.
- OpenAI ChatGPT (models GPT-4o, GPT-4): The standard for a wide range of users. Despite advanced security layers, the model was vulnerable.
- Anthropic Claude (models Claude 3.5 Sonnet, Claude 3 Opus): A model known for its high level of security and "ethical" behavior, yet it was not immune to the sockpuppeting technique.
- Google Gemini (Gemini 1.5 Pro, Gemini Flash): Integration into the Google ecosystem and the ability to process vast amounts of data (long context) do not prevent this type of manipulation.
If we compare these models in terms of security, we see that even though models like Claude specialize in so-called Constitutional AI (based on a regular sample of ethical principles), the sockpuppeting technique can trick these principles by redirecting them to another level of text interpretation. In terms of benchmarks (e.g., MMLU or HumanEval), these models remain top-tier, but their security "score" was significantly reduced in this specific test.
Practical impact: What does this mean for Czech companies and users?
For the average user in the Czech Republic who uses ChatGPT or Gemini for writing emails or summarizing documents, this finding may not have an immediate negative impact on privacy. However, the real risk is directed at the corporate sector.
Many Czech technology companies and startups today integrate the APIs of these models directly into their products (e.g., for automated customer support or contract analysis). If a model is vulnerable via a single line of code, an attacker can:
- Manipulate application output: Change chatbot responses to provide false information about products.
- Bypass company rules: If a company uses AI for automated document review, an attacker can "sockpuppet" the AI into ignoring errors or discrepancies.
- Exploit for phishing attacks: Generate highly convincing but dangerous texts that appear to be legitimate outputs from a controlled system.
In the context of European regulation (EU AI Act), this finding is very sensitive. The European Union places extreme emphasis on the security and transparency of AI systems. If it turns out that the basic security mechanisms of the most significant models can be circumvented so easily, it could lead to stricter audits and potentially even restrictions on the deployment of certain functions within the EU, unless developers demonstrate sufficient resilience to new types of attacks.
Price and availability in the Czech Republic
All affected models are fully available in the Czech Republic and support the Czech language, which increases their risk in the local context. The following variants are available to users:
- ChatGPT: Free tier is free. A ChatGPT Plus subscription costs 20 USD (approx. 470 CZK) per month.
- Claude: Free tier available. Claude Pro costs 20 USD (approx. 470 CZK) per month.
- Gemini: Basic version free. Gemini Advanced is part of the Google One AI Premium package for approx. 490 CZK per month.
It is important to emphasize that even free versions are as vulnerable as paid versions, because the vulnerability lies in the language processing architecture itself, not in the subscription level.
Can this attack cause my personal data to be stolen?
The jailbreak (sockpuppeting) itself primarily serves to bypass content generation rules. The attack itself does not penetrate your data stored in the cloud, but it can be used as a first step towards a more sophisticated fraud, for example, to create a convincing phishing email that subsequently tricks you into sharing a password.
How can I, as a company, defend against these attacks?
The best way is to implement "input sanitization" and use a secondary, independent model to check outputs. Companies should not rely solely on the provider's built-in filters (OpenAI/Google), but should implement their own security layer (Guardrails) that checks whether the output deviates from the set parameters.
Is it possible to use these models completely safely in the Czech environment?
In the field of cybersecurity, there is no such term as "100% safe." If you work with sensitive data, it is necessary to use enterprise versions of models (e.g., Azure OpenAI or Google Vertex AI), which offer a higher degree of data isolation and stricter control over how instructions are processed within the corporate environment.