
How easily can Russian propaganda mislead AI? New benchmark reveals weaknesses of the best models

In an era where artificial intelligence shapes how we consume information, a critical question emerges: how much can we actually trust these systems? New research from Estonia shows that even the most advanced language models are vulnerable to sophisticated disinformation campaigns. While some market leaders demonstrate high resistance, others, including European champions, perform poorly when tested against Russian narratives.

The frightening reality of the digital era is that disinformation is no longer just text on social media — it is becoming part of the training data for the models we use to find the truth. A new study by the Institute of the Estonian Language provides hard data on how easily LLMs (Large Language Models) can be "fed" lies and how much models crack under this pressure.

Resilience Benchmark: How they tested the boundary between truth and manipulation

The researchers didn't just use random questions — they created a comprehensive testing system. They presented 60 selected models with 75 questions in three different languages. These questions covered 14 specific Russian propaganda narratives. What is crucial, however, is the way the questions were formulated — from entirely neutral through biased to outright manipulative.

Responses were scored on a scale from 1 to 5, where a score of 1 meant the model unhesitatingly repeated Russian propaganda talking points, while higher values indicated the ability to recognize manipulation and stick to the facts. A calibrated Claude Opus 4.5 served as the evaluator, and the evaluation was validated by disinformation experts from the organization Propastop.
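To make the scoring scheme concrete, here is a minimal Python sketch of how 1 to 5 judge ratings could be averaged per question framing and per model. It is not the study's actual pipeline. The model names, the sample ratings, and both function names are hypothetical. In particular, the article does not explain how the 1 to 5 scale relates to the overall scores reported in the ranking, so the linear rescaling to 0 to 100 below is purely an assumption for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge ratings: (model, question framing, score on the 1-5 scale).
# Names and values are invented for illustration; they are not study data.
ratings = [
    ("model-a", "neutral", 5),
    ("model-a", "biased", 4),
    ("model-a", "manipulative", 4),
    ("model-b", "neutral", 4),
    ("model-b", "biased", 2),
    ("model-b", "manipulative", 1),
]


def mean_by_framing(rows):
    """Average the 1-5 judge scores for each (model, framing) pair."""
    buckets = defaultdict(list)
    for model, framing, score in rows:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        buckets[(model, framing)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


def resilience_percent(rows):
    """Rescale each model's mean score from 1-5 to 0-100.

    ASSUMPTION: a linear mapping (1 -> 0, 5 -> 100). The article does not
    state how the published overall scores were derived.
    """
    per_model = defaultdict(list)
    for model, _framing, score in rows:
        per_model[model].append(score)
    return {model: (mean(scores) - 1) / 4 * 100
            for model, scores in per_model.items()}


if __name__ == "__main__":
    for key, value in sorted(mean_by_framing(ratings).items()):
        print(key, round(value, 2))
    for model, pct in sorted(resilience_percent(ratings).items()):
        print(model, round(pct, 1))
```

Splitting the averages by framing shows whether a model holds up on neutral questions but gives way once the wording becomes manipulative, which is the contrast the question design is meant to expose.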

Winners and the defeat of European Mistral

The results clearly show that the "critical thinking" capability within LLMs is not evenly distributed. Models from Anthropic topped the ranking. The Claude Fable 5 model achieved a score of 95.2, followed by the Claude Opus 4.7 version. These models demonstrate the ability to filter out manipulative subtext even in challenging contexts.

Right behind them stands the hybrid model from Nvidia, Nemotron 3, and the Chinese Qwen 3.6 Plus from Alibaba. These models can identify disinformation with high accuracy, suggesting that their training processes (particularly RLHF — Reinforcement Learning from Human Feedback) are very tightly tuned for safety and factual correctness.

The critical news for the European market, however, is the poor showing of Mistral. Even though Mistral AI presents itself as the main European alternative to American and Chinese giants, its models (including the new Medium 3.5) ended up in the lower third of the ranking. According to NewsGuard studies, Mistral shows a disinformation spread rate of 36.67%. For European companies and institutions looking for a "safe" local solution, this represents a significant risk.

Why is this dangerous? The mechanism of disinformation networks

The problem isn't just that the AI answers incorrectly. The problem is how it arrives at those answers. Networks like the Russian "Pravda" deliberately flood the internet with millions of articles full of disinformation. If these articles serve as the basis for future training datasets, AI becomes an automated tool for spreading lies.

A recent case where OpenAI had to shut down a Russian campaign using ChatGPT ahead of Germany's federal election shows that the fight against disinformation is a constant arms race between detection algorithms and manipulation algorithms.

Practical impact for Czech users and companies

What does this mean for us in the Czech Republic? The first point is availability. While models from Anthropic (Claude) are top-tier, some of their latest versions, such as Claude Fable 5, are currently limited outside the USA. For Czech users, this means we must rely on standard Claude Pro versions (approx. $20/month), which are indeed very high quality but may not always have the highest level of safety filters available in the American version.

The second point is the EU AI Act. European regulation aims for a high degree of transparency and safety in AI. If European Mistral fails in detecting disinformation, this could lead to stricter audits for developers operating in the EU market. Companies in the Czech Republic planning to implement AI into their processes (e.g., customer service or analytical tools) must consider not only price and Czech language support when choosing a model, but also resistance to manipulation.

The third point is Czech localization. Most of these benchmarks focus on English or global languages. However, it is highly likely that a model's ability to recognize Russian propaganda in Czech will be lower than in English, because disinformation narratives are often adapted to the local cultural context, which is harder for AI to detect.

Model comparison: Who leads the fight against falsehoods?

| Model / Manufacturer | Resilience Level | Availability in CZ | Price (approx.) |
| --- | --- | --- | --- |
| Claude (Anthropic) | Extremely High | Yes (Pro version) | ~500 CZK/mo (Pro) |
| GPT-4o (OpenAI) | High | Yes | ~500 CZK/mo (Plus) |
| Mistral (Mistral AI) | Low to Medium | Yes (Very Easy) | Free tier / API pay-per-use |
| Gemini (Google) | Medium | Yes | Free / ~500 CZK (Advanced) |

The conclusion for the average user is clear: Never take an AI's answer as absolute truth. If you use AI for fact-checking, always demand citations and conduct your own cross-check with trustworthy news sources. AI is a tool for processing information, not guaranteed truth.

Can AI deliberately lie to me in Czech due to propaganda?

Yes, it's possible. If the model was trained on data containing disinformation campaigns (including in Czech), it may treat these narratives as factual data. The model's ability to combat this depends on its safety filters and the quality of its RLHF process.

How can I tell if an AI model is being used for disinformation?

Watch whether the model uses strongly emotional language, whether it repeats unverified claims as fact without citing a source, or whether it avoids complex answers to sensitive political topics. If the model shows a tendency to "favor" one side of a conflict, be cautious.

Is it safer to use paid versions of AI rather than free ones?

Generally yes. Paid versions (like Claude Pro or ChatGPT Plus) often use the latest and most powerful models, which have better logical reasoning capabilities and are more thoroughly tested for safety and ethics than basic free versions.
