
Why Does AI Seem to Be Getting Dumber? Scientists Warn of "Model Collapse" and Its Impacts on the Digital World

Do you feel that your favorite chatbot is not as sharp as it used to be? The phenomenon that users perceive as AI "dumbing down" has a scientific explanation: a process called model collapse, in which artificial intelligence learns, without adequate control, from data that it created itself. This digital feedback loop can lead to degraded quality, a loss of creativity, and, in the worst case, completely incomprehensible outputs.

In recent months, complaints about the declining quality of large language models (LLMs) have become increasingly common on the internet and in professional communities. Users report that responses are flatter and less detailed, or that models hallucinate more often. What seems to be a subjective feeling now has a scientific basis: researchers warn of a risk they call model collapse.

What is model collapse? The mechanism of digital decay

To understand why this phenomenon occurs, we need to look at how models like ChatGPT are trained. Traditionally, these systems learn from vast amounts of human-created data – books, articles, discussions, and scientific papers. This data contains a wide range of nuances, errors, and unique ideas.

The problem arises when too much synthetic data, i.e., text generated by another artificial intelligence, enters the training data. As studies published in prestigious scientific journals explain (e.g., Shumailov et al. in Nature, 2024), the model gradually loses information about the "tails" of the probability distribution. In practice, this means that the model concentrates on the most common, most average responses, while unique, rare, or complex information disappears.

Imagine it like copying a photocopy. The first copy is clear, but each subsequent generation also copies the noise of the previous one and loses detail. After ten generations, only an unreadable smudge is left. The same thing happens in the digital space, where AI "eats" its own outputs.
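The photocopy effect can be reproduced with a toy simulation. The sketch below is an illustration under simplifying assumptions, not the experiment from the cited research: the "model" is just a normal distribution fitted to a small sample, and each generation is trained only on samples drawn from the previous generation's fit. The function name `simulate_collapse` and its parameters are made up for this example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a normal distribution to its own samples, generation after generation.

    Returns the fitted spread (standard deviation) of every generation,
    starting with the original "human" distribution at index 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the original data distribution
    spreads = [sigma]
    for _ in range(generations):
        # The new "model" sees only data produced by the previous model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)   # maximum-likelihood estimate of the spread
        spreads.append(sigma)
    return spreads


spreads = simulate_collapse()
print(f"spread at generation 0:   {spreads[0]:.3f}")
print(f"spread at generation {len(spreads) - 1}: {spreads[-1]:.3g}")
```

In this toy setting the fitted spread tends to shrink toward zero over many generations, which is the loss of tail information in miniature. A larger `sample_size` slows the shrinkage but does not remove it.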

Comparison of market players: How do they fight degradation?

Each of the main market players approaches the problem of synthetic data and training quality differently. It is important to monitor how these models evolve in the context of their performance stability.

  • OpenAI (ChatGPT): OpenAI relies on massive scale and advanced RLHF (Reinforcement Learning from Human Feedback) methods. Their models like GPT-4o are extremely capable, but due to their popularity, they are most exposed to the risk of internet "contamination" by synthetic data. The price for ChatGPT Plus is approximately 20 USD (approx. 460 CZK) per month.
  • Anthropic (Claude): Claude 3.5 Sonnet is currently considered one of the strongest competitors, surpassing GPT-4o in many benchmarks (e.g., coding) and in its handling of nuance. Anthropic places great emphasis on "Constitutional AI," which helps maintain the stability and safety of responses. The price of Claude Pro is also around 20 USD per month.
  • Google (Gemini): Google has the advantage of having the largest index of websites in the world. Their Gemini 1.5 Pro models try to integrate data directly from the Google ecosystem, which can help filter out low-quality synthetic content. Gemini Advanced is available as part of a Google One subscription for approximately 220 CZK per month (in the Czech Republic).

Benchmark tests suggest that models with strong reasoning capabilities can resist model collapse longer thanks to deeper context analysis, while smaller, lighter models are much more susceptible to degradation.

Practical impact: What does this mean for Czech companies and users?

This phenomenon is not just a theoretical problem for scientists. It has a direct impact on everyone who uses AI in the professional sphere, including the Czech environment.

1. Risk of "AI slop" in marketing and SEO

Many Czech marketing agencies and copywriters are starting to use AI to create website content. If these companies uncritically generate articles that other tools then train on or index as "human content," the result is a closed loop of low quality. For the Czech internet, the risk is that our digital footprint becomes an empty, repetitive copy of existing AI texts, which would devalue local search.

2. Availability and Czech as a "low-resource" language

For the Czech market, the situation is even more sensitive. Compared to English, Czech is a so-called low-resource language (a language with less available data). This means that far less high-quality text is available for training models in Czech. If a massive amount of synthetic Czech text floods the Czech internet, models may lose their ability to understand our linguistic nuances and cultural context more quickly.

3. Strategies for companies

When implementing AI tools (e.g., via API), companies in the Czech Republic should ensure that their own internal databases are clean and do not contain low-quality synthetic outputs. Human-in-the-loop is not just a buzzword, but a necessary safeguard against digital degeneration.
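One way to keep an internal database clean is to record where each text came from and whether a person has checked it. The following is a minimal sketch under stated assumptions: the `Record` fields (`origin`, `human_reviewed`) and the function `select_trusted` are hypothetical names chosen for illustration, not part of any particular product or API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    origin: str                   # hypothetical label: "human" or "ai"
    human_reviewed: bool = False  # set to True once an editor has checked the text


def select_trusted(records):
    """Keep human-written records and AI drafts that a person has reviewed."""
    return [r for r in records if r.origin == "human" or r.human_reviewed]


records = [
    Record("Product manual, written by staff.", origin="human"),
    Record("Auto-generated FAQ draft.", origin="ai"),
    Record("AI summary, checked by an editor.", origin="ai", human_reviewed=True),
]
trusted = select_trusted(records)
print(len(trusted), "of", len(records), "records kept")
```

The filter is only as good as the labels, so the provenance has to be recorded at the moment the text is created or imported.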

How can you defend yourself?

The solution is not to stop using AI, but to change the way we work with it. The key is:

  • Using quality sources: When writing prompts (instructions), refer to verifiable sources and facts.
  • Critical evaluation: Never accept AI output as definitive truth, especially for complex topics.
  • Investment in human control: Quality editorial work is more important today than ever before.

Does this mean that ChatGPT will stop working?

No, models will not cease to exist. Rather, their character is changing. Developers are working on methods to filter and validate synthetic data to prevent collapse. However, you may encounter periods when responses are less creative.

Can model collapse affect my work in Czech?

Yes, it is likely. Because there is less data in Czech, every error or piece of synthetic noise will show up in the model faster than in English. It is important to check the grammar and logic of texts generated in Czech.

How do I know if an AI response is the result of model collapse?

Typical signs are extreme repetitiveness (repeating the same phrases), excessive generality (answers that say nothing new), and a loss of ability to solve specific, less common problems.
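The first of these signs, repetitiveness, can be roughly measured. The sketch below is a simple heuristic under stated assumptions, not a detector of model collapse: it only reports what share of word n-grams in a text repeat an earlier n-gram. The function name `repeated_ngram_ratio` and both example sentences are made up for illustration.

```python
def repeated_ngram_ratio(text, n=3):
    """Share of word n-grams that repeat an earlier n-gram (0.0 means no repetition)."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text is shorter than n words
    return 1 - len(set(ngrams)) / len(ngrams)


varied = "The model answered the rare question with a specific and detailed example."
looping = ("It is important to note that it is important to note "
           "that it is important to note.")

print("varied: ", round(repeated_ngram_ratio(varied), 2))
print("looping:", round(repeated_ngram_ratio(looping), 2))
```

A high ratio only flags a text for a closer human look. Repetition can have other causes, and a low ratio says nothing about factual accuracy.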
