
Training Data Workers Use ChatGPT Instead of Doing the Work Themselves: The Threat of AI Model Collapse

Hundreds of thousands of people around the world make a living by writing texts, evaluating answers, or annotating data for training large language models like ChatGPT or Gemini. However, researchers have uncovered a disturbing trend: many of these "human evaluators" themselves use ChatGPT to complete their work faster. This creates a dangerous loop — AI is being trained on its own outputs. Scientists call this phenomenon model collapse, and warn that it could permanently damage the entire next generation of AI systems.

When "Human Data" Isn't Human

The entire modern industry around large language models rests on one key assumption: that the data used to train these models comes from real people. Whether it's evaluating the correctness of answers, writing sample texts, or annotating images — all of it is supposed to be done by humans, not machines. But the reality is different.

Researchers from École Polytechnique Fédérale de Lausanne (EPFL) conducted a study that documented this in striking detail. They hired 44 workers through Amazon Mechanical Turk, one of the world's largest platforms for outsourcing small digital tasks, and assigned them a relatively simple task: summarizing abstracts of medical research papers in approximately 100 words.

The results were shocking. Of the 46 submitted summaries, 21 showed a probability higher than 50% of being generated by ChatGPT. For 15 summaries, this probability even exceeded 98%. And 41 out of 46 submissions involved copy-pasting operations — a signal that the text was transferred from elsewhere, not written from scratch.
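To put those counts in proportion, here is a minimal sketch that converts them into shares of the 46 submissions. The counts come from the figures above; the variable and function names are made up for illustration.

```python
# Counts as reported in the text above; names are illustrative only.
TOTAL_SUMMARIES = 46

counts = {
    "ChatGPT probability above 50%": 21,
    "ChatGPT probability above 98%": 15,
    "involved copy-pasting": 41,
}


def share(count: int, total: int = TOTAL_SUMMARIES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage of all submissions falling into each category.
shares = {label: share(n) for label, n in counts.items()}

if __name__ == "__main__":
    for label, pct in shares.items():
        print(f"{label}: {pct}%")
```

That works out to roughly 45.7%, 32.6%, and 89.1% of submissions respectively.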

The Economic Logic of the "Shortcut"

To be clear: workers on Amazon Mechanical Turk or similar platforms are not fraudsters. They are people reacting entirely rationally to the economic conditions in which they work. The average payment per task on MTurk is on the order of cents. The effective hourly wage, according to various surveys, ranges between 1 and 3 dollars, well below the minimum wage in most high-income countries.

Meanwhile, ChatGPT can summarize a medical abstract in seconds. For a worker who would otherwise spend 10 minutes writing one paragraph for 50 cents, using AI tools is a purely economic decision. It's not about laziness. It's about survival in a system that pays a fraction of what the work is actually worth.
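The arithmetic behind that decision is easy to sketch. The 50 cents and 10 minutes come from the example above. The one minute assumed for an AI-assisted task is purely an illustrative guess, not a measured figure.

```python
def effective_hourly_wage(pay_usd: float, minutes_per_task: float) -> float:
    """Hourly earnings if tasks of this kind are completed back to back."""
    if minutes_per_task <= 0:
        raise ValueError("minutes_per_task must be positive")
    return pay_usd * 60 / minutes_per_task


# Figures from the example in the text: 50 cents for 10 minutes of writing.
manual = effective_hourly_wage(0.50, 10)

# ASSUMPTION: an AI-assisted task takes about one minute, copy-pasting included.
assisted = effective_hourly_wage(0.50, 1)

if __name__ == "__main__":
    print(f"Writing by hand: ${manual:.2f} per hour")
    print(f"With an AI tool: ${assisted:.2f} per hour")
```

Under that assumption, the same task pays ten times more per hour, which is why monitoring alone struggles against the incentive.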

The overall prevalence of this phenomenon is alarming. Various studies estimate that 33 to 48% of MTurk workers currently use ChatGPT or other AI tools when completing tasks. In certain task categories — especially writing and summarization — this number is likely even higher.

What is Model Collapse and Why Does It Matter

The scientific basis of the problem was described by a team of researchers in 2023 in a paper with the telling title "The Curse of Recursion: Training on Generated Data Makes Models Forget". Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, and their co-authors showed what happens when AI models are trained on the outputs of other AI models.

The result? Gradual and irreversible degradation of quality. Each generation of a model trained on synthetic data produces worse outputs than the previous one. Errors accumulate, output diversity decreases, and models lose the ability to capture rare but important patterns in the data. Mathematically, this is due to a phenomenon called data distribution drift — statistical shifts in data that amplify across generations.
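The mechanism can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the experimental setup of the paper: each "model" is just a normal distribution fitted to a handful of samples, and each new generation sees only samples drawn from the previous fit. The function name, sample size, and number of generations are arbitrary choices.

```python
import random
import statistics


def simulate_collapse(generations: int = 100,
                      sample_size: int = 10,
                      seed: int = 0) -> list[float]:
    """Return the fitted standard deviation of each generation.

    Generation 0 is fitted to "human" data (draws from a standard normal).
    Every later generation is fitted only to samples from the previous fit.
    """
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = []
    for _ in range(generations):
        # "Train" the model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
        # The next generation trains on this model's outputs only.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    for generation in (0, 25, 50, 75, 99):
        print(f"generation {generation:3d}: spread = {spreads[generation]:.6f}")
```

With small samples, the estimated spread tends to shrink from one generation to the next, so the tails of the original distribution disappear first. This is the toy counterpart of models losing rare patterns.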

The practical implications are serious:

  • Incoherent texts — model outputs become repetitive and factually inaccurate
  • Amplification of biases — existing biases are magnified in each generation
  • Loss of context — models cease to understand current events and cultural context
  • Systemic threat — if the problem spreads, the entire AI ecosystem risks degradation

Furthermore, research by the Data Integrity Consortium showed that datasets contaminated with AI content reduced model performance by up to 38% in sentiment analysis tasks.

RLHF: Where AI Replaced Human Feedback

The situation is particularly critical in the area of RLHF (Reinforcement Learning from Human Feedback). This method is at the core of why modern chatbots are so good at conversation: people evaluate AI responses, and the model learns to prefer those perceived as better.

However, if this evaluation is performed by another AI model — or if a human evaluator simply copies a ChatGPT response — the entire system collapses. The model is essentially trained on its own outputs, just with an extra step. VentureBeat pointed out that this feedback loop is one of the most serious structural problems facing the current AI industry.
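The preference-learning step can be sketched with a pairwise logistic loss, a common way to train a reward model from "A is better than B" judgments. This is an assumption about the general technique, not a description of any particular vendor's pipeline, and the function name is illustrative.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise logistic loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the answer the evaluator
    preferred above the one the evaluator rejected.
    """
    margin = reward_chosen - reward_rejected
    # Both branches compute log(1 + exp(-margin)) without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Reward model agrees with the evaluator: low loss.
    print(preference_loss(2.0, 0.0))
    # Reward model disagrees with the evaluator: high loss.
    print(preference_loss(0.0, 2.0))
```

The loss only knows which answer was labeled as preferred. If that label came from ChatGPT rather than a person, the reward model is fitted to the machine's preferences just as faithfully.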

How Platforms and Researchers Are Defending Themselves

Some platforms are reacting to the problem. The British platform Prolific, which specializes in academic research, explicitly prohibits the use of language models when completing tasks and informs workers of this. Amazon Mechanical Turk has no such rules.

EPFL researchers used a combination of two methods for detection: a classifier trained to recognize AI texts and keystroke tracking. If a worker genuinely types the text, the keystroke pattern is characteristically human — with pauses, corrections, and an irregular pace. If the content is copied, the typical pattern is absent.
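A rough sketch of how two such signals might be collected for one submission follows. Everything here is hypothetical: the `SubmissionLog` fields, the thresholds, and the way the signals are reported are illustrative assumptions, not the EPFL team's actual method.

```python
from dataclasses import dataclass


@dataclass
class SubmissionLog:
    """Hypothetical record of one submission; field names are made up."""
    final_text: str
    pasted_chars: int  # characters inserted through paste events


def looks_pasted(log: SubmissionLog, max_pasted_share: float = 0.5) -> bool:
    """True when more than max_pasted_share of the final text was pasted."""
    length = len(log.final_text)
    if length == 0:
        return False
    return log.pasted_chars / length > max_pasted_share


def review_signals(log: SubmissionLog, classifier_score: float,
                   score_threshold: float = 0.5) -> dict[str, bool]:
    """Report both signals separately; how to combine them is left open.

    classifier_score is assumed to be a probability in [0, 1] produced by
    some AI-text detector that is not implemented here.
    """
    return {
        "classifier_flag": classifier_score > score_threshold,
        "paste_flag": looks_pasted(log),
    }


if __name__ == "__main__":
    log = SubmissionLog(final_text="x" * 100, pasted_chars=95)
    print(review_signals(log, classifier_score=0.98))
```

Keeping the two flags separate is a deliberate choice: a pasted text may still be the worker's own draft, and a typed one may have been retyped from a chatbot, so neither signal is proof on its own.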

This approach suggests how the industry could defend itself: a combination of technical detection and better conditions for human workers. If wages remain at $1–3 per hour, no amount of monitoring will solve the problem, because the economic logic of using AI will always be stronger than the threat of penalties.

A Threat to the Entire Internet and Future Models

The problem extends beyond data annotation platforms themselves. By 2026, AI-generated content will constitute a significant portion of all text on the internet — from blogs and product descriptions to social media comments. Models trained on web data from 2024–2026 will thus inevitably absorb the outputs of GPT-4, Claude, Gemini, and other models.

The research group Epoch AI estimates that the supply of high-quality, human-written data will essentially be exhausted somewhere between 2026 and 2032. If the purity of training data is not ensured — and if workers on annotation platforms continue to write using AI — this crisis will occur sooner than anyone expected.

Futurism summarized it aptly: ChatGPT has already polluted the internet to such an extent that it threatens the development of future generations of AI. This is not a catastrophic vision of a distant future — it is a measurable, documented trend that is already manifesting today.

What This Means for Czechia and Europe

For Czech users and companies, this situation has a direct impact. The models you work with — be it ChatGPT, Gemini, Claude, or local alternatives — are trained on data whose purity is gradually deteriorating. This can manifest as slower improvement in answer quality, more hallucinations, or poorer performance in less frequently used languages, such as Czech.

Meanwhile, the European Union's AI Act, which entered into force in 2024, sets transparency and documentation requirements for the training data of powerful AI systems. Companies that can demonstrate the origin and purity of their data will gain a competitive advantage, not least because regulators will increasingly scrutinize this.

An alternative to cheap crowdsourced work is to invest in specialized annotation companies that offer higher wages and stricter quality control. Higher data costs can paradoxically lead to better models — and ultimately to greater trustworthiness of AI products on the market.

How can I tell if the AI model I'm using has been trained on contaminated data?

You cannot tell directly, because AI companies rarely disclose details about the composition of their training data. Indirect signs of degradation can include an increased rate of hallucinations, repetitive phrasing, or a reduced ability to distinguish finer nuances. The best protection is to use models from providers who communicate transparently about the origin of their data and have established quality control processes.

Why don't AI companies implement stricter controls for workers who annotate data?

It's primarily an economic problem. Stricter monitoring is technically possible (keystroke tracking, detection classifiers), but without a corresponding increase in compensation, there will always be an economic motivation to circumvent the rules. The real solution would be to pay annotators a fair wage — but this would significantly increase the cost of developing AI models.

Are there AI models that are resistant to model collapse?

Research from May 2026 suggests that even a small amount of authentic, human-written data can significantly slow down or stop model collapse. Models that actively filter synthetic content from training sets and combine web data with curated, verified sources are more resilient. However, no model is completely immune if data contamination reaches a systemic level.
