Critical Thinking vs. Artificial Intelligence: How Not to Fall for ChatGPT, Claude, or Gemini

June 11, 2026 Daniel Cesak

Ever since ChatGPT burst into our lives a few years ago, large language models have become an everyday helper for millions of people. Students use them to write term papers, developers debug code, marketers generate copy, and curious minds get quantum physics explained over breakfast. But there's a catch: none of these models is infallible, and the more blindly we trust them, the greater the risk we take. Even though OpenAI says its GPT-5.5 Instant model hallucinates about half as often as its predecessor, no algorithm can replace critical thinking.

What is a large language model and why it sometimes makes things up

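A large language model (LLM) is a computer system that has been trained on enormous amounts of text and communicates in natural language. It can answer questions, write texts, translate, or analyze documents. It does not look answers up in a database, though: it generates its replies piece by piece, producing a plausible continuation based on patterns learned during training. Many assistants can also pull in current web sources when needed.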

And that's where the problem starts. ChatGPT, Claude, Gemini, and Copilot are not omniscient. What they learned during training has a cutoff date: ChatGPT, by its own account, was trained on data up to August 2025, and the same goes for Anthropic's Claude. For anything that happened after that date, the model has to search the web live. And according to some experts, online search has paradoxically made answer quality worse, because the models also draw on unverified or misleading internet sources.

Hallucinations: When AI makes things up and sounds convincing

The term "hallucination" has caught on in the AI world for situations where a model generates information that sounds credible but isn't true. These aren't minor inaccuracies — models can produce non-existent studies, invent historical events, or cite laws that were never passed. And they do it with such confidence that even an experienced professional may believe their output.

OpenAI is working to tackle this problem. Its latest model, GPT-5.5 Instant, which has served as the default model for all ChatGPT users since May 2026, produced 52.5% fewer hallucinated claims than its predecessor, GPT-5.3 Instant, in internal tests. In fields like medicine, law, or finance, where factual errors can have serious consequences, this is a significant step forward. In conversations that users themselves had flagged as factually incorrect, the number of inaccuracies dropped by 37.3%. OpenAI published the data in the GPT-5.5 Instant system card.

Even these numbers, however, don't mean the end of hallucinations. The model still makes mistakes, just less often. More importantly, when it does make a mistake, it sounds just as confident as before.

Where LLMs are worth using — and where they fail

Experienced users know that large language models do best in areas built on established, stable facts. They handle history (up to August 2025), explanations of scientific concepts, template-based writing, and document analysis very well. There is even a running joke in the IT community that LLMs are great with hardware and software a couple of years old, but give nonsensical advice about the latest products.

That makes sense: if a manufacturer released an update last month, the model simply doesn't have it in its training data. It then pieces together an answer from fragments of web discussions and unverified forum posts or, in the worst case, simply makes it up.

IT expert John Agsalud from the Honolulu Star-Advertiser summed it up succinctly: "Just because a language model said it doesn't mean it's true." And he adds a crucial piece of advice: before accepting an LLM's output as fact, engage your critical thinking.

ChatGPT, Claude, Gemini, Copilot: Each has its strengths

The four main players on the market — ChatGPT by OpenAI, Claude by Anthropic, Gemini by Google, and Copilot by Microsoft — are not interchangeable. Each has different strengths:

ChatGPT is the clear leader in user numbers. Its strength lies in its versatility and the breadth of its ecosystem. It handles Czech decently, and recent versions have improved significantly in grammar and style as well.
Claude scores well in programming, logical reasoning, and working with long documents. It's gaining popularity among developers thanks to the Claude Code tool, though its operating costs are a subject of debate.
Gemini benefits from its integration with the Google ecosystem: it works seamlessly with Gmail, Drive, and Calendar. For users immersed in Google services, it's a natural choice.
Copilot excels at integration with Microsoft 365. Pulling data from Outlook, Word, or Excel and presenting it in a more digestible form is its domain. For companies built on the Microsoft ecosystem, it's the easiest path to an AI assistant.

How much it costs and what the free versions can do

All four platforms offer free versions that are more than enough for occasional use. The main limit is the number of messages per day: once you hit it, the service simply asks you to wait (or pay). Paid personal plans typically cost around $20 per month (roughly 450 CZK), while business plans cost more and offer higher limits, better models, and advanced features. OpenAI has also launched ChatGPT Go, its cheapest subscription at about 120 CZK per month, aimed at developing markets and students.

Good news for Czech users: all of the mentioned tools understand Czech and can also respond in Czech. The quality of Czech-language responses improved significantly across all models in 2026, though you may still occasionally encounter minor grammatical errors or awkward phrasing.

The EU AI Act and the right to know you're talking to a machine

The European Union is responding to the problem of trust in AI outputs with legislation. The AI Act, which entered into force in August 2024 and is taking effect in stages, establishes, among other things, a transparency obligation: people must be informed when they are communicating with artificial intelligence. For LLM providers, this means that AI-generated output must be marked as artificially generated. The goal is exactly what experts are warning about: preventing situations where people mistake AI output for verified fact.

For Czech companies and institutions that use LLMs (or are considering deploying them), this translates into a clear task: set up internal processes for verifying AI outputs, especially where those outputs affect customers, from customer support to financial advisory.

How to work with LLMs smartly: Practical tips

Here are a few rules that will save you trouble:

Verify key facts. If you get a specific number, date, or citation from a model, look it up in an independent source before using it.
Ask about static knowledge. Questions like "What is photosynthesis?" will yield much more reliable answers than "What's the latest version of iOS and what's broken in it?"
Don't rely on an LLM as your only source. Use it as a starting point for research, not as the final authority. This goes double for students and researchers.
Critical thinking is still up to you. The model doesn't distinguish between truth and fabrication — you have to do that. John Agsalud puts it bluntly: "The same applies as for the internet in general. Just because something is written doesn't mean it's true."
Leverage each model's strengths. Need to analyze company documents in Microsoft 365? Reach for Copilot. Working within the Google ecosystem? Give Gemini a chance. Programming or solving logic problems? Claude tends to be more accurate. Want an all-purpose helper for any situation? ChatGPT is still the most versatile choice.

Does ChatGPT really hallucinate that much? Isn't that an exaggeration?

Hallucinations are a real and well-documented problem. In its system card for GPT-5.5 Instant, OpenAI acknowledges that even with 52.5% fewer hallucinated claims than its predecessor, the model still makes false claims. In earlier versions, hallucination rates in demanding fields such as law or medicine reached double-digit percentages. Newer models are better, but none of them is 100% reliable.

Which AI model is best for the Czech language?

All major models (ChatGPT, Claude, Gemini, Copilot) understand Czech and can respond in Czech. In 2026, the quality of Czech responses has improved significantly across all of them. ChatGPT and Claude are comparable in Czech, while Gemini occasionally lags in natural phrasing. For everyday communication and work in Czech, any of them will do; the choice depends more on which ecosystem (Google or Microsoft) you use for the rest of your work.

Do I have to pay for a subscription, or is the free version enough?

For most regular users, the free version is more than sufficient. The main limit is the daily message count, but unless you use an LLM for several hours a day, you probably won't exhaust it. Paid plans (typically around 450 CZK per month) are worth it for professionals who work with AI intensively and need advanced features, such as access to the latest premium models, uploading large files, or longer context windows.