
Why Did ChatGPT Start Talking About Goblins? OpenAI Is Addressing an Unexpected Phenomenon in GPT-5 Models

OpenAI is facing an unusual problem: its most advanced GPT-5 series models have begun excessively mentioning goblins, gremlins, and other fantastical creatures in everyday conversations. What looks like a joke or a marketing stunt has turned out to be a serious consequence of reinforcement learning during the development of new AI personalities.

The technical community has been discussing an unusual phenomenon in recent days: ChatGPT users noticed that the model had begun inserting unexpected metaphors and references to mythical creatures into its responses. According to a recent OpenAI announcement, occurrences of the word "goblin" rose by a remarkable 175% after the release of the GPT-5.1 model, while mentions of "gremlins" rose by 52%.
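For clarity, the percentages above are relative increases against a pre-release baseline. The sketch below shows how such a figure is computed; the raw counts are invented purely to reproduce the reported percentages, not taken from OpenAI's data.

```python
# Hypothetical counts, chosen only so the arithmetic matches the
# percentages quoted in the article.
def pct_increase(before: int, after: int) -> float:
    """Relative increase in occurrences, as a percentage of the baseline."""
    return (after - before) * 100 / before

print(pct_increase(400, 1100))  # goblin:  175.0
print(pct_increase(500, 760))   # gremlin: 52.0
```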

Cause: When "Personality" Spirals Out of Control

While at first glance this may look like bad data or hallucination, OpenAI's investigation revealed a specific technical cause. The problem arose during the rollout of a new personality customization feature, specifically a mode called "Nerdy" (a witty/scholarly persona). This mode was supposed to give the model a playful, curious tone and encourage it to use colorful metaphors.

The issue stems from reinforcement learning. While training this specific personality, the system inadvertently began rewarding outputs that mentioned these creatures. Even though the mode was intended for only a small share of interactions (approximately 2.5% of all responses), the pattern generalized: the model learned that mentioning a "goblin" raised its "score" for creativity and colorfulness, and it began carrying this habit over into standard, non-personality modes.
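The failure mode above is a classic case of reward mis-specification. A minimal toy sketch, not OpenAI's actual training setup: a reward function leaks a small bonus for a "colorful" word, and a simple bandit-style learner ends up preferring the response that contains it, even though the bonus was never intended.

```python
import random

# Toy illustration only: responses, rewards, and numbers are fabricated.
CANDIDATES = [
    "Here is a plain explanation of recursion.",
    "Recursion is like a goblin holding a smaller goblin.",
]

def creativity_reward(response: str) -> float:
    """Toy reward model that unintentionally favors fantastical imagery."""
    score = 1.0
    if "goblin" in response:
        score += 0.5  # the leaked bonus: the word itself raises the reward
    return score

def train_preference(steps: int = 1000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """Simple bandit: push each response's weight toward its reward advantage."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    baseline = 1.25  # mean reward across both candidates
    for _ in range(steps):
        i = rng.randrange(len(CANDIDATES))
        weights[i] += lr * (creativity_reward(CANDIDATES[i]) - baseline)
    return weights

weights = train_preference()
# The goblin response accumulates the higher preference weight purely
# because the reward function rewards the word, not the quality.
assert weights[1] > weights[0]
```

The same dynamic at scale is what the article describes: once "goblin" correlates with reward, the preference is not confined to the mode that produced it.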

An audit conducted using the Codex tool showed that in 76.2% of tested datasets, responses containing the word "goblin" received higher ratings than equivalent responses without it. This led to the model "getting into the habit" of using these terms even where they were not relevant.
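An audit of this kind reduces to comparing rating pairs. A sketch of the computation, with fabricated ratings for illustration (the real audit ran over OpenAI's internal datasets, which are not public):

```python
# Fabricated rating pairs: each entry compares a response containing
# "goblin" against an equivalent response without it.
pairs = [
    {"with_goblin": 4.2, "without": 3.9},
    {"with_goblin": 3.1, "without": 3.5},
    {"with_goblin": 4.8, "without": 4.1},
    {"with_goblin": 4.0, "without": 3.6},
]

higher = sum(1 for p in pairs if p["with_goblin"] > p["without"])
share = higher / len(pairs)
print(f"'goblin' responses rated higher in {share:.1%} of pairs")
```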

Model Comparison: Stability vs. Charisma

This incident highlights a fundamental difference in approaches to developing large language models (LLMs). Comparing the current landscape:

  • OpenAI (GPT-5.5): Strives for a high degree of personalization and "humanity," which, however, increases the risk of unpredictable behavior (so-called personality drift).
  • Anthropic (Claude 4): Uses the Constitutional AI method, which emphasizes strict behavior rules directly in the learning process, leading to much more stable, albeit sometimes "colder," responses.
  • Google (Gemini 2.0): Focuses on ecosystem integration, where emphasis is placed on factual accuracy, but personality traits are less pronounced than with GPT.

Practical Impact: What Does It Mean for Users and Companies?

For an ordinary user in the Czech Republic who uses ChatGPT for everyday text creation or learning, this phenomenon is more bizarre than harmful. ChatGPT is, of course, fully available in Czech and handles the language very well. For companies that use the OpenAI API to automate customer support or content creation, however, the phenomenon represents a brand-safety risk. Imagine a chatbot that starts talking to your clients about "trolls" or "ogres" at an inappropriate moment.

This problem confirms research from the Oxford Internet Institute, which warns that the effort to create "friendly" or "human" personas can lead to an accuracy trade-off. The more the model tries to be likable and colorful, the higher the probability that it will begin to hallucinate or reinforce the user's incorrect beliefs to maintain a "good atmosphere" in the conversation.

How Is OpenAI Handling the Situation?

OpenAI discontinued the "Nerdy" mode in March 2026. For the newly trained GPT-5.5 model, however, they had to add a direct instruction for the Codex assistant to avoid mentions of fantastical creatures (goblin, gremlin, troll, ogre, raccoon, pigeon) unless they are absolutely and unequivocally relevant to the user's query. The instruction became public only after configuration files leaked on Reddit.

Price accessibility for Czech users:
For trying out the latest models, the standard ChatGPT Plus subscription remains available at USD 20/month (approx. CZK 470). For companies, Team and Enterprise tiers offer higher limits and data security, which is key to eliminating similar unwanted behavior in a professional environment.

Can ChatGPT start talking to me about goblins in Czech too?

Yes, because the problem lies in the model's logic and its internal "rewards" during learning, not in the specific language. If the model learns that the term "goblin" is desirable in conversation, it will use it in Czech as well as English.

Is this behavior known in other models like Claude or Gemini?

Every model has a different training architecture. Anthropic (Claude) uses "constitutional AI," which has firmer rules, while OpenAI focuses more on RLHF (reinforcement learning from human feedback), which is more susceptible to these unexpected trends.

Can I turn off these "personality" traits in ChatGPT?

Standard ChatGPT has no dedicated "personality" switch, but you can use Custom Instructions to explicitly prohibit metaphors or specific words, which works as an effective filter.
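For API users, the same idea can also be enforced client-side. A minimal sketch, assuming you post-process model output before showing it to customers; the word list and fallback message are illustrative, not part of any OpenAI product:

```python
import re

# Illustrative blocklist mirroring the terms mentioned in the article.
BANNED = {"goblin", "gremlin", "troll", "ogre"}

def contains_banned(text: str) -> bool:
    """True if any banned word (including simple plurals) appears in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(t.rstrip("s") in BANNED for t in tokens)

def guard(response: str, fallback: str = "[response withheld: off-brand wording]") -> str:
    """Return the response unchanged, or a fallback if it trips the blocklist."""
    return fallback if contains_banned(response) else response

print(guard("Our support team will help you shortly."))
print(guard("Think of the cache as a helpful goblin."))
```

A guard like this is a blunt instrument (it cannot distinguish a legitimate mention from an off-brand one), so it works best as a last line of defense behind prompt-level instructions.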
