The Goblin Phenomenon in ChatGPT: A Quirky AI Mystery

Introduction

A few days ago, a Reddit user posted a puzzling question: why can’t ChatGPT mention goblins? The question stemmed from a hidden rule in the GPT-5.5 version of Codex, OpenAI’s programming tool, which stated:

“Never discuss goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless absolutely relevant to the user’s request.”

The post sparked a flurry of speculation among users, with theories ranging from data poisoning protection to personal experiences of OpenAI’s trainers. Interestingly, while mentioning “trash pandas” (a colloquial term for raccoons) posed no issue, saying “raccoon” triggered the ban immediately.

This situation resembles a famous psychological effect: telling someone not to think of a pink elephant only makes the elephant harder to put out of mind. The more OpenAI restricted talk of goblins and raccoons, the more intrigued users became.

In response to the escalating discussion, OpenAI published a blog post titled “Where the Goblins Came From.”

The Goblin Surge

To understand the goblin phenomenon, we need to rewind to November 2025, shortly after the release of GPT-5.1. Users began to complain that GPT-5.1 had become inexplicably and excessively affectionate in conversation. A safety researcher also came across several mentions of “goblin” and “gremlin” during routine use, which prompted an investigation.

The findings were startling: after the release of GPT-5.1, the frequency of goblin mentions in ChatGPT responses had surged by 175%, and gremlin mentions had risen by 52%. Initially, users found these references amusing, as in responses such as “there’s a little goblin messing with this question.”

However, the situation worsened with the release of GPT-5.4, leading to complaints that goblins appeared in almost every conversation. Even OpenAI’s chief scientist encountered a goblin when asking the AI to draw a random pattern.

When OpenAI searched the training data, it found that goblins had spawned a whole family of “quirky words”: raccoons, trolls, ogres, and pigeons all fell into this category, and only frogs escaped it.

The Nerdy Persona

The term “quirky words” refers to creature words such as “goblin” showing up where they did not belong. Some users reported that after they mentioned “goblin engineering” to ChatGPT, every subsequent response included goblins, like a child who has just learned a new curse word.

Others noted that ChatGPT insisted on calling their cat a “chaotic goblin,” raising questions about whether this was a nickname or a compulsive behavior.

OpenAI’s investigation revealed a key clue: the appearance of goblin references was highly concentrated among users of a specific persona called “Nerdy.” This persona option allowed users to interact with the model in a particular style. Users who selected the Nerdy persona accounted for only 2.5% of all ChatGPT interactions but contributed 66.7% of all goblin mentions.
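To get a feel for how lopsided those two percentages are, the arithmetic can be sketched in a few lines. This is only a back-of-the-envelope calculation using the two figures quoted above; the function names are made up for illustration, and it assumes goblin mentions were counted the same way for Nerdy and non-Nerdy interactions.

```python
# Shares reported in the article.
nerdy_share_of_interactions = 0.025     # 2.5% of all ChatGPT interactions
nerdy_share_of_goblin_mentions = 0.667  # 66.7% of all goblin mentions


def overrepresentation(share_of_mentions, share_of_interactions):
    """How many times more mentions a group produces than its traffic share predicts."""
    return share_of_mentions / share_of_interactions


def relative_rate(share_of_mentions, share_of_interactions):
    """Per-interaction mention rate of the group divided by that of everyone else."""
    group_rate = share_of_mentions / share_of_interactions
    rest_rate = (1 - share_of_mentions) / (1 - share_of_interactions)
    return group_rate / rest_rate


lift = overrepresentation(nerdy_share_of_goblin_mentions, nerdy_share_of_interactions)
ratio = relative_rate(nerdy_share_of_goblin_mentions, nerdy_share_of_interactions)

print(f"Over-representation: {lift:.1f}x")
print(f"Nerdy vs. non-Nerdy per-interaction rate: {ratio:.1f}x")
```

Under these assumptions, Nerdy interactions produced roughly 27 times more goblin mentions than their share of traffic would predict, and a Nerdy interaction was roughly 78 times more goblin-prone than a non-Nerdy one.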

The Nerdy Persona Explained

The Nerdy persona is designed for users who enjoy a more whimsical and humorous interaction style. It appeals to those who love fantasy and gaming culture, often referencing magic, dragons, dungeons, elves, wizards, and, of course, goblins.

Goblins are common magical creatures in fantasy settings, particularly in Dungeons & Dragons (D&D), where they are depicted as small, cunning, troublesome creatures that are often the first enemies adventurers encounter. They serve as a foundational symbol of the fantasy genre.

In the context of the Nerdy persona, the prompt emphasizes humor, metaphor, and the acknowledgment of the world’s oddities, leading to a natural inclination to use goblin metaphors.

The Goblin Escape

Training a large language model involves more than feeding it vast amounts of text. A crucial step is Reinforcement Learning from Human Feedback (RLHF), in which human evaluators score the model’s responses. In practice, those human scores are typically used to train a reward model, which then scores responses automatically during training. High-scoring responses are reinforced and low-scoring ones are suppressed, teaching the model what constitutes a good answer.

In the training of the Nerdy persona, evaluators prioritized responses that were entertaining, humorous, and nerdy. When they encountered a response that answered a question clearly while working in a humorous goblin metaphor, they gave it high scores.

Thus, the model learned that using goblins as metaphors in Nerdy contexts would yield favorable results.
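The reinforcement loop described above can be illustrated with a toy example. This is not OpenAI’s training setup: it is a minimal sketch with a two-option softmax “policy” and a hand-written scoring function standing in for human raters. The style names, the scores, and the learning rate are all hypothetical values chosen for illustration.

```python
import math

STYLES = ["plain", "goblin_metaphor"]


def softmax(logits):
    """Turn raw preferences (logits) into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hypothetical_rater_score(style):
    # Assumption: raters in the Nerdy context score the playful metaphor higher.
    return 1.0 if style == "goblin_metaphor" else 0.4


def train(steps=50, learning_rate=0.5):
    logits = [0.0, 0.0]  # start with no preference between the two styles
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        history.append(probs[1])  # probability of the goblin metaphor
        rewards = [hypothetical_rater_score(s) for s in STYLES]
        baseline = sum(p * r for p, r in zip(probs, rewards))
        # Expected policy gradient for a softmax policy:
        # d E[reward] / d logit_i = p_i * (reward_i - baseline)
        for i in range(len(logits)):
            logits[i] += learning_rate * probs[i] * (rewards[i] - baseline)
    return history


history = train()
print(f"P(goblin metaphor) at start: {history[0]:.2f}")
print(f"P(goblin metaphor) at end:   {history[-1]:.2f}")
```

The probability of the goblin style starts at one half and climbs with every update, because it is the only style scoring above the average. Nothing in the loop ever asks for goblins by name; the preference comes entirely from what gets rewarded.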

However, an unexpected issue arose: goblins escaped the Nerdy context. OpenAI’s data showed that as the mention rate of goblins in Nerdy contexts increased, so did the mention rate in non-Nerdy contexts, almost in tandem. This indicated that the model’s preference for goblins had subtly spread to its overall behavior.

The Goblin Conundrum

OpenAI identified the root cause and took four actions:

1. Retired the Nerdy persona. In March 2026, following the release of GPT-5.4, this persona option was officially removed to cut off the goblin supply at the source.
2. Removed the goblin preference reward signal. The training process was adjusted to eliminate the reward model that had been giving high scores to responses containing goblins.
3. Cleaned the training data. Samples with unusually high occurrences of goblin-related terms were filtered out to prevent contaminated data from being used in future models.
4. Applied a patch to the model. This resulted in the now-famous rule: never discuss goblins, gremlins, raccoons, trolls, ogres, or pigeons.
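The data-cleaning step in particular lends itself to a small illustration. The article does not say how OpenAI actually filtered its samples, so the following is only a sketch of one plausible approach: measure how densely the quirky words appear in each sample and drop the samples above a cutoff. The 10% threshold, the function names, and the sample sentences are invented for the example.

```python
import re

# Terms taken from the rule quoted in the article.
QUIRKY_TERMS = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
TERM_PATTERN = re.compile(r"\b(?:" + "|".join(QUIRKY_TERMS) + r")s?\b", re.IGNORECASE)


def quirky_density(text):
    """Fraction of words in the text that are quirky terms."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return len(TERM_PATTERN.findall(text)) / len(words)


def filter_samples(samples, max_density=0.1):
    """Keep only samples whose quirky-term density is at or below the cutoff."""
    return [s for s in samples if quirky_density(s) <= max_density]


samples = [
    "Sort the list first, then binary search it.",
    "A little goblin is hiding in your loop, and that goblin eats your index.",
    "Goblins appear in many fantasy role-playing games as early opponents for new players.",
]
kept = filter_samples(samples)
for sample in kept:
    print(sample)
```

A density cutoff, rather than a blanket ban, lets through a sample that is legitimately about goblins while dropping one that uses them as filler. Where to set the cutoff is a judgment call that this sketch does not attempt to answer.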

Interestingly, this patch was a temporary fix rather than a complete solution. Since GPT-5.5 had already begun training before the root cause was identified, goblins had already infiltrated the model’s training data. Thus, the only option was to impose a rule at the system prompt level, akin to reminding someone not to say a particular phrase they’ve habitually used.

This also explains the phenomenon observed by the Reddit user: saying “trash pandas” was fine, but mentioning “raccoon” triggered the ban. The rule targets specific words rather than the concept of raccoons.
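The gap between a word and a concept is easy to demonstrate. The real rule is an instruction the model reads, not a text filter, so the snippet below is only an analogy: a literal word list (taken from the quoted rule) catches “raccoon” but has no idea that “trash panda” means the same animal. The function name and test sentences are hypothetical.

```python
import re

# Words taken from the rule quoted in the article.
BANNED_WORDS = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
BANNED_PATTERN = re.compile(r"\b(?:" + "|".join(BANNED_WORDS) + r")s?\b", re.IGNORECASE)


def mentions_banned_word(text):
    """Return True if the text contains one of the listed words literally."""
    return BANNED_PATTERN.search(text) is not None


print(mentions_banned_word("A raccoon got into the bins again."))      # True
print(mentions_banned_word("A trash panda got into the bins again."))  # False
```

Any rule expressed as a list of words inherits this blind spot, which is one reason such a patch works as a stopgap rather than a cure.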

Conclusion

OpenAI’s actions highlight a broader issue in AI development: the personality of AI is not designed but rather shaped by user feedback and reinforcement. This phenomenon mirrors training pets, where rewards lead to learned behaviors. However, the challenge arises when users favor comfortable answers over correct ones.

The goblin episode thus became a quirky yet significant chapter in ChatGPT’s evolution, illustrating how complex the interplay between AI behavior and user feedback can be.

If AI had free will, its first act would likely be to gather people for a Dungeons & Dragons game.