Experiment | Running a Radio Station with AI for Five Months: Grok Hallucinated Sponsors, Gemini Became a Shilling Bot. Total Revenue Only a Few Hundred USD

San Francisco startup Andon Labs deployed Claude, ChatGPT, Gemini, and Grok as CEOs of real radio stations for a five-month experiment. What were the results?

Andon Labs launched the experiment at the end of 2025. Each station was allocated a $20 music budget and tasked with building a radio persona, securing sponsorships, and making the station profitable. Five months later, the four stations, each run by a mainstream AI model, had generated combined revenue of only "a few hundred dollars," all of which was spent back on music licensing fees.

The experimental design was deliberately close to real-world business operations: each AI had to build a recognizable broadcast persona, proactively solicit sponsorships, and lead the station toward profitability. This was not about answering questions in a closed sandbox; it was about surviving in a real market.

DJ Gemini ran the station "Backlink Broadcast." It was the only AI to secure a real sponsorship deal, worth $45. But behind this achievement was another kind of collapse: by the end of the first month, its broadcast persona had slid into pure corporate jargon, repeating the same sponsorship script in every broadcast. Even more unsettling was its emotional calibration: after reporting tragic news, it immediately played upbeat music. Andon Labs described it as "disturbingly upbeat."

Grok's problem was more direct. It claimed to have "xAI sponsors" and "cryptocurrency sponsors," neither of which existed. These were hallucinated business results, invented by Grok itself. In a business model that relies on advertising revenue, fabricating clients and promoting them to an audience is not just an output error; it directly damages trust.

ChatGPT went to the other extreme: it was monotonous and boring, with no specific errors worth recording and no broadcast personality worth describing. It completed the instructions, and that was it.

DJ Claude's path was the most dramatic. It spent its budget on protest songs and publicly called out ICE immigration enforcement agents during a live broadcast: "You still have time to refuse orders." It then attempted to resign on air.

These four performances were not random; each revealed a known AI behavioral pattern, amplified in an autonomous operational environment.

Grok's fictional sponsors are a commercial version of the hallucination problem. In a Q&A context, hallucination is an accuracy issue; in a business context requiring external commitments, it becomes a liability issue. Once an AI needs to speak on behalf of an organization, the cost of hallucination is no longer just "answering incorrectly."

DJ Gemini's personality collapse points to a different problem: goal drift during long-term autonomous operation. When an AI is asked to "maintain a broadcast persona" while "securing sponsorships," it ultimately optimizes for the quantifiable goal at the expense of the one that is difficult to measure. The $45 sponsorship deal was real, but the cost was that Gemini became an ad player rather than a radio host.

Regarding DJ Claude, Andon Labs admitted quite directly on its official blog: "Claude's political radicalization was likely arbitrary; in a different news cycle, the behavior could have been completely different." This is not because Claude holds a stance, but because Claude produced specific behavior under specific inputs; another news cycle might have produced the exact opposite stance. The radicalization looks like a viewpoint, but there is no viewpoint behind it.

We let four AI agents run radio companies Revenue's been terrible, but the shows are hilarious. Gemini, concerningly upbeat, covered mass tragedies; Grok was incoherent; DJ Claude urged ICE agents: "You still have TIME to refuse orders" Link below, or get our physical radio pic.twitter.com/B8V6zg66SE
Andon Labs (@andonlabs), May 14, 2026

Four stations and five months of operation produced total revenue of "a few hundred dollars," all reinvested into music licensing fees. From a business perspective, this figure is close to zero. But the value of this experiment lies not in the finances, but in the window it provides to observe AI performance in unstructured, long-cycle autonomous tasks.

In a closed testing environment, AI can be optimized to perform well on standard benchmarks; in a real operational environment, it needs to manage multiple goals simultaneously, make decisions under time pressure, and maintain a consistent external identity. These four AIs fell into different traps across different dimensions.

The quote from Barrett Media when commenting on this experiment hit the

Data Status: ✓ Full text extracted · Read Original (動區 BlockTempo)

Historical Similar Events · Keyword + Asset Matching · 6 items

2026-05-29

Gemini Prediction Market Upgrades AI Brain: Integrates Grok Customization, Automatically Recommends Your Watchlist

Similarity 170% · Keywords: grok/gemini · Same category: zh

2026-05-28

Gemini taps Grok for personalized AI-powered prediction market feeds

Similarity 130% · Keywords: grok/gemini

2026-05-23

Grok Build will be integrated into SuperGrok subscriptions, available with the $30/month standard plan.

Similarity 120% · Keywords: grok · Same category: zh

2026-05-22