Anthropic admits Claude has "gotten dumber": three engineering configuration errors, all subscription quotas reset as compensation
動區 BlockTempo · 2026-04-24 01:41:06


Original title: Anthropic 承認 Claude「真的變笨」:三個工程配置失誤,已重置所有訂閱額度當補償
Recently, the community has repeatedly reported that Claude's performance has declined. On April 23, Anthropic released an incident report stating that the root cause was not the model itself but engineering errors at three product layers, which together caused users worldwide to perceive a significant drop in quality. (Previous coverage: Anthropic's latest valuation hits "$800 billion," doubling in two months; IPO expected as early as October) (Background: The open-source project badclaude, which speeds up Claude Code, received a copyright infringement notice from Anthropic)

Have you also felt that Claude has "become dumber" lately? Some say its reasoning has become shallower, some say it has started to hallucinate, and others say it burns through tokens faster while quality drops. A new term has even emerged: "AI shrinkflation." Borrowed from consumer goods, where the portion shrinks while the price stays the same, it describes a model whose output quality quietly degrades at the same cost.

BridgeMind's test figures are blunter: Claude Opus 4.6's accuracy dropped from 83.3% to 68.3%, and its ranking fell from 2nd to 10th. Stella Laurenzo, Senior Director of the AI division at AMD, analyzed 6,852 Claude Code session logs and over 230,000 tool calls on GitHub and likewise found that the model's reasoning depth had declined markedly, with a tendency to choose the "simplest fix" rather than the "correct solution."

In its April 23 incident report, Anthropic acknowledged that the problem was real but attributed it not to model training, rather to engineering configurations at three product layers. Anthropic confirmed three independent product-layer changes that together caused the quality decline.

First, a downgrade in reasoning effort (March 4). Anthropic lowered Claude Code's default reasoning effort from "high" to "medium." Reasoning effort controls how deeply the model thinks before each answer: the higher the setting, the longer the model spends on internal derivation, but the more the interface appears to be "stuck." To address the perception of UI latency, Anthropic lowered the default without fully evaluating the impact on complex tasks.

Second, a caching bug (March 26). Engineers added an optimization that cleared old thought processes after the model had been idle for more than an hour, to save cache space. The implementation contained a critical error: instead of running once after the idle period, the clearing was triggered again on every subsequent conversation turn. The model therefore kept losing its "short-term memory," repeatedly forgetting and repeating itself in long conversations (a minimal sketch of this failure mode follows below).

Third, redundancy constraints in the system prompt (March 16). Anthropic added instructions to the backend system prompt requiring the model to compress text between tool calls to under 25 words and final replies to under 100 words. The measure, originally intended to reduce verbosity in Opus 4.7, accidentally affected Opus 4.6 as well, causing a 3% drop on a code-quality evaluation.

What the three changes have in common is that they all occurred at the harness layer (the model's execution environment: the engineering shell around the model that determines system prompts, caching logic, and so on) rather than in model training itself, yet together they were enough for users worldwide to perceive a clear gap.
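To make the second failure mode concrete, here is a minimal sketch of how a "clear the cache after an hour of idleness" rule can degenerate into "clear the cache on every turn." Everything here (ThoughtCache, IDLE_TTL, the method names) is a hypothetical illustration built from the report's description of the behavior, not Anthropic's actual code; one plausible shape of the bug is simply never refreshing the idle timestamp.

```python
import time

IDLE_TTL = 3600  # hypothetical: drop cached thinking after >1 hour of inactivity


class ThoughtCache:
    """Toy stand-in for a harness-level cache of the model's prior reasoning."""

    def __init__(self) -> None:
        self.entries: list[str] = []   # reasoning carried over from earlier turns
        self.last_active = time.time()

    def on_turn_buggy(self, now: float) -> None:
        # Shape of the reported bug: the timestamp is never refreshed, so once
        # the session has been idle past the TTL, *every* later turn sees
        # idle > IDLE_TTL and wipes the cache again. The model keeps losing
        # its short-term memory for the rest of the conversation.
        if now - self.last_active > IDLE_TTL:
            self.entries.clear()

    def on_turn_fixed(self, now: float) -> None:
        # Intended behavior: refresh the timestamp on every turn, so the cache
        # is cleared at most once per genuine idle period.
        if now - self.last_active > IDLE_TTL:
            self.entries.clear()
        self.last_active = now


# Demo: after one long idle gap, the buggy version clears on every later turn.
cache = ThoughtCache()
t0 = cache.last_active
cache.entries = ["derivation from turn 1"]
cache.on_turn_buggy(t0 + 2 * IDLE_TTL)       # clears once -- expected
cache.entries = ["derivation from turn 2"]
cache.on_turn_buggy(t0 + 2 * IDLE_TTL + 5)   # clears again 5 seconds later -- the bug
assert cache.entries == []
```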
Anthropic has fixed the caching bug in version v2.1.116 and restored the reasoning-effort and redundancy-constraint settings. To prevent similar incidents from recurring, Anthropic announced four measures:

1. More internal employees will use exactly the same Claude Code build as the public release.
2. Every system prompt change will undergo ablation testing, reverting one variable at a time to measure its independent impact on results (a minimal sketch follows at the end of this article).
3. New audit tools will be added to make prompt changes easier to track.
4. All subscription users' usage limits will be reset as compensation.

The reason users named this decline "AI shrinkflation" points to a structural dilemma: the model is a black box. Ordinary users, and even professional developers, cannot distinguish "model degradation" from "engineering configuration errors." The two feel identical in use, yet they have completely different causes and repair paths.

Anthropic initially denied claims that it had "deliberately weakened" the model, stating that neither the API nor the inference layer was affected. But user dissatisfaction kept accumulating, and public audit data from high-profile users made the controversy increasingly difficult to avoid. This gap, between official claims that there is no problem and data showing that there is, is where the incident truly damaged trust, more than the model's performance itself. The chasm between fact and perception will likely take far longer to close than the configuration errors took to fix.
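The ablation testing mentioned in measure 2 is worth a concrete illustration: toggle one change at a time against a fixed benchmark and attribute any score movement to that change alone. The sketch below is hypothetical; HarnessConfig, its fields, and the run_eval callback are names invented here to mirror the three changes in the report, not Anthropic's tooling.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class HarnessConfig:
    # Hypothetical knobs mirroring the three changes in the report.
    reasoning_effort: str = "high"        # lowered to "medium" on March 4
    clear_cache_every_turn: bool = False  # the March 26 bug made this True
    brevity_constraints: bool = False     # the March 16 system prompt rules


def ablate(
    current: HarnessConfig,
    baseline: HarnessConfig,
    run_eval: Callable[[HarnessConfig], float],
) -> dict[str, float]:
    """Revert one field at a time ("closing one variable") and report how much
    each change alone moves the benchmark score relative to `current`."""
    current_score = run_eval(current)
    deltas = {}
    for field in current.__dataclass_fields__:
        reverted = replace(current, **{field: getattr(baseline, field)})
        deltas[field] = run_eval(reverted) - current_score
    return deltas
```

Run against a real scorer, a harness like this would have attributed the quality drop to individual changes before release; for instance, it could have surfaced that the brevity constraints alone accounted for the 3% code-quality decline.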