Claude Code Cost-Saving Hack: Engineer Saves 300 Million Tokens in a Week with Caching; the Key Is Not to Break the Cache
動區 BlockTempo · 2026-05-24 04:56:39

Do long Claude Code conversations eat through your quota? Engineer Nate Herk reports that prompt caching saved him 300 million tokens in a week, with a single-day peak of 91 million. The key is not how much code you write but how you avoid "breaking" the cache, so that repeated context is not billed at full price again.

Many developers find that the biggest headache when coding with Claude Code is how quickly the token quota drains, which makes long conversations feel like a luxury. Nate Herk, who frequently shares AI usage tips with the community, wrote in an X post that the real cost driver is not the amount of code but whether the session makes good use of prompt caching. Caching covered more than 300 million of his tokens in a week and as many as 91 million in a single day. Because a cached token costs only 10% of an ordinary input token, those 91 million tokens were billed as roughly 9 million, which extends a coding session almost "for free".

In Herk's words:

I saved more than 300 million tokens this week, including 91 million in a single day. I didn't change any settings. This is just prompt caching working normally in the background. But once I truly understood what caching is and how to avoid "breaking" the cache, my sessions lasted longer on the same usage quota. So here is an 80/20 introductory guide to Claude Code prompt caching that stays away from API-level details.

A cached token costs only 10% of an ordinary input token, so 91 million cached tokens are billed as approximately 9 million tokens.
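The arithmetic behind that figure can be written out as a minimal sketch. The only input taken from the article is the 10% ratio; the function and constant names are hypothetical and exist purely for illustration, and real billing may differ by model and plan.

```python
# Assumption (from the article): a cached token is billed at 10% of an
# ordinary input token. Names below are hypothetical, for illustration only.
CACHE_READ_COST_RATIO = 0.10


def billed_equivalent(cached_tokens: int, ratio: float = CACHE_READ_COST_RATIO) -> int:
    """Return the number of ordinary input tokens that cost the same
    as `cached_tokens` tokens read from the cache."""
    return round(cached_tokens * ratio)


daily_peak = billed_equivalent(91_000_000)   # the single-day peak from the article
weekly = billed_equivalent(300_000_000)      # the weekly total from the article

print(f"Daily peak billed as about {daily_peak:,} input tokens")
print(f"Weekly total billed as about {weekly:,} input tokens")
```

Under this assumption the 91 million cached tokens come out at 9.1 million input-token equivalents, which matches the "approximately 9 million" quoted in the article.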
The cache lifetime (TTL) depends on how you use Claude: 1 hour on a Claude Code subscription, 5 minutes by default through the API, and always 5 minutes for sub-agents. The cache is divided into three layers: the system layer, the project layer, and the conversation layer. Switching models in the middle of a session destroys the cache, and that includes turning on "opus plan" mode.

coding agents need glass boxes now
jianshuo/ccglass
> 111 stars on github
> created yesterday
> mit + javascript
> local proxy + web dashboard for claude code, codex, deepseek-tui, and kimi
> shows the full system prompt, tool schemas, message history, token/cache/cost, and… pic.twitter.com/Wot5SFV16N
Beau Johnson (@BeauJohnson89), May 24, 2026

Every cached token costs 10% of an ordinary input token. So when my dashboard shows that 91 million tokens hit the cache on a given day, the actual bill is roughly equivalent to processing 9 million tokens. This is why, compared with no caching at all, a long Claude Code session feels as if it has been extended almost "for free".

Two numbers in the dashboard deserve special attention:

Cache create: the one-time cost of writing content to the cache. The benefit starts with the next turn of the conversation.

Cache read: tokens that Claude reuses from the cache, such as your CLAUDE.md, tool definitions, and previous messages. Reading them from the cache costs one tenth of re-processing them as input.

If your Cache read number is high, you are using the cache effectively. If it is very low, you are paying repeatedly for the same context.

Anthropic's Thariq said something that made a deep impression on me: "We actually monitor the hit rate of prompt cache. Once the hit rate is too low, it will trigger an alert, or even declare a SEV-level incident." He also wrote a great X article on the topic.
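The two dashboard numbers above can be combined into a rough hit rate. The sketch below is only an illustration: the `TurnUsage` type, its field names, and the definition of hit rate as cache reads divided by all input tokens are assumptions, not the formula used by the dashboard or by Anthropic's monitoring.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TurnUsage:
    """Per-turn token counts (hypothetical field names)."""
    cache_read: int      # tokens reused from the cache
    cache_create: int    # tokens written to the cache this turn
    uncached_input: int  # input tokens processed at full price


def cache_hit_rate(turns: Iterable[TurnUsage]) -> float:
    """Share of all input tokens that were served from the cache.

    Assumed definition: cache_read / (cache_read + cache_create + uncached_input).
    """
    turns = list(turns)
    read = sum(t.cache_read for t in turns)
    total = read + sum(t.cache_create + t.uncached_input for t in turns)
    return read / total if total else 0.0


# Made-up session: the first turn writes the cache, later turns reuse it.
session = [
    TurnUsage(cache_read=0, cache_create=20_000, uncached_input=500),
    TurnUsage(cache_read=20_000, cache_create=600, uncached_input=100),
    TurnUsage(cache_read=20_600, cache_create=400, uncached_input=100),
]
print(f"hit rate: {cache_hit_rate(session):.1%}")
```

In a session like this the rate climbs with every turn that reuses the prefix, which is why a low Cache read number points to a cache that keeps being rebuilt.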
When the cache hit rate is high, four things happen at once: Claude Code feels faster, Anthropic's serving costs drop, your subscription quota lasts longer, and long coding sessions become practical. When the hit rate is very low, everyone loses. So the incentives on both sides are aligned: Anthropic wants your cache hit rate to be high, and so do you. What really holds you back are a few inconspicuous habits that quietly force the cache to be rebuilt.

Caching relies on prefix matching. You don't need to get into deep technical details
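Prefix matching can be pictured with a toy model. The sketch below is a simplification under stated assumptions: real caching works on tokens rather than on named segments, and the segment names and their order are hypothetical, loosely following the system, project, and conversation layers mentioned earlier.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count how many leading segments of `current` match `previous`.

    In this toy model only the matching prefix can be read from the cache;
    everything from the first difference onward must be processed again.
    """
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


# Hypothetical request layout: system layer, project layer, conversation layer.
last_request = ["system prompt", "tool definitions", "CLAUDE.md",
                "user: turn 1", "assistant: turn 1"]

# Appending a new message keeps the whole earlier prefix reusable.
appended = last_request + ["user: turn 2"]

# Changing something at the very front invalidates everything after it.
changed_front = ["system prompt (changed)"] + last_request[1:] + ["user: turn 2"]

print(cached_prefix_len(last_request, appended))       # 5
print(cached_prefix_len(last_request, changed_front))  # 0
```

The toy model shows why appending to a conversation is cheap while a change near the start of the context forces the whole cache to be rebuilt.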