Claude Code Cost-Saving Tip: Engineer Saves 300 Million Tokens a Week with Caching; the Key Is Not Breaking the Cache


Do long Claude Code conversations eat through your quota? Engineer Nate Herk says he saved more than 300 million tokens in a week through the caching mechanism, with a daily peak of 91 million. The key is not how much code you write but how you avoid "breaking" the cache, so that repeated context stops costing you.

Many developers find that the biggest headache when coding with Claude Code is how fast the token quota drains, which makes long conversations almost a luxury. Nate Herk, an influencer who often shares AI usage tips with the community, said in an X post that the real cost driver is not the amount of code but whether the session makes good use of prompt caching. He saved more than 300 million tokens in a week through caching, with as many as 91 million tokens served from the cache in a single day. Because a cached token costs only 10% of an ordinary input token, those 91 million tokens are billed like roughly 9 million, which extends a coding session almost "for free."

Herk writes:

I saved more than 300 million tokens this week, 91 million of them in a single day. I didn't change any settings. This is just prompt caching working normally in the background. But once I truly understood what caching is and how to avoid "breaking" the cache, my sessions lasted longer on the same usage quota. So here is an 80/20 introductory guide to Claude Code prompt caching that stays out of API-level details.

A cached token costs only 10% of an ordinary input token, so 91 million cached tokens are billed at roughly 9 million tokens.
The cache TTL for the Claude Code subscription version is 1 hour; the API default is 5 minutes; sub-agents always use 5 minutes. The cache is divided into three layers: the system layer, the project layer, and the conversation layer. Switching models in the middle of a session destroys the cache, and that includes turning on "opus plan" mode.

Embedded post from Beau Johnson (@BeauJohnson89), May 24, 2026: "coding agents need glass boxes now. jianshuo/ccglass: 111 stars on GitHub, created yesterday, MIT + JavaScript, local proxy + web dashboard for Claude Code, Codex, deepseek-tui, and Kimi. Shows the full system prompt, tool schemas, message history, token/cache/cost, and…" pic.twitter.com/Wot5SFV16N

Every cached token costs 10% of an ordinary input token. So when my dashboard shows that 91 million tokens hit the cache on a given day, the actual bill is roughly what processing 9 million tokens would cost. This is why, compared with no caching, long Claude Code sessions feel as if they are being extended almost "for free."

Two numbers in the dashboard deserve special attention:

Cache create: the one-time cost of writing content to the cache. It starts paying off in the next turn of the conversation.

Cache read: tokens Claude reuses from the cache, such as your CLAUDE.md, tool definitions, and previous messages. These cost one tenth of what they would if re-processed as ordinary input.

If your Cache read number is high, you are using the cache effectively. If it is very low, you are paying again and again for the same context.

Anthropic's Thariq said something that left a deep impression on me: "We actually monitor the hit rate of prompt cache. Once the hit rate is too low, it will trigger an alert, or even declare a SEV-level incident." He also wrote a great X article on the subject.
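The billing arithmetic above can be sketched in a few lines of Python. This is only an illustration: it uses the article's 10% figure for cache reads, while the function names and the definition of hit rate (cache-read tokens as a share of all input tokens) are assumptions made for the example, not something the dashboard or Anthropic documents here.

```python
# Assumption taken from the article: a cache-read token is billed at 10%
# of an ordinary input token.
CACHE_READ_COST_PERCENT = 10


def billed_equivalent(cache_read_tokens: int) -> float:
    """Ordinary-input-token equivalent of a number of cache-read tokens."""
    return cache_read_tokens * CACHE_READ_COST_PERCENT / 100


def cache_hit_rate(cache_read: int, cache_create: int, uncached_input: int) -> float:
    """Hypothetical hit-rate definition: cache reads / all input tokens."""
    total = cache_read + cache_create + uncached_input
    return cache_read / total if total else 0.0


# The article's daily peak: 91 million tokens read from the cache.
print(billed_equivalent(91_000_000))  # 9100000.0, i.e. roughly 9 million
```

A high value from `cache_hit_rate` corresponds to the "high Cache read" case described above; a low value means the same context is being paid for repeatedly.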
When the cache hit rate is high, four things happen at once: Claude Code feels faster, Anthropic's serving costs drop, your subscription quota lasts longer, and long coding sessions become practical. When the hit rate is very low, everyone loses.

So the incentives on both sides are aligned: Anthropic wants your cache hit rate to be high, and so do you. What actually holds you back is a handful of inconspicuous habits that quietly force the cache to be rebuilt.

Caching relies on prefix matching. You don't need to get into deep technical details
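To make the prefix-matching idea concrete, here is a toy sketch. It is not Anthropic's implementation: it simply models a request as a list of context blocks and counts how many leading blocks are identical to the previous request, which is the portion a prefix cache could reuse. The block contents and the `shared_prefix_len` helper are hypothetical.

```python
from typing import Sequence


def shared_prefix_len(previous: Sequence[str], current: Sequence[str]) -> int:
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first difference ends the reusable prefix
        count += 1
    return count


turn_1 = ["system prompt", "tool definitions", "CLAUDE.md", "user: fix the bug"]
# Appending new messages leaves the whole earlier prefix intact.
turn_2 = turn_1 + ["assistant: done", "user: add a test"]
# Changing an early block ends the shared prefix right there.
edited = ["system prompt", "tool definitions (changed)", "CLAUDE.md", "user: fix the bug"]

print(shared_prefix_len(turn_1, turn_2))  # 4
print(shared_prefix_len(turn_1, edited))  # 1
```

In this model, appending to a conversation keeps everything before it reusable, while a change near the start leaves only the blocks ahead of the change as a shared prefix.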

Original article: 動區 BlockTempo
