
# Prompt Caching with Claude: Cut API Costs by 90% on Repeated Context

If you're sending the same large context (system prompt, documents, tool definitions) with every API call, you're paying full price each time. Prompt caching stores that context once and charges roughly 10x less on subsequent calls.

## How It Works

Normal API call: full input tokens billed on every request.

With caching:

- First call: input tokens billed, plus a small cache write fee
- Subsequent calls: only a cache read fee (90% cheaper than full input tokens)

## Marking Content for Caching

```python
import anthropic

client = anthropic.Anthropic()

# Large document you send on every request
SYSTEM_DOCS = open("large_codebase_context.txt").read()  # ~50k tokens

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_DOCS,
            "cache_control": {"type": "ephemeral"},  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": "What does the auth module do?"}],
)
```
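To see what the pricing model above means in dollars, here is a back-of-the-envelope cost sketch. The multipliers (1.25x base input price for a cache write, 0.1x for a cache read) reflect Anthropic's published pricing at the time of writing, and the $15-per-million-tokens base price is a hypothetical figure for illustration; treat all three as assumptions.

```python
def cost(tokens: int, calls: int, base_price_per_mtok: float,
         cached: bool) -> float:
    """Total input-token cost in dollars for `calls` requests that each
    resend the same `tokens`-token context."""
    per_tok = base_price_per_mtok / 1_000_000
    if not cached:
        # Every call bills the full context at the base rate.
        return calls * tokens * per_tok
    write = tokens * per_tok * 1.25                 # first call: cache write
    reads = (calls - 1) * tokens * per_tok * 0.10   # later calls: cache reads
    return write + reads

# 50k-token context, 100 calls, $15 per million input tokens (assumed)
uncached = cost(50_000, 100, 15.0, cached=False)
cached = cost(50_000, 100, 15.0, cached=True)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
# → uncached $75.00, cached $8.36 — roughly an 89% saving
```

The saving grows with the number of calls: the one-time write premium is amortized, and every subsequent call pays only the 10% read rate.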
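You can confirm caching is actually kicking in by inspecting the `usage` block of each response, which reports `cache_creation_input_tokens` and `cache_read_input_tokens` alongside the usual `input_tokens`. A small helper (`cache_status` is a hypothetical name, not part of the SDK) that classifies a usage report:

```python
def cache_status(usage: dict) -> str:
    """Classify the usage report from a Messages API response."""
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "cache hit"       # context was served from the cache
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "cache write"     # context was stored for later calls
    return "uncached"

# First call writes the cache; later identical calls read it:
print(cache_status({"input_tokens": 12,
                    "cache_creation_input_tokens": 50_000,
                    "cache_read_input_tokens": 0}))       # → cache write
print(cache_status({"input_tokens": 12,
                    "cache_creation_input_tokens": 0,
                    "cache_read_input_tokens": 50_000}))  # → cache hit
```

In practice you would pass `response.usage.model_dump()` (or read the attributes directly) rather than a hand-built dict; the dicts above just stand in for two consecutive responses.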



