
# How I Cut LLM API Costs by 88%
TL;DR: Prompt caching, model routing, structured output. Three changes took the bill from $3,316/month to $406. None of this is specific to fortune-telling; any LLM-powered service can use these.

One free analysis: $0.085. At 1,000 daily users, that's $2,550/month (1,000 users × 30 days × $0.085) just for a free tier. Even at a 3% paid conversion rate, revenue couldn't cover the free tier's costs. That's not a business. That's a charity.

So I tore apart the cost structure.

## Prompt Caching: Stop Buying the Same Textbook Every Class

Every LLM API call sends a "system prompt." The fortune interpretation guidelines, Five Elements rules, and output format specs are identical every time, yet sent from scratch every time. Like buying a new textbook for every lecture.

Prompt caching sends the system prompt once, then reuses the cached version on later calls.

- Doesn't change (cache it): interpretation guidelines, element rules, output format
- Changes every time (send fresh): the user's birth data, the engine's calculation JSON

Claude's `cache_control` cuts input costs by 90% on cache hits. Gem…
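Here's a minimal sketch of what that looks like with the Anthropic Python SDK. The `GUIDELINES` constant, the `analyze` helper, and the model name are placeholders assumed for illustration, not the article's actual code; the `cache_control` marker is the documented caching mechanism.

```python
# Minimal prompt-caching sketch using the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for the large, static prompt: interpretation guidelines,
# element rules, output format. In practice this is thousands of tokens.
GUIDELINES = "...interpretation guidelines, element rules, output format..."

def analyze(birth_data_json: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model choice
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": GUIDELINES,
                # Mark the static prefix as cacheable. Later calls that
                # send this identical prefix read it from cache instead of
                # paying full input-token price for it again.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only the per-user payload changes between calls, so it stays
        # outside the cached prefix.
        messages=[{"role": "user", "content": birth_data_json}],
    )
    return response.content[0].text
```

Two caveats worth knowing, both per Anthropic's pricing at the time of writing: cache writes carry a premium (roughly 25% over base input price) while cache reads cost roughly 10% of base, which is where the 90% figure comes from; and the ephemeral cache expires after a short idle window, so it only pays off under steady traffic. There is also a minimum cacheable prefix size (on the order of 1,024 tokens for Sonnet-class models), so tiny system prompts won't cache at all.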

