
What's semantic caching?
As generative AI makes its way into more applications, its shortcomings become more apparent. One of the biggest is cost: every LLM query is expensive. Take Gemini as an example. Gemini 2.5 Pro charges $1.25 per million input tokens and $10 per million output tokens, and the flagship Gemini 3.1 Pro raises that to $2 and $12 per million tokens respectively. Even a moderately active app can rack up thousands of dollars a month. Imagine a small customer support bot with just 500 daily users: by month two, the API bill has quietly crossed $2,000. That's not an edge case; that's just what happens when you're not caching.

For a business (or a personal user), cutting costs where possible and speeding up operations is a huge factor in how well your product does. One way to do both is a simple 'semantic cache'.

What it is

A semantic cache is not too different from a traditional cache; the idea behind it is the same. Normally a traditional cache stores exact key-value pairs: if the exact same request comes in again, the stored response is returned instead of being recomputed. A semantic cache relaxes the 'exact' requirement. It matches queries by meaning, typically by comparing embeddings, so two differently worded questions that ask the same thing can be served the same cached answer.
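To see how a bill like that accumulates, here is a back-of-the-envelope estimate using the Gemini 2.5 Pro rates above. The traffic numbers (queries per user, tokens per query) are illustrative assumptions, not figures from any real deployment:

```python
# Illustrative cost estimate. Pricing is Gemini 2.5 Pro's published rates;
# all traffic numbers below are assumptions for the sake of the example.
daily_users = 500
queries_per_user = 12            # assumed
input_tokens_per_query = 1_000   # assumed (prompt + context)
output_tokens_per_query = 500    # assumed

PRICE_IN = 1.25 / 1_000_000      # $ per input token
PRICE_OUT = 10.0 / 1_000_000     # $ per output token

daily_queries = daily_users * queries_per_user
cost_per_query = (input_tokens_per_query * PRICE_IN
                  + output_tokens_per_query * PRICE_OUT)
monthly_cost = daily_queries * cost_per_query * 30

print(f"~${monthly_cost:,.0f} per month")  # ~$1,125 with these assumptions
```

With these numbers the bot costs roughly $1,125 a month, so the cumulative bill passes $2,000 early in month two. Note that output tokens dominate the cost even though there are fewer of them, which is exactly the kind of spend a cache can eliminate on repeat questions.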
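The idea can be sketched in a few dozen lines. This is a minimal illustration, not a production design: the `embed` function here is a toy bag-of-words vector standing in for a real embedding model, and the 0.8 similarity threshold is an arbitrary choice you would tune:

```python
import math
from typing import Optional

def embed(text: str) -> dict[str, float]:
    """Toy embedding: a bag-of-words count vector.

    Stands in for a real embedding model (e.g. a sentence-transformer or an
    embeddings API); only the interface matters for the cache logic.
    """
    counts: dict[str, float] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0.0) + 1.0
    return counts

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity to count as a hit
        self._entries: list[tuple[dict[str, float], str]] = []

    def get(self, query: str) -> Optional[str]:
        # Return the cached response for the most similar stored query,
        # or None if nothing is similar enough (a cache miss).
        qv = embed(query)
        best_response, best_sim = None, 0.0
        for vec, response in self._entries:
            sim = cosine(qv, vec)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self._entries.append((embed(query), response))
```

On a miss you call the LLM and `put` the result; on a hit you skip the API call entirely, which is where the cost and latency savings come from. In a real system the linear scan over entries would be replaced by a vector index (FAISS, pgvector, Redis, and similar), but the flow is the same.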




