
Claude Feels Slow. But Is Moving a Team to Open-Weight Models Actually the Fix?
TL;DR: Claude has a real speed problem for our team, but mostly in TTFT (time to first token), not in raw decoding speed. I measured our actual usage and found this:

- TTFT p50: 4.2s–6.8s
- TTFT p90: 14.5s–28.1s
- Claude Sonnet decode p50: 176 tok/s

That explains the feeling: Claude often isn't that slow once it starts, but sometimes it takes so long to begin that the whole thing feels like it's crawling.

That naturally raises the next question: should we move the team to self-hosted open-weight models? At first glance, it sounds promising. Self-hosted setups can have dramatically better TTFT: in the numbers I looked at, open-weight deployments were often estimated around 150–600ms TTFT, versus Claude's 4–7s median in our real usage. But once I looked at our actual team setup (10 engineers sharing one GPU budget), the answer stopped looking obvious. The best open-weight models need serious multi-GPU infrastructure, and once that infra is shared, the speed case starts looking surprisingly shaky. So this post is not…
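For context, here is a minimal sketch of how TTFT and decode speed can be pulled out of a streaming response. The `stream_tokens()` callable is a hypothetical stand-in for whatever streaming client you use (it just needs to yield text chunks as they arrive); the timing and percentile math are the part that matters.

```python
import time
import statistics
from typing import Callable, Iterable, Tuple

def measure_request(stream_tokens: Callable[[], Iterable[str]]) -> Tuple[float, float]:
    """Return (ttft_seconds, decode_tok_per_s) for one streamed request.

    `stream_tokens` is a hypothetical stand-in: any callable that yields
    text chunks as the model streams them (Anthropic SDK, vLLM, etc.).
    """
    start = time.perf_counter()
    first_chunk_at = None
    chunks = 0

    for _ in stream_tokens():
        now = time.perf_counter()
        if first_chunk_at is None:
            first_chunk_at = now  # TTFT ends when the first chunk arrives
        chunks += 1

    end = time.perf_counter()
    if first_chunk_at is None:
        return float("nan"), 0.0  # nothing streamed back

    ttft = first_chunk_at - start
    # Decode speed: chunks produced after the first one, over the decode window.
    # Chunk counts only approximate tokens; exact tok/s needs the API's token usage.
    decode = (chunks - 1) / (end - first_chunk_at) if chunks > 1 else 0.0
    return ttft, decode

def p50_p90(samples: list) -> Tuple[float, float]:
    """Percentiles over many requests (needs at least 2 samples)."""
    qs = statistics.quantiles(samples, n=10)  # deciles: qs[4] ~ p50, qs[8] ~ p90
    return qs[4], qs[8]
```

Logging one (ttft, decode) pair per request and taking p50/p90 over a week of real prompts is enough to see the pattern described above: decode speed looks healthy, and the tail lives almost entirely in TTFT.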
Continue reading on Dev.to



