FlareStart
Claude Feels Slow. But Is Moving a Team to Open-Weight Models Actually the Fix?
How-To • DevOps


via Dev.to • Aviad Rozenhek • 4h ago

TL;DR

Claude has a real speed problem for our team, but mostly in time to first token (TTFT), not in raw decoding speed. I measured our actual usage and found this:

  • TTFT p50: 4.2s–6.8s
  • TTFT p90: 14.5s–28.1s
  • Claude Sonnet decode p50: 176 tok/s

That explains the feeling: Claude often isn’t that slow once it starts, but sometimes it takes so long to begin that the whole thing feels like it’s crawling.

That naturally raises the next question: should we move the team to self-hosted open-weight models?

At first glance, that sounds promising. Self-hosted setups can have dramatically better TTFT. In the numbers I looked at, open-weight deployments were often estimated at around 150–600ms TTFT, versus Claude’s 4–7s median in our real usage.

But once I looked at the actual team setup (10 engineers sharing one GPU budget), the answer stopped looking obvious. The best open-weight models need serious multi-GPU infra, and once that infra is shared, the speed case starts looking surprisingly shaky.

So this post is not
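The split between TTFT and decode speed in the TL;DR can be made concrete with a little arithmetic. The sketch below is not the author's measurement code. It assumes a simple model in which a streamed response takes TTFT plus output tokens divided by decode rate, and it assumes a hypothetical 500-token reply. Only the TTFT and decode figures come from the numbers quoted above.

```python
def response_time(ttft_s: float, decode_tok_per_s: float, output_tokens: int) -> float:
    """Seconds until the last token arrives: wait for the first token, then stream the rest.

    Simplified model (an assumption): total = TTFT + output_tokens / decode rate.
    """
    return ttft_s + output_tokens / decode_tok_per_s


def ttft_share(ttft_s: float, decode_tok_per_s: float, output_tokens: int) -> float:
    """Fraction of the total response time spent waiting for the first token."""
    return ttft_s / response_time(ttft_s, decode_tok_per_s, output_tokens)


# Figures quoted in the TL;DR.
DECODE_P50_TOK_PER_S = 176.0
TTFT_SAMPLES_S = {
    "TTFT p50 (low)": 4.2,
    "TTFT p50 (high)": 6.8,
    "TTFT p90 (low)": 14.5,
    "TTFT p90 (high)": 28.1,
}

# Hypothetical reply length, chosen only for illustration.
OUTPUT_TOKENS = 500

if __name__ == "__main__":
    for label, ttft in TTFT_SAMPLES_S.items():
        total = response_time(ttft, DECODE_P50_TOK_PER_S, OUTPUT_TOKENS)
        share = ttft_share(ttft, DECODE_P50_TOK_PER_S, OUTPUT_TOKENS)
        print(f"{label}: total {total:.1f}s, of which {share:.0%} is waiting for the first token")
```

Under these assumptions, the wait before the first token accounts for more than half of the total time even at the low end of the p50 range, which is consistent with the observation that Claude is not that slow once it starts.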

Continue reading on Dev.to

