
Token0 v0.2.0: Streaming Support + Updated Benchmarks: 35-42% Savings Across 4 Vision Models
A few days ago I launched Token0 -- an open-source API proxy that makes vision LLM calls cheaper by optimizing images before they hit the model. The response was great, so here is the first real update: v0.2.0, with full streaming support and expanded benchmarks.

## What's New in v0.2.0

### 1. Streaming support (`stream=true`)

This was the most requested feature. Token0 now supports Server-Sent Events (SSE) streaming across all four providers -- OpenAI, Anthropic, Google, and Ollama.

How it works: Token0 optimizes your images before streaming begins, then tokens flow word by word exactly like the native provider APIs. You get the cost savings without sacrificing the real-time UX.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # point the SDK at the Token0 proxy
    api_key="sk-...",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            # placeholder URL -- the original snippet is truncated at this point
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
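For intuition on where image savings like this can come from, it helps to see how vision models bill images by tile. The sketch below is not Token0's actual code; it implements OpenAI's published high-detail token formula for gpt-4o-class models (scale to fit 2048x2048, scale the shortest side down to 768px, then 85 base tokens plus 170 per 512px tile), and shows how downscaling an image changes the tile count:

```python
import math

def gpt4o_image_tokens(width: int, height: int) -> int:
    """Estimate high-detail image tokens per OpenAI's documented tiling rule."""
    # Scale to fit within a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Scale down (never up) so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # 85 base tokens + 170 tokens per 512px tile.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

print(gpt4o_image_tokens(1024, 1024))  # → 765 (4 tiles)
print(gpt4o_image_tokens(512, 512))    # → 255 (1 tile)
```

Shrinking an image below a tile boundary drops whole 170-token tiles at once, which is why pre-optimizing images in a proxy can cut cost without touching the prompt.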



