
# LLM Context Windows: Managing Tokens in Production AI Apps
## The Token Budget Problem

Claude (claude-sonnet-4-6) has a 200k-token context window. GPT-4o has 128k. These sound enormous until you're building a RAG application that needs to pass document context, conversation history, system prompts, and tool definitions simultaneously. Running out of context window mid-conversation is an unrecoverable failure. Managing it is an engineering discipline.

## Counting Tokens

```typescript
import Anthropic from '@anthropic-ai/sdk';
import { encoding_for_model, type TiktokenModel } from 'tiktoken'; // for OpenAI

// Anthropic: use the API's token counting endpoint
const anthropic = new Anthropic();

async function countTokens(messages: Anthropic.MessageParam[]) {
  const response = await anthropic.messages.countTokens({
    model: 'claude-sonnet-4-6',
    messages,
    system: 'You are a helpful assistant.',
  });
  return response.input_tokens;
}

// OpenAI: use tiktoken locally (no API call needed)
function countOpenAITokens(text: string, model = 'gpt-4o'): number {
  const encoder = encoding_for_model(model as TiktokenModel);
  const count = encoder.encode(text).length;
  encoder.free(); // tiktoken encoders are WASM-backed and must be freed
  return count;
}
```
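Once you can count tokens, the next step is enforcing a budget. A minimal sketch, assuming a hypothetical `trimToBudget` helper (not from the article): walk the conversation history newest-to-oldest and keep only the turns that fit. The `estimateTokens` heuristic here (roughly 4 characters per token) stands in for a real tokenizer call.

```typescript
// Hypothetical message shape for illustration.
type Message = { role: 'user' | 'assistant'; content: string };

// Crude chars/4 approximation -- a stand-in, not tiktoken.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the newest turns that fit within `budget` tokens, dropping
// the oldest first. Pass a real counter (tiktoken, the Anthropic
// countTokens endpoint) in place of the default heuristic.
function trimToBudget(
  history: Message[],
  budget: number,
  count: (t: string) => number = estimateTokens,
): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = count(history[i].content);
    if (used + cost > budget) break; // next-oldest turn no longer fits
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Dropping whole turns (rather than truncating mid-message) keeps each remaining turn coherent, at the cost of a coarser budget fit.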



