Anthropic Never Released Their Tokenizer. Here's What We Found Testing the Alternatives

via Dev.to JavaScript, by J Schoemaker

bpe-lite accuracy benchmark — report

Date: 2026-03-19
Model tested against: claude-haiku-4-5-20251001 via the Anthropic count_tokens API
Tokenizers compared: bpe-lite (modified Xenova), ai-tokenizer (claude encoding), raw Xenova (unmodified)

1. Background

bpe-lite is a zero-dependency JS tokenizer supporting OpenAI (cl100k / o200k), Anthropic (Xenova/claude-tokenizer, 65k BPE), and Gemini (Gemma3 SPM). Anthropic has not released the Claude 4 tokenizer, so the Anthropic provider is a reverse-engineered approximation sourced from Xenova/claude-tokenizer on HuggingFace, with hand-tuned modifications. This report documents the construction of a stratified accuracy benchmark and its results.

2. Benchmark corpus design

120 samples across 12 categories (10 per category):

Category        Focus
english-prose   sentences, paragraphs, mixed punctuation, dialogue
code-python     functions, classes, decorators, f-strings, async
code-js         arrow functions, classes, JSX, TypeScript, async/await
numbers         integers, floats,
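A benchmark like this boils down to comparing each tokenizer's predicted count against the count_tokens ground truth, stratified by category. The excerpt does not show the scoring code, so here is a minimal self-contained sketch of one plausible metric (mean relative error per category); the function name and sample shape are illustrative, not bpe-lite's actual API.

```javascript
// Sketch: per-category accuracy for a token-count benchmark.
// Each sample pairs a category with the ground-truth count (from the
// Anthropic count_tokens API) and a tokenizer's predicted count.
// Field names here are hypothetical, not taken from bpe-lite.
function perCategoryError(samples) {
  const buckets = new Map();
  for (const { category, truth, predicted } of samples) {
    const relErr = Math.abs(predicted - truth) / truth;
    if (!buckets.has(category)) buckets.set(category, []);
    buckets.get(category).push(relErr);
  }
  const report = {};
  for (const [category, errs] of buckets) {
    report[category] = {
      meanRelErr: errs.reduce((a, b) => a + b, 0) / errs.length,
      exactMatches: errs.filter((e) => e === 0).length,
      n: errs.length,
    };
  }
  return report;
}

const demo = perCategoryError([
  { category: "english-prose", truth: 100, predicted: 100 },
  { category: "english-prose", truth: 200, predicted: 190 },
  { category: "code-python", truth: 50, predicted: 55 },
]);
// demo["english-prose"].meanRelErr is 0.025 (mean of 0 and 0.05);
// demo["code-python"].meanRelErr is 0.1 with 0 exact matches.
```

Stratifying by category matters because BPE approximations tend to fail unevenly: an aggregate error can look small while, say, code or numeric samples diverge badly.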

Continue reading on Dev.to JavaScript

