Tokens vs Bytes in AI: What LLMs Actually See When You Type

via Dev.to Tutorial · Jenny Met

You type "你好 Hello" into GPT-5. That's 8 characters, counting the space. But the model processes it as 2 tokens, and your bill is based on those tokens, not the characters. Meanwhile, your computer stores that same text as 12 bytes.

So what's the difference between bytes, characters, and tokens? Why does AI use tokens instead of raw bytes? And why does the same sentence cost more in Chinese than in English?

Start at the Bottom: What Is a Byte?

A byte is the smallest addressable unit of data your computer stores. One byte = 8 bits = a number from 0 to 255. When you save text to a file, your computer encodes each character into one or more bytes using UTF-8:

| Character | UTF-8 Bytes (decimal) | Byte Count | Hex |
|---|---|---|---|
| H | 72 | 1 | 48 |
| e | 101 | 1 | 65 |
| 你 | 228, 189, 160 | 3 | e4 bd a0 |
| 好 | 229, 165, 189 | 3 | e5 a5 bd |
| 🚀 | 240, 159, 154, 128 | 4 | f0 9f 9a 80 |

Key pattern:
- English letters: 1 byte each
- Most Chinese/Japanese/Korean characters: 3 bytes each
- Most emojis: 4 bytes each

The Four Levels: Bytes → Characters → Words → Tokens

| Level | "Hello, World" | Count | Description |
|---|---|---|---|
| Bytes | 48 65 6c 6c 6f 2c 20 57 6f
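The UTF-8 byte values and counts discussed above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `utf8_bytes` is made up for illustration and does not come from the article. It covers bytes and characters only. Token counts depend on the specific model's tokenizer and are not computed here.

```python
# Sketch: compare character counts with UTF-8 byte counts (standard library only).

def utf8_bytes(text):
    """Return the UTF-8 encoding of `text` as a list of integers (0 to 255)."""
    return list(text.encode("utf-8"))

# Reproduce the per-character rows: decimal bytes, byte count, hex.
for ch in ["H", "e", "你", "好", "🚀"]:
    encoded = ch.encode("utf-8")
    print(ch, utf8_bytes(ch), len(encoded), encoded.hex(" "))

# The opening example: len() counts Unicode code points (the space included),
# while the encoded length is what ends up on disk.
text = "你好 Hello"
char_count = len(text)
byte_count = len(text.encode("utf-8"))
print(char_count, byte_count)  # prints: 8 12
```

Note that `len()` on a Python string counts code points, so the space between "你好" and "Hello" is included in the character count.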
