Tokens vs Bytes in AI: What LLMs Actually See When You Type

You type "你好 Hello" into GPT-5. That's 8 characters, counting the space. But the model processes it as 2 tokens, and your bill is based on those tokens, not the characters. Meanwhile, your computer stores that same text as 12 bytes.

So what's the difference between bytes, characters, and tokens? Why does AI use tokens instead of raw bytes? And why does the same sentence cost more in Chinese than in English?

Start at the Bottom: What Is a Byte?

A byte is the smallest addressable unit of data your computer stores. One byte = 8 bits = a number from 0 to 255.

When you save text to a file, your computer encodes each character into bytes using UTF-8:

| Character | UTF-8 Bytes (decimal) | Byte Count | Hex |
|---|---|---|---|
| H | 72 | 1 | 48 |
| e | 101 | 1 | 65 |
| 你 | 228, 189, 160 | 3 | e4 bd a0 |
| 好 | 229, 165, 189 | 3 | e5 a5 bd |
| 🚀 | 240, 159, 154, 128 | 4 | f0 9f 9a 80 |

Key pattern:

- English letters: 1 byte each
- Most Chinese/Japanese/Korean characters: 3 bytes each
- Most emojis: 4 bytes each

The Four Levels: Bytes → Characters → Words → Tokens

| Level | "Hello, World" | Count | Description |
|---|---|---|---|
| Bytes | 48 65 6c 6c 6f 2c 20 57 6f
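The character and byte counts from the opening example, and the per-character values in the UTF-8 table, can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The helper name `utf8_breakdown` is hypothetical and not part of the article. Note that Python's `len` on a string counts Unicode code points, which matches "characters" for this sample but can differ for emoji sequences built from several code points.

```python
def utf8_breakdown(text: str) -> list[tuple[str, list[int], str]]:
    """Return (character, decimal byte values, hex string) per code point.

    Hypothetical helper for illustration only.
    """
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")  # bytes object for this one character
        rows.append((ch, list(encoded), encoded.hex(" ")))
    return rows


sample = "你好 Hello"
print(len(sample))                  # 8  (code points, including the space)
print(len(sample.encode("utf-8")))  # 12 (3 + 3 + 1 + 5 bytes)

# Reproduce the rows of the UTF-8 table above.
for ch, values, hex_str in utf8_breakdown("He你好🚀"):
    print(ch, values, len(values), hex_str)
```

Token counts are not shown here because they depend on the specific tokenizer a model uses, which the standard library does not provide.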
