
Tokens - the Language of AI
Hundreds of countries in the world, speaking thousands of languages - how can AI understand them all? Large Language Models (LLMs) - the brains of modern Artificial Intelligence (AI) - speak in numbers. You can think of these numbers as a compressed form of human language: each token represents one or more characters of text in some human language (there are also special tokens - we'll get to those later). For example, the sentence "Hello World!" might be translated to the tokens [1989, 31337].

However, LLMs are no different from people - there are multiple tokenization methods, so different models use different tokens to represent the same text, as depicted in the image above.

Tokenization - the process of translating text into tokens - is the first step LLMs take to process their inputs. Before they can think and generate answers, they need to understand those inputs in their own "language". To understand how tokenization works, we first need to understand its goals, which include: Structure:
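To make the idea concrete, here is a minimal sketch of tokenization as greedy longest-match lookup over a vocabulary. The vocabularies and token IDs below are made up for illustration (real tokenizers like BPE learn their vocabularies from data), but the sketch shows the key point: the same text yields different tokens under different vocabularies.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization: at each position, consume the
    longest substring that exists in the vocabulary and emit its ID."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

# Two hypothetical vocabularies, standing in for two different models:
vocab_a = {"Hello": 1989, " World!": 31337}
vocab_b = {"Hel": 5, "lo": 6, " ": 7, "World": 8, "!": 9}

print(tokenize("Hello World!", vocab_a))  # [1989, 31337]
print(tokenize("Hello World!", vocab_b))  # [5, 6, 7, 8, 9]
```

The same sentence becomes two tokens under one vocabulary and five under another - which is why token IDs from one model mean nothing to a different model.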



