
Tokens - the Language of AI
Hundreds of countries in the world, speaking thousands of languages - how can AI understand them all? Large Language Models (LLMs) - the brains of modern Artificial Intelligence (AI) - speak in numbers. You can think of these numbers as a compressed form of human language: each token represents one or more characters of text in some human language (there are also special tokens - we'll get to those later). For example, the sentence "Hello World!" might be translated to the tokens [1989, 31337].

However, LLMs are no different from people - there are multiple tokenization methods, so different models use different tokens to represent the same text, as depicted in the image above.

Tokenization - the process of translating text into tokens - is the first step LLMs take to process their inputs. Before they can think and generate answers, they need to understand those inputs in their own "language". To understand how tokenization works, we first need to understand its goals, which include: Structure:
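To make the idea concrete, here is a minimal sketch of tokenization as greedy longest-match lookup over a vocabulary. The vocabularies and token IDs below are made up for illustration (real tokenizers like BPE learn their vocabularies from data), but the sketch shows the key point: the same text yields different tokens under different vocabularies.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization: at each position, consume the
    longest substring that exists in the vocabulary and emit its ID."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

# Two hypothetical vocabularies, standing in for two different models:
vocab_a = {"Hello": 1989, " World!": 31337}
vocab_b = {"Hel": 5, "lo": 6, " ": 7, "World": 8, "!": 9}

print(tokenize("Hello World!", vocab_a))  # [1989, 31337]
print(tokenize("Hello World!", vocab_b))  # [5, 6, 7, 8, 9]
```

The same sentence becomes two tokens under one vocabulary and five under another - which is why token IDs from one model mean nothing to a different model.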



