
An Analogy to Help Understand Mixture of Experts
If you're having a hard time understanding how MoE models stack up against dense models, and roughly where they land in a comparison, think about this super oversimplified analogy. I'm hoping it makes sense:

The Scenario

Imagine a paid trivia competition, but all the questions are about carpentry regulations: you're given a piece of paper, you fill it out, and then you hand it in. There are two "teams" competing with each other, except one team just has a single dude on it. Both teams need a place to sit in the building while the competition is going on.

Team 1 (10b Dense Model)

Team 1 is just one fairly seasoned carpenter with 10 years of experience. He gets the paper, works through every question himself, and turns it in. He really likes his personal space, so he reserved 10 seats all to himself.

Total experience on the team: 10 years
Experience applied to each question: 10 years
Total seats needed: 10 seats

Team 2 (40b a10b MoE Model)

Team 2 is a large crew of 40 first-year apprenti

