
How Microsoft Trained a 270M-Pair AI to Power Smarter Search
via Hackernoon
Researchers at Microsoft introduce E5, a text embedding model trained on CCPairs, a curated corpus of 270M web text pairs. Trained with contrastive learning using in-batch negatives, E5 is the first unsupervised model to outperform BM25 on the BEIR benchmark. After supervised fine-tuning, it tops the MTEB leaderboard, beating models 40× larger on retrieval, clustering, classification, and semantic similarity tasks.
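The core training signal mentioned above, contrastive learning with in-batch negatives, treats each (query, passage) pair in a batch as a positive and every other passage in the same batch as a negative. A minimal NumPy sketch of this InfoNCE-style loss is below; the function name, the temperature value, and the use of cosine similarity are illustrative assumptions, not details taken from the E5 paper.

```python
import numpy as np

def in_batch_contrastive_loss(query_emb, passage_emb, temperature=0.05):
    """InfoNCE-style contrastive loss with in-batch negatives (sketch).

    The positive for query i is passage i; all other passages in the
    batch act as negatives. Inputs are (batch, dim) embedding matrices.
    """
    # L2-normalize so dot products become cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)

    # sim[i, j] = similarity between query i and passage j
    sim = (q @ p.T) / temperature

    # Softmax cross-entropy where the target for row i is column i
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Usage: aligned pairs should score a much lower loss than random pairs
rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 16))
loss_aligned = in_batch_contrastive_loss(queries, queries)
loss_random = in_batch_contrastive_loss(queries, rng.normal(size=(8, 16)))
```

Because negatives come for free from other examples in the batch, larger batches yield more (and harder) negatives, which is one reason contrastive embedding models are often trained with very large batch sizes.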

