
Transformer from scratch
Character-level GPT transformer built in PyTorch from scratch — pure architecture and training from zero. No fine-tuning, no pre-trained weights, no cloud compute.

GitHub repo: https://github.com/Eamon2009/Transformer-language-model

It can be trained on a $300 machine.

What I trained:
- Parameters: 0.82M
- Dataset: 201K characters of children's stories
- Vocab size: 28 unique characters
- Hardware: CPU only — AMD Ryzen 5
- Train time: 39 minutes
- Best val loss: 1.3145 — still improving at step 3000

Full training log:

[    0/3000] train=3.2961 val=3.2981 << best!
[  200/3000] train=2.3038 val=2.2490 << best!
[  400/3000] train=2.2469 val=2.1950 << best!
[  800/3000] train=1.9742 val=1.9103 << best!
[ 1400/3000] train=1.5889 val=1.5360 << best!
[ 2000/3000] train=1.4604 val=1.4081 << best!
[ 2600/3000] train=1.3501 val=1.3446 << best!
[ 2999/3000] train=1.3191 val=1.3145 << best!

Every single checkpoint improved. No overfitting at all — train and val loss decreased together for the entire run.
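To make the scale concrete, here is a minimal character-level GPT sketch in PyTorch. The hyperparameters below (4 layers, 4 heads, 128-dim embeddings, 128-character context) are illustrative choices that happen to land near 0.8M parameters with a 28-character vocab; they are not necessarily the exact settings used in the repo.

```python
# Minimal character-level GPT sketch. Hyperparameters are illustrative,
# not taken from the linked repo; they give roughly 0.8M parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention + MLP."""
    def __init__(self, n_embd, n_head, block_size, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout),
        )
        # causal mask: True above the diagonal = position may not attend to the future
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T], need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class CharGPT(nn.Module):
    def __init__(self, vocab_size=28, block_size=128, n_layer=4, n_head=4, n_embd=128):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        logits = self.head(self.ln_f(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

model = CharGPT()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
```

And here is a sketch of the kind of loop that produces a log like the one above: sample random character windows, and every couple of hundred steps estimate train/val loss and save a checkpoint when the val loss improves. The file path, batch size, learning rate, and eval interval are assumptions, not the repo's actual values.

```python
# Data prep: encode the corpus as integer ids ("stories.txt" is a hypothetical path;
# the vocab size depends on the dataset, ~28 unique characters here).
text = open("stories.txt").read()
stoi = {c: i for i, c in enumerate(sorted(set(text)))}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

def get_batch(data, block_size=128, batch_size=32):
    # random contiguous windows; targets are the same windows shifted by one character
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

@torch.no_grad()
def estimate_loss(model, data, iters=20):
    model.eval()
    losses = torch.zeros(iters)
    for k in range(iters):
        xb, yb = get_batch(data)
        _, loss = model(xb, yb)
        losses[k] = loss.item()
    model.train()
    return losses.mean().item()

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
best_val = float("inf")
for step in range(3000):
    if step % 200 == 0 or step == 2999:
        tr, va = estimate_loss(model, train_data), estimate_loss(model, val_data)
        tag = " << best!" if va < best_val else ""
        best_val = min(best_val, va)
        print(f"[{step:5d}/3000] train={tr:.4f} val={va:.4f}{tag}")
        if tag:
            torch.save(model.state_dict(), "best.pt")
    xb, yb = get_batch(train_data)
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```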
Actual output the

Continue reading on Dev.to




