Show HN: PhAIL – Real-robot benchmark for AI models

I built this because I couldn't find honest numbers on how well VLA models [1] actually work on commercial tasks. I come from search ranking at Google, where you measure everything; in robotics, nobody seemed to know.

PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking, one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs: the operator doesn't know which model is running.

Best model: 64 UPH (units per hour). Human teleoperating the same robot: 330. Human by hand: 1,300+.

Everything is public: every run with synced video and telemetry, the fine-tuning dataset, and the training scripts. The leaderboard is open for submissions.

Happy to answer questions about methodology, the models, or what we observed.

[1] Vision-Language-Action: https://en.wikipedia.org/wiki/Vision-language-action_model

Comments URL: https://news.ycombinator.com/item?id=47589797
Points: 7
# Comments: 7
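UPH counts items picked per hour. The short Python sketch below shows one way to aggregate such a figure from run logs and to put the post's headline numbers side by side. It assumes a hypothetical per-run record holding an item count and a duration; PhAIL's real telemetry format may differ. The names `Run`, `uph`, and `share` are illustrative and not part of PhAIL.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical per-run record; the real PhAIL telemetry schema may differ.
    model: str
    items_picked: int
    duration_s: float


def uph(runs: list[Run]) -> float:
    """Units per hour: total items picked divided by total elapsed hours."""
    total_items = sum(r.items_picked for r in runs)
    total_hours = sum(r.duration_s for r in runs) / 3600.0
    if total_hours == 0:
        raise ValueError("no elapsed time recorded")
    return total_items / total_hours


def share(value: float, reference: float) -> float:
    """Throughput of `value` as a fraction of `reference`."""
    return value / reference


# Made-up runs, only to exercise uph(); not PhAIL data.
example_runs = [Run("model-a", 10, 600.0), Run("model-a", 14, 840.0)]
print(f"example: {uph(example_runs):.1f} UPH")  # 24 items / 0.4 h = 60.0

# Headline figures from the post. Hand picking is quoted as "1,300+",
# so any share computed against it is an upper bound.
best_model, teleop, by_hand = 64, 330, 1300
print(f"best model vs teleop:  {share(best_model, teleop):.1%}")      # 19.4%
print(f"best model vs by hand: <= {share(best_model, by_hand):.1%}")  # 4.9%
print(f"teleop vs by hand:     <= {share(teleop, by_hand):.1%}")      # 25.4%
```

This sketch pools items and time across runs, so longer runs carry more weight. Averaging per-run UPH would be an equally plausible choice. The post does not say which aggregation PhAIL uses.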
