
I Ran 60+ Automated Tests on My AI Skills Registry — Here's What Broke
The setup

I've been building an open registry that indexes AI agent skills — think npm, but for agent capabilities. The idea: crawl GitHub repos, extract skill metadata, and let agents discover the tools they need at runtime. After indexing 5,090 skills from 200+ repositories, I figured it was time to actually test whether any of this worked. I wrote 60+ automated tests covering the API surface, search quality, security headers, and data integrity. The results were... humbling.

Auto-tagging was wrong 50% of the time

This was the biggest gut punch. I had an auto-tagger that analyzed skill descriptions and assigned category tags. Seemed smart. Seemed useful. It tagged a PostgreSQL migration skill as robotics. A bioinformatics pipeline skill got iOS. A Redis caching skill got embedded-systems. 50% of auto-assigned tags were wrong. Not slightly-off wrong — completely-unrelated-domain wrong. The root cause was pretty mundane: the tagger was matching on incidental keywords in descriptions rather than …
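To make the "60+ automated tests" concrete: one category mentioned is security headers. A minimal sketch of what such a check might look like, assuming responses are available as a plain header dict — the helper name and the required-header list are hypothetical, not the registry's actual suite:

```python
# Hypothetical security-header check: the header set and values shown here
# are common recommendations, not necessarily what the registry enforces.
REQUIRED_SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
}


def missing_security_headers(response_headers: dict) -> list:
    """Return required headers that are absent or carry the wrong value."""
    return [
        name
        for name, expected in REQUIRED_SECURITY_HEADERS.items()
        if response_headers.get(name) != expected
    ]


# A response missing one required header fails the check:
print(missing_security_headers({"X-Content-Type-Options": "nosniff"}))
# → ['X-Frame-Options']
```

A test suite would run this against every API endpoint's response headers and fail when the returned list is non-empty.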
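The incidental-keyword failure mode described above is easy to reproduce. Here is a sketch of a naive substring-matching tagger — the category names, keyword lists, and function are illustrative assumptions, not the registry's real tagger — showing how an unrelated domain tag gets dragged in:

```python
# Hypothetical naive tagger: assigns a category as soon as ANY of its
# keywords appears as a substring of the description. Keyword lists are
# made up for illustration.
CATEGORY_KEYWORDS = {
    "robotics": ["robot", "arm", "motion"],
    "embedded-systems": ["cache", "firmware", "interrupt"],
    "databases": ["postgresql", "sql", "migration"],
}


def naive_tag(description: str) -> list:
    """Return every category whose keywords match anywhere in the text."""
    text = description.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]


# "arm" hiding inside "alarm" and an incidental "caches" pull in two
# completely unrelated domains alongside the correct one:
print(naive_tag("Sets an alarm and caches PostgreSQL migration results"))
# → ['robotics', 'embedded-systems', 'databases']
```

Substring matching with no word boundaries or context is exactly the kind of shortcut that looks fine on a handful of examples and then mis-tags half a corpus.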
Continue reading on Dev.to Webdev




