
We Let an LLM Control a File System and Run Commands – Here’s What Actually Broke First
I wanted to push an LLM beyond simple chat and see if it could actually build real code. So I gave it direct access to the file system and the ability to run terminal commands. The task was straightforward: “Create a clean React login page with email, password, remember-me checkbox, and form validation.” It started confidently. Within minutes everything broke.

The System We Built

We connected two tools to the LLM:

file_system (list, read, write, delete files)
run_command (execute npm, start dev server, etc.)

We used MCP (the “USB-C for AI” protocol) so the model could call tools cleanly. The goal was to let the LLM act like a real developer: explore the folder, create files, install packages, and test the app. It sounded simple. It was not.

Failure #1: It Assumed the Project Already Existed

What broke: The model immediately started writing Login.jsx in an empty folder. No package.json, no React setup, no dependencies.

Why it broke: The LLM had no understanding of project bootstrapping
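
To make Failure #1 concrete, here is a minimal sketch of a guard that could run before the model is allowed to write any component file. It is not the code from this experiment. The function name checkReactProject, the PreflightResult type, and the choice of checks (package.json present, react and react-dom declared, node_modules installed) are all assumptions made for illustration. The inputs are plain values that a file_system tool could supply: a directory listing and the raw text of package.json.

```typescript
// Hypothetical preflight guard; none of these names come from the article.
// It reports which bootstrapping steps are still missing in a project folder.

type PreflightResult = {
  ready: boolean;
  missing: string[]; // human-readable steps that still need to happen
};

function checkReactProject(
  fileNames: string[],            // top-level entries of the project folder
  packageJsonText: string | null  // raw package.json contents, or null if absent
): PreflightResult {
  const missing: string[] = [];

  // An empty folder fails here: there is nothing to build on yet.
  if (fileNames.indexOf("package.json") === -1 || packageJsonText === null) {
    missing.push("package.json (project has not been initialised)");
    return { ready: false, missing };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(packageJsonText);
  } catch {
    missing.push("valid package.json (file could not be parsed)");
    return { ready: false, missing };
  }
  if (typeof parsed !== "object" || parsed === null) {
    missing.push("valid package.json (top level is not an object)");
    return { ready: false, missing };
  }

  const pkg = parsed as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  const deps: Record<string, string> = {
    ...(pkg.dependencies ?? {}),
    ...(pkg.devDependencies ?? {}),
  };

  // Assumed minimum for a React page: both packages must be declared.
  for (const name of ["react", "react-dom"]) {
    if (!(name in deps)) {
      missing.push(`dependency: ${name}`);
    }
  }

  // Declared but not installed is still not runnable.
  if (fileNames.indexOf("node_modules") === -1) {
    missing.push("node_modules (dependencies not installed)");
  }

  return { ready: missing.length === 0, missing };
}
```

Called on the empty folder from this experiment, as checkReactProject([], null), the guard returns ready: false with a single entry pointing at the missing package.json. One possible design is to hand the missing list back to the model as a tool result, so that it runs the setup commands before it touches Login.jsx.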
Continue reading on Dev.to