We Let an LLM Control a File System and Run Commands – Here’s What Actually Broke First

via Dev.to, by jacobjerryarackal

I wanted to push an LLM beyond simple chat and see if it could actually build real code. So I gave it direct access to the file system and the ability to run terminal commands. The task was straightforward: “Create a clean React login page with email, password, remember-me checkbox, and form validation.” It started confidently. Within minutes everything broke.

The System We Built

We connected two tools to the LLM: file_system (list, read, write, delete files) and run_command (execute npm, start the dev server, etc.). We used MCP (the “USB-C for AI” protocol) so the model could call tools cleanly. The goal was to let the LLM act like a real developer: explore the folder, create files, install packages, and test the app. It sounded simple. It was not.

Failure #1: It Assumed the Project Already Existed

What broke: The model immediately started writing Login.jsx in an empty folder. No package.json, no React setup, no dependencies.

Why it broke: The LLM had no understanding of project bootstrapping
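To make the setup concrete, here is a minimal Python sketch of what the two tools and a pre-flight check for Failure #1 might look like. It is an illustration under assumptions, not the author's implementation: the function signatures, the `action` strings, the `resolve_inside` helper, and `missing_project_files` are all hypothetical, and the MCP wiring is left out entirely.

```python
import subprocess
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve a path and refuse anything that escapes the workspace root."""
    root = root.resolve()
    target = (root / relative).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {relative}")
    return target


def file_system(root: Path, action: str, path: str = ".", content: str = ""):
    """Hypothetical file_system tool: list, read, write, or delete."""
    target = resolve_inside(root, path)
    if action == "list":
        return sorted(p.name for p in target.iterdir())
    if action == "read":
        return target.read_text(encoding="utf-8")
    if action == "write":
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"wrote {path}"
    if action == "delete":
        target.unlink()
        return f"deleted {path}"
    raise ValueError(f"unknown action: {action}")


def run_command(root: Path, argv: list[str], timeout: float = 60.0) -> dict:
    """Hypothetical run_command tool: run argv in the workspace, no shell."""
    done = subprocess.run(
        argv, cwd=root, capture_output=True, text=True, timeout=timeout
    )
    return {
        "exit_code": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
    }


def missing_project_files(root: Path) -> list[str]:
    """Pre-flight check for Failure #1: which bootstrap files are absent?"""
    required = ["package.json"]  # assumption: the minimum for an npm project
    return [name for name in required if not (root / name).is_file()]
```

Returning the result of `missing_project_files` to the model before it writes any component is one way to surface the empty-folder problem early. Note that `run_command` as sketched waits for the process to exit, so a long-running dev server would hit the timeout rather than stay up.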
