
How I Crashed My AI Agent Fleet in 30 Minutes (And Fixed It): VRAM Management on Apple Silicon
I learned this the hard way at 5 AM on a Thursday. I'm running an autonomous AI agent system on a MacBook Pro M3 Pro with 36GB unified memory. The setup: multiple local LLMs via Ollama, orchestrated by a main agent that delegates tasks to subagents running on different models. Think of it as a small company where the CEO (main agent) assigns work to specialists (local models).

It was working beautifully. Then it wasn't.

## The Crash

My warmup routine loaded four models simultaneously every 4 minutes to keep them "hot" in memory:

| Model | VRAM | Context Window |
|---|---|---|
| mistral:7b | 4.4 GB | 32k |
| qwen3:8b | 5.2 GB | 40k |
| llama3.1:8b | 4.9 GB | 128k |
| qwen2.5-coder:14b | 9.0 GB | 128k |
| **Total** | **23.5 GB** | |

That left ~12.5 GB for everything else: macOS, the orchestrator, and any mission cron jobs that needed to spawn additional model instances.

Here's where it went wrong. My cron jobs (automated tasks running every few hours) would try t
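The warmup loop described above can be sketched roughly as follows. This is a minimal illustration, not the original orchestrator code: the model names and VRAM figures come from the table, the `/api/generate` endpoint and `keep_alive` field are Ollama's standard HTTP API, and the budget-check guard is an assumed safety measure rather than something the original routine had.

```python
"""Sketch of a warmup tick with a VRAM budget guard (illustrative only)."""
import json
import urllib.request

# VRAM footprint per model, in GB (figures from the table above)
MODELS = {
    "mistral:7b": 4.4,
    "qwen3:8b": 5.2,
    "llama3.1:8b": 4.9,
    "qwen2.5-coder:14b": 9.0,
}

TOTAL_UNIFIED_GB = 36.0   # M3 Pro unified memory
OS_RESERVE_GB = 12.5      # headroom kept for macOS + orchestrator (assumed)


def vram_budget_ok(models: dict[str, float],
                   total_gb: float = TOTAL_UNIFIED_GB,
                   reserve_gb: float = OS_RESERVE_GB) -> bool:
    """True if loading every model still leaves the reserved headroom."""
    return sum(models.values()) <= total_gb - reserve_gb


def warm_model(name: str, host: str = "http://localhost:11434") -> None:
    """Send an empty generate request so Ollama loads the model and keeps
    it resident (keep_alive) until the next warmup tick."""
    payload = json.dumps({"model": name, "keep_alive": "5m"}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=60).read()


# Usage (on a machine with Ollama running), e.g. from a cron/launchd tick:
#   if vram_budget_ok(MODELS):
#       for model in MODELS:
#           warm_model(model)
```

Note that with the numbers above the check passes with zero slack: 23.5 GB of models against a 23.5 GB budget, which is exactly why any extra model instance pushes the machine over the edge.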
*Continue reading on Dev.to.*




