
Building a Real-Time Security Camera System with Local Vision LLMs
I replaced my Lorex NVR's motion detection, which alerted me 40 times a day about swaying trees and shadows, with a pipeline that uses a vision language model to understand what it's actually seeing. It runs entirely on local hardware, costs nothing after setup, and sends me a WhatsApp message only when something real happens.

Architecture

    3× Lorex 4K cameras (RTSP)
            ↓
    gate_monitor.py (Mac Studio, M2 Ultra)
    ├── OpenCV: frame capture every 5s per camera
    ├── OpenCV: contour-based motion detection (frame N vs N-1)
    ├── Crop: extract largest changed region
    ├── VLM: qwen2.5vl:7b on DGX Spark (Blackwell, 10GbE link)
    │   └── "Classify this crop: ALERT or CLEAR?"
    ├── Alert: annotate frame with contour boxes
    │   ├── WiiM speaker announcement (TTS)
    │   └── WhatsApp message with image
    └── Audio: faster-whisper transcription (gate camera only)
        └── Gated by visual confirmation (120s window)

Three cameras (front gate, backyard, driveway) each run in a parallel thread. The system processes about 50,00
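
To make two of the decision steps in the architecture concrete, here is a minimal sketch of how the ALERT/CLEAR verdict and the 120s audio gate might be wired together. It is not the code from gate_monitor.py: the names `parse_verdict` and `VisualConfirmationGate` are hypothetical, and two behaviours are my assumptions rather than anything stated above: an unparseable VLM reply is treated as CLEAR, and "gated by visual confirmation" is read as "audio is only acted on within 120 seconds of a visual ALERT".

```python
import re
import time
from typing import Optional


def parse_verdict(reply: str) -> str:
    """Map a free-form VLM reply onto "ALERT" or "CLEAR".

    Assumption: the first ALERT/CLEAR keyword wins, and a reply with
    neither keyword is treated as CLEAR (no notification).
    """
    match = re.search(r"\b(ALERT|CLEAR)\b", reply.upper())
    return match.group(1) if match else "CLEAR"


class VisualConfirmationGate:
    """Allow audio events only shortly after a visual ALERT (hypothetical helper)."""

    def __init__(self, window_s: float = 120.0) -> None:
        self.window_s = window_s
        self._last_alert: Optional[float] = None

    def record_visual_alert(self, now: Optional[float] = None) -> None:
        # Remember when the camera pipeline last confirmed something real.
        self._last_alert = time.monotonic() if now is None else now

    def audio_allowed(self, now: Optional[float] = None) -> bool:
        # No visual ALERT seen yet: keep audio muted.
        if self._last_alert is None:
            return False
        now = time.monotonic() if now is None else now
        return 0 <= now - self._last_alert <= self.window_s


if __name__ == "__main__":
    gate = VisualConfirmationGate(window_s=120.0)
    if parse_verdict("ALERT: person standing at the gate") == "ALERT":
        gate.record_visual_alert()
    print("act on audio transcript:", gate.audio_allowed())
```

The optional `now` argument exists only so the window can be exercised without waiting two minutes. Defaulting to CLEAR favours fewer false alarms; a stricter setup could treat unparseable replies as ALERT instead.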




