
# The Exact LLM Monitoring Stack I Run in Production (2026)
After 18 months of running LLM applications in production, here is the monitoring stack I actually use. Not theoretical. Not a demo. Real infrastructure.

## The Stack

### 1. Drift Detection: DriftWatch

My own tool, but I use it daily. Runs weekly baseline comparisons, alerts when drift exceeds 0.2.

### 2. Cost Tracking: Built-in logging

Every LLM call logged with:

- Model
- Token count (prompt + completion)
- Cost per call
- User/session ID
- Feature name

Simple SQL table. Query in Grafana.

### 3. Latency: Prometheus + Grafana

Track p50, p95, p99 latency per endpoint. Alert on p95 > 5s.

### 4. Error Tracking: Sentry

LLM API errors, timeout errors, parse errors. All tracked.

### 5. Output Quality: Custom checks

JSON validation. Schema checks. Length validation. Flag anything that fails.

## The Alerting Rules

| Metric      | Warning | Critical |
|-------------|---------|----------|
| Drift score | > 0.15  | > 0.30   |
| Latency p95 | > 3s    | > 5s     |
| Error rate  | > 1%    | > 5%     |
| Cost/day    | > £50   | > £100   |

## What I Alert On

Not everything. Only things that require human action:

- Drift detected — review new out
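For the p95 latency alert, a Prometheus rule in the spirit described above might look like this. The metric name `llm_request_duration_seconds` and the alert name are my assumptions; the source only states the threshold (p95 > 5s):

```yaml
groups:
  - name: llm-latency
    rules:
      - alert: LLMLatencyP95High
        # p95 per endpoint over a 5-minute rate window, alerting above 5 seconds.
        expr: >
          histogram_quantile(0.95,
            sum(rate(llm_request_duration_seconds_bucket[5m])) by (le, endpoint)
          ) > 5
        for: 10m
        labels:
          severity: critical
```

This assumes the application exports request durations as a Prometheus histogram; the same expression with `> 3` would drive the warning tier.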
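DriftWatch isn't public, so here is a minimal sketch of the kind of weekly baseline comparison described above: a population stability index (PSI) over binned score distributions, compared against the 0.2 threshold. The function name, binning, and choice of PSI are my assumptions, not DriftWatch's actual internals.

```python
import math
from collections import Counter

def drift_score(baseline, current, bins=10, lo=0.0, hi=1.0):
    """Population stability index (PSI) between two samples of scores in [lo, hi]."""
    def hist(xs):
        counts = Counter(min(int((x - lo) / (hi - lo) * bins), bins - 1) for x in xs)
        # Smooth empty bins so log() never sees zero.
        return [(counts.get(b, 0) + 1e-6) / len(xs) for b in range(bins)]
    p, q = hist(baseline), hist(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5]
shifted = [0.6, 0.7, 0.7, 0.8, 0.9, 0.9]
assert drift_score(baseline, baseline) < 0.2  # identical distributions: no alert
assert drift_score(baseline, shifted) > 0.2   # clear shift: trips the alert
```

A weekly cron job would compute this against a frozen baseline window and fire the alert when the score crosses the warning (0.15) or critical (0.30) thresholds from the table below.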
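The "simple SQL table" for cost tracking could look like the following sketch (shown with SQLite for portability; the table and column names are my own, not the author's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_calls (
        ts                TEXT    DEFAULT CURRENT_TIMESTAMP,
        model             TEXT    NOT NULL,
        prompt_tokens     INTEGER NOT NULL,
        completion_tokens INTEGER NOT NULL,
        cost_gbp          REAL    NOT NULL,
        session_id        TEXT,
        feature           TEXT
    )
""")

def log_call(model, prompt_tokens, completion_tokens, cost_gbp, session_id, feature):
    """Record one LLM call with the fields listed above."""
    conn.execute(
        "INSERT INTO llm_calls (model, prompt_tokens, completion_tokens,"
        " cost_gbp, session_id, feature) VALUES (?, ?, ?, ?, ?, ?)",
        (model, prompt_tokens, completion_tokens, cost_gbp, session_id, feature),
    )

log_call("gpt-4o", 1200, 300, 0.012, "sess-42", "summarise")

# The kind of query Grafana would chart: spend grouped by feature.
spend_by_feature = conn.execute(
    "SELECT feature, SUM(cost_gbp) FROM llm_calls GROUP BY feature"
).fetchall()
```

Grafana's SQL data sources can run the `GROUP BY` query directly, so no extra aggregation service is needed.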
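The custom output-quality checks can be captured in a short gate function. A minimal sketch, assuming the model is required to return JSON with an `answer` string and a numeric `confidence` (the required fields and length limit here are illustrative, not the author's actual schema):

```python
import json

REQUIRED = {"answer": str, "confidence": float}  # hypothetical schema
MAX_CHARS = 2000                                 # hypothetical length limit

def check_output(raw):
    """Return a list of failure reasons; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid JSON"]
    failures = []
    for field, typ in REQUIRED.items():
        if field not in data:
            failures.append(f"missing field: {field}")
        elif not isinstance(data[field], typ):
            failures.append(f"wrong type for {field}")
    if len(raw) > MAX_CHARS:
        failures.append("output too long")
    return failures

assert check_output('{"answer": "yes", "confidence": 0.9}') == []
assert check_output("not json") == ["invalid JSON"]
```

Anything that returns a non-empty failure list gets flagged, which feeds the error-rate metric in the alerting table below.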

