
# The Exact LLM Monitoring Stack I Run in Production (2026)
After 18 months of running LLM applications in production, here is the monitoring stack I actually use. Not theoretical. Not a demo. Real infrastructure.

## The Stack

### 1. Drift Detection: DriftWatch

My own tool, but I use it daily. Runs weekly baseline comparisons, alerts when drift exceeds 0.2.

### 2. Cost Tracking: Built-in logging

Every LLM call logged with:

- Model
- Token count (prompt + completion)
- Cost per call
- User/session ID
- Feature name

Simple SQL table. Query in Grafana.

### 3. Latency: Prometheus + Grafana

Track p50, p95, p99 latency per endpoint. Alert on p95 > 5s.

### 4. Error Tracking: Sentry

LLM API errors, timeout errors, parse errors. All tracked.

### 5. Output Quality: Custom checks

JSON validation. Schema checks. Length validation. Flag anything that fails.

## The Alerting Rules

| Metric      | Warning | Critical |
|-------------|---------|----------|
| Drift score | > 0.15  | > 0.30   |
| Latency p95 | > 3s    | > 5s     |
| Error rate  | > 1%    | > 5%     |
| Cost/day    | > £50   | > £100   |

## What I Alert On

Not everything. Only things that require human action:

- Drift detected — review new out
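For the p95 latency alert, a Prometheus rule in the spirit described above might look like this. The metric name `llm_request_duration_seconds` and the alert name are my assumptions; the source only states the threshold (p95 > 5s):

```yaml
groups:
  - name: llm-latency
    rules:
      - alert: LLMLatencyP95High
        # p95 per endpoint over a 5-minute rate window, alerting above 5 seconds.
        expr: >
          histogram_quantile(0.95,
            sum(rate(llm_request_duration_seconds_bucket[5m])) by (le, endpoint)
          ) > 5
        for: 10m
        labels:
          severity: critical
```

This assumes the application exports request durations as a Prometheus histogram; the same expression with `> 3` would drive the warning tier.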
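DriftWatch isn't public, so here is a minimal sketch of the kind of weekly baseline comparison described above: a population stability index (PSI) over binned score distributions, compared against the 0.2 threshold. The function name, binning, and choice of PSI are my assumptions, not DriftWatch's actual internals.

```python
import math
from collections import Counter

def drift_score(baseline, current, bins=10, lo=0.0, hi=1.0):
    """Population stability index (PSI) between two samples of scores in [lo, hi]."""
    def hist(xs):
        counts = Counter(min(int((x - lo) / (hi - lo) * bins), bins - 1) for x in xs)
        # Smooth empty bins so log() never sees zero.
        return [(counts.get(b, 0) + 1e-6) / len(xs) for b in range(bins)]
    p, q = hist(baseline), hist(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5]
shifted = [0.6, 0.7, 0.7, 0.8, 0.9, 0.9]
assert drift_score(baseline, baseline) < 0.2  # identical distributions: no alert
assert drift_score(baseline, shifted) > 0.2   # clear shift: trips the alert
```

A weekly cron job would compute this against a frozen baseline window and fire the alert when the score crosses the warning (0.15) or critical (0.30) thresholds from the table below.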
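The "simple SQL table" for cost tracking could look like the following sketch (shown with SQLite for portability; the table and column names are my own, not the author's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_calls (
        ts                TEXT    DEFAULT CURRENT_TIMESTAMP,
        model             TEXT    NOT NULL,
        prompt_tokens     INTEGER NOT NULL,
        completion_tokens INTEGER NOT NULL,
        cost_gbp          REAL    NOT NULL,
        session_id        TEXT,
        feature           TEXT
    )
""")

def log_call(model, prompt_tokens, completion_tokens, cost_gbp, session_id, feature):
    """Record one LLM call with the fields listed above."""
    conn.execute(
        "INSERT INTO llm_calls (model, prompt_tokens, completion_tokens,"
        " cost_gbp, session_id, feature) VALUES (?, ?, ?, ?, ?, ?)",
        (model, prompt_tokens, completion_tokens, cost_gbp, session_id, feature),
    )

log_call("gpt-4o", 1200, 300, 0.012, "sess-42", "summarise")

# The kind of query Grafana would chart: spend grouped by feature.
spend_by_feature = conn.execute(
    "SELECT feature, SUM(cost_gbp) FROM llm_calls GROUP BY feature"
).fetchall()
```

Grafana's SQL data sources can run the `GROUP BY` query directly, so no extra aggregation service is needed.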
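The custom output-quality checks can be captured in a short gate function. A minimal sketch, assuming the model is required to return JSON with an `answer` string and a numeric `confidence` (the required fields and length limit here are illustrative, not the author's actual schema):

```python
import json

REQUIRED = {"answer": str, "confidence": float}  # hypothetical schema
MAX_CHARS = 2000                                 # hypothetical length limit

def check_output(raw):
    """Return a list of failure reasons; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid JSON"]
    failures = []
    for field, typ in REQUIRED.items():
        if field not in data:
            failures.append(f"missing field: {field}")
        elif not isinstance(data[field], typ):
            failures.append(f"wrong type for {field}")
    if len(raw) > MAX_CHARS:
        failures.append("output too long")
    return failures

assert check_output('{"answer": "yes", "confidence": 0.9}') == []
assert check_output("not json") == ["invalid JSON"]
```

Anything that returns a non-empty failure list gets flagged, which feeds the error-rate metric in the alerting table below.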

