
How I stopped bad data from reaching my warehouse using a single Airflow task
Same pattern, third time in six months. The DAG runs green. Airflow shows all tasks succeeded. I wake up to a Slack message saying the revenue dashboard is broken. I dig through logs and trace it back to an extract job from four hours ago: a field that was always numeric now has strings mixed in. 48,000 rows loaded. Every downstream model is wrong.

The fix was obvious in hindsight: the quality check needs to happen before the INSERT, not after it. dbt tests, Great Expectations, warehouse constraints: they're all good tools, but they validate data that's already been written. By the time they flag an issue, the damage is done. So I built a quality gate task that sits between extract and load. Here's exactly what I did.

The pattern

Before:

    extract → load → warehouse ✓
      [bad rows sitting in production]
      [dashboard broken at 9am]

After:

    extract → screen → PASS  → load → warehouse ✓
                     → WARN  → load + flag for review
                     → BLOCK → dead-letter queue, pipeline stops

One task. No custom validation logic
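The screen step above can be sketched as a plain Python function that runs before any INSERT. This is a minimal, hypothetical sketch, not the post's actual task: the field name `amount`, the dict-per-row shape, and the 5% block threshold are all assumptions for illustration.

```python
# Hypothetical sketch of a pre-load quality gate. Assumes extracted
# rows are dicts and that "amount" must be numeric; the 5% threshold
# is an illustrative choice, not from the article.

PASS, WARN, BLOCK = "PASS", "WARN", "BLOCK"

def screen(rows, block_threshold=0.05):
    """Classify a batch before loading.

    PASS:  every row is clean; load normally.
    WARN:  a small fraction is bad; load the clean rows and flag
           the bad ones for review.
    BLOCK: too many bad rows; send the batch to a dead-letter
           queue and stop the pipeline.
    """
    is_clean = lambda r: isinstance(r.get("amount"), (int, float))
    good = [r for r in rows if is_clean(r)]
    bad = [r for r in rows if not is_clean(r)]

    if not bad:
        return PASS, good, bad
    if len(bad) / len(rows) < block_threshold:
        return WARN, good, bad
    return BLOCK, [], bad
```

In an Airflow DAG this would sit as one task between extract and load, with the verdict driving a branch (e.g. via `BranchPythonOperator` or a TaskFlow branch) to the load task, the flag-for-review path, or the dead-letter task.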