
Evaluating Claude's dbt Skills: Building an Eval from Scratch
I wanted to explore the extent to which Claude Code could build a data pipeline using dbt without iterative prompting. What difference did skills, models, and the prompt itself make? I've written in a separate post about what I found ( yes it's good; no it's not going to replace data engineers, yet ). In this post I'm going to show how I ran these tests (with Claude) and analysed the results (using Claude), including a pretty dashboard (created by Claude): The Test Can Claude Code build a production-ready dbt project? (is AI going to take data engineers\' jobs?) ::: title Terminology check I am not, as you can already tell, an expert at building and running this kind of controlled test. I've adopted my own terminology to refer to elements of what I was doing, which may or may not match what someone who knows what they're doing would use :) Scenario: What are we testing (specific Prompt + Skill combination) Configuration: Scenario + Model Run: Execution of a configuration Validation: De
Continue reading on Dev.to

