
Evaluating Claude's dbt Skills: Building an Eval from Scratch
I wanted to explore the extent to which Claude Code could build a data pipeline using dbt without iterative prompting. What difference did skills, models, and the prompt itself make? I've written in a separate post about what I found ( yes it's good; no it's not going to replace data engineers, yet ). In this post I'm going to show how I ran these tests (with Claude) and analysed the results (using Claude), including a pretty dashboard (created by Claude): The Test Can Claude Code build a production-ready dbt project? (is AI going to take data engineers\' jobs?) ::: title Terminology check I am not, as you can already tell, an expert at building and running this kind of controlled test. I've adopted my own terminology to refer to elements of what I was doing, which may or may not match what someone who knows what they're doing would use :) Scenario: What are we testing (specific Prompt + Skill combination) Configuration: Scenario + Model Run: Execution of a configuration Validation: De
Continue reading on Dev.to

