DuckDB Changed How I Process CSV Files — 7 Queries That Replace pandas
How-To, Systems


via Dev.to Tutorial, by Alex Spinov

DuckDB just changed how I process CSV files. No more pandas for simple analysis. No more importing into PostgreSQL. You just query the file directly.

The Basics

    -- Query a CSV file. No import, no schema definition.
    SELECT * FROM read_csv_auto('sales.csv') LIMIT 5;

DuckDB auto-detects column types, handles headers, and deals with messy data. It takes 0.3 seconds for a 1M-row file. pandas takes 4 seconds for the same file.

Install

    pip install duckdb

Or use the CLI:

    brew install duckdb  # macOS
    # Then just run:
    duckdb

7 Queries That Replace pandas

1. Basic Aggregation

    -- pandas: df.groupby('country')['revenue'].sum().sort_values(ascending=False)
    SELECT country, SUM(revenue) AS total
    FROM read_csv_auto('sales.csv')
    GROUP BY country
    ORDER BY total DESC;

2. Filter and Transform

    -- pandas: df[df['status'] == 'active'].assign(tax=df['price'] * 0.2)
    SELECT *, price * 0.2 AS tax
    FROM read_csv_auto('products.csv')
    WHERE status = 'active';

3. Join Two CSVs

    -- pandas: pd.merge(ord
