DuckDB Has a Free In-Process Analytics Engine: Run SQL on CSV, Parquet, and JSON Without a Server

via Dev.to Python, by Alex Spinov

DuckDB Runs SQL on CSV and Parquet Without a Server

You have a 5GB CSV file. Pandas tries to load all of it into memory and can crash with an out-of-memory error. DuckDB queries the same file with SQL, streaming the data and using far less RAM.

What Makes DuckDB Special

- In-process: runs inside your Python, Node, or R script
- No server: nothing to set up and no external dependencies
- Columnar engine: vectorized execution for fast analytics
- Direct file queries: SQL on CSV, Parquet, JSON, and Excel
- PostgreSQL-like SQL: a familiar dialect that closely follows PostgreSQL
- Extensions: httpfs, spatial, iceberg, delta

Quick Start

```python
import duckdb

# Aggregate a local CSV file directly; fetchdf() returns a pandas DataFrame
result = duckdb.sql("""
    SELECT city, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM 'orders.csv'
    GROUP BY city
    ORDER BY revenue DESC
    LIMIT 10
""").fetchdf()

# Query Parquet on S3 (relies on the httpfs extension and S3 credentials)
duckdb.sql("SELECT * FROM read_parquet('s3://bucket/data/*.parquet')")

# Query JSON
duckdb.sql("SELECT * FROM read_json_auto('events.json')")
```

DuckDB vs Pandas

| Task | DuckDB | Pandas |
|---|---|---|
| 5GB CSV aggregation | 3 sec | OOM crash |
| Memory usage | Streaming | Full load |

Synt
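The "streaming" versus "full load" distinction in the comparison above can be illustrated with a small standard-library sketch. This is not how DuckDB is implemented (it uses a vectorized columnar engine); it only shows why a GROUP BY needs memory proportional to the number of groups rather than the number of rows. The function name `aggregate_orders` and the sample data are hypothetical, and the `city` and `amount` columns are assumed from the Quick Start query.

```python
import csv
import io
from collections import defaultdict


def aggregate_orders(rows):
    """Stream rows once, keeping only per-city running totals in memory."""
    totals = defaultdict(lambda: [0, 0.0])  # city -> [order count, revenue]
    for row in rows:
        entry = totals[row["city"]]
        entry[0] += 1
        entry[1] += float(row["amount"])
    # Rough equivalent of ORDER BY revenue DESC LIMIT 10
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(city, count, revenue) for city, (count, revenue) in ranked[:10]]


# Hypothetical sample standing in for orders.csv
SAMPLE_CSV = "city,amount\nBerlin,10.0\nParis,25.5\nBerlin,5.0\nParis,4.5\nOslo,7.0\n"

# csv.DictReader yields one row at a time, so the input is never fully materialized
top_cities = aggregate_orders(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(top_cities)
```

With a real file, passing `csv.DictReader(open("orders.csv", newline=""))` would keep the same one-row-at-a-time behavior, whereas reading the whole file into a DataFrame first requires memory for every row.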
