I built pq - the jq of Parquet. Here's why data engineers need a better CLI
How-To · Tools


via Dev.to, by Evgenii Orlov

I got tired of spinning up DuckDB or writing throwaway Python just to peek inside a Parquet file. So I built pq, a single-binary CLI written in Rust that handles the full Parquet workflow from your terminal.

Quick taste:

- pq data.parquet: metadata, schema, compression, and row groups at a glance
- pq head -n 5 -c id,name s3://bucket/data.parquet: preview specific columns directly from S3
- pq schema extract --ddl postgres data.parquet: generate a CREATE TABLE statement (supports Postgres, ClickHouse, DuckDB, Spark, BigQuery, Snowflake, Redshift, and MySQL)
- pq check --contract contract.toml data/: validate file structure and data contracts in CI
- pq schema diff a.parquet b.parquet: catch schema drift between files
- pq compact data/ -o s3://bucket/compacted/: merge small files into optimal sizes
- pq convert raw/*.csv -o parquet/: batch-convert CSV/JSON to Parquet

pq auto-detects the output format (a table on a TTY, JSON when piped), supports glob patterns, and works with S3, GCS, Azure Blob, and Cloudflare R2.

Install: brew

Continue reading on Dev.to
