📊 Data ⚙️ Engineering Awaiting Security Review

DuckDB Analytics

Run fast analytical queries on Parquet, CSV, and JSON with DuckDB. Generates in-process pipelines, S3/HTTPFS reads, MotherDuck deploys, and pandas/polars integration for local-first analytics.

DuckDB is a columnar, in-process database, like SQLite for analytics. This skill writes pipelines that read Parquet/CSV directly from S3, join them in memory, export results, and migrate queries between local DuckDB and MotherDuck cloud.

duckdb analytics olap parquet sql

When to use

Use for ad-hoc analytics on flat files, replacing pandas for big-but-not-huge data, building local-first dashboards, or running CI data quality checks without provisioning a warehouse.

Examples

Query Parquet directly from S3

Skip the warehouse for one-off analysis

Write a DuckDB query that joins three Parquet files on S3 via httpfs and exports the result to CSV without loading the full dataset into memory
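A response to this prompt might look roughly like the sketch below. It is an illustration, not the skill's actual output: the column names (order_id, customer_id, product_id, amount, customer_name, product_name), the function names, and the bucket paths in the usage note are all hypothetical, and S3 credentials are assumed to be configured separately. The query wraps the join in COPY ... TO so DuckDB writes rows to the CSV as it produces them rather than returning them to Python. The join itself still uses working memory.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_export_sql(orders_url: str, customers_url: str,
                     products_url: str, out_csv: str) -> str:
    """Build a COPY statement that joins three Parquet files and writes a CSV.

    All column names below are hypothetical placeholders.
    """
    return f"""
COPY (
    SELECT o.order_id, c.customer_name, p.product_name, o.amount
    FROM read_parquet({sql_literal(orders_url)}) AS o
    JOIN read_parquet({sql_literal(customers_url)}) AS c
        ON o.customer_id = c.customer_id
    JOIN read_parquet({sql_literal(products_url)}) AS p
        ON o.product_id = p.product_id
) TO {sql_literal(out_csv)} (HEADER, DELIMITER ',');
""".strip()


def export_join(sql: str) -> None:
    """Run the export against an in-memory DuckDB database with httpfs loaded."""
    import duckdb  # imported lazily so the SQL builder works without DuckDB installed

    con = duckdb.connect()  # in-memory database; nothing is persisted
    try:
        con.execute("INSTALL httpfs")  # downloads the extension on first use
        con.execute("LOAD httpfs")     # enables s3:// and https:// paths
        con.execute(sql)               # credentials setup is assumed to exist already
    finally:
        con.close()
```

Usage, with hypothetical paths: export_join(build_export_sql("s3://my-bucket/orders.parquet", "s3://my-bucket/customers.parquet", "s3://my-bucket/products.parquet", "joined.csv")).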

Replace pandas in an ETL job

Speed up a slow pandas pipeline

Rewrite this pandas join+aggregation pipeline as DuckDB SQL over the same CSVs — it's running out of memory at scale
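As a sketch of what such a rewrite could look like, assume the original pipeline is something like orders.merge(customers, on="customer_id").groupby("region")["amount"].sum(). That pipeline, the column names, and the function names below are hypothetical, since the prompt does not show the real code. The DuckDB version scans the CSVs itself with read_csv_auto and returns only the aggregated rows to Python, so the full tables never have to be materialized as DataFrames.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_aggregation_sql(orders_csv: str, customers_csv: str) -> str:
    """Build a join + GROUP BY query over two CSV files.

    Mirrors a hypothetical pandas merge on customer_id followed by
    groupby("region")["amount"].sum().
    """
    return f"""
SELECT c.region, SUM(o.amount) AS total_amount
FROM read_csv_auto({sql_literal(orders_csv)}) AS o
JOIN read_csv_auto({sql_literal(customers_csv)}) AS c USING (customer_id)
GROUP BY c.region
ORDER BY total_amount DESC
""".strip()


def run_aggregation(orders_csv: str, customers_csv: str) -> list:
    """Execute the query and return the result rows as a list of tuples."""
    import duckdb  # imported lazily so the SQL builder works without DuckDB installed

    con = duckdb.connect()  # in-memory database
    try:
        sql = build_aggregation_sql(orders_csv, customers_csv)
        return con.execute(sql).fetchall()
    finally:
        con.close()
```

If downstream code still expects a DataFrame, calling .df() on the result instead of .fetchall() returns one, which allows the join and aggregation to move to DuckDB while the rest of the job stays in pandas.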