Soda Data Quality
Test data quality with Soda Core and SodaCL. Generates YAML check definitions, anomaly detection rules, dbt integrations, and CI scans for freshness, schema, distribution, and reconciliation.
This skill writes SodaCL checks for freshness, missing values, schema evolution, row-count anomalies, distribution drift, and cross-table reconciliation. It configures soda-core with warehouse adapters (Snowflake, BigQuery, Redshift, Postgres, Databricks), embeds scans in Airflow and dbt pipelines, and connects Soda Cloud for monitoring. It also covers the programmatic API for embedding scans in Python applications.
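As a rough illustration of the programmatic API mentioned above, the following sketch wraps a Soda Core scan in a single function. It assumes the soda-core package and a warehouse adapter are installed; run_scan, scan_factory, and the argument names are hypothetical helpers for this example, and the import is deferred so the function can be exercised with a stand-in object.

```python
from typing import Callable, Optional


def run_scan(
    data_source_name: str,
    config_path: str,
    checks_yaml: str,
    scan_factory: Optional[Callable[[], object]] = None,
) -> int:
    """Run one Soda scan and return its integer exit code.

    scan_factory is a hypothetical hook for this sketch: pass a stand-in
    in tests, or leave it as None to use soda-core's Scan class.
    """
    if scan_factory is None:
        # Deferred import: requires soda-core plus a warehouse adapter.
        from soda.scan import Scan

        scan_factory = Scan

    scan = scan_factory()
    scan.set_data_source_name(data_source_name)     # name defined in the config file
    scan.add_configuration_yaml_file(config_path)   # connection settings
    scan.add_sodacl_yaml_str(checks_yaml)           # SodaCL checks as a string
    # execute() returns an exit code; 0 is understood to mean all checks passed.
    return scan.execute()
```

A caller might treat any non-zero return value as a reason to fail the surrounding pipeline step, though the right policy depends on how warnings should be handled.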
When to use
Use when adding data quality monitoring to a warehouse, replacing brittle SQL-based tests, catching upstream schema drift, or building reconciliation checks between systems.
Examples
Freshness and anomaly checks
Catch stale data and unusual row counts
Write Soda checks for our events table: freshness < 1 hour on event_time, anomaly score on daily row count, schema must contain user_id and session_id, no nulls in user_id
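A response to this prompt might resemble the sketch below, which holds the SodaCL checks in a Python string so they can be written to a checks file or handed to a scan. The table name events comes from the prompt; EVENTS_CHECKS and write_checks are hypothetical names. The anomaly line uses the "anomaly score" form of the check, which, to the best of our understanding, relies on historical measurements stored in Soda Cloud and an additional scientific package, so treat it as an assumption to verify against your Soda version.

```python
import textwrap
from pathlib import Path

# SodaCL checks for the events table described in the prompt.
EVENTS_CHECKS = textwrap.dedent("""\
    checks for events:
      # Newest event_time must be less than one hour old.
      - freshness(event_time) < 1h
      # Flag unusual row counts (assumed to need Soda Cloud history).
      - anomaly score for row_count < default
      # Fail when either required column disappears.
      - schema:
          fail:
            when required column missing: [user_id, session_id]
      # No NULL values allowed in user_id.
      - missing_count(user_id) = 0
    """)


def write_checks(path: str) -> Path:
    """Write the checks to a YAML file (hypothetical helper for this sketch)."""
    target = Path(path)
    target.write_text(EVENTS_CHECKS, encoding="utf-8")
    return target
```

The resulting file could then be scanned from the command line with something like soda scan -d <data source> -c configuration.yml checks.yml, where the data source name and file paths are placeholders.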
Reconciliation across systems
Compare source and reporting-layer totals
Build a Soda reconciliation check that compares daily revenue totals between our Postgres source and Snowflake reporting layer, alerting if they diverge by more than 0.5%
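To make the 0.5% threshold concrete, here is a small sketch of the comparison such a reconciliation check would enforce, written in plain Python rather than SodaCL. It assumes the divergence is measured relative to the source (Postgres) total and that daily totals have already been fetched into dictionaries keyed by date; diverges and divergent_days are hypothetical names, and the figures are made up for illustration.

```python
from typing import Dict, List


def diverges(source_total: float, target_total: float, tolerance: float = 0.005) -> bool:
    """Return True when target differs from source by more than tolerance.

    The difference is taken relative to the source total (an assumption;
    the prompt does not say which side is the baseline).
    """
    if source_total == 0:
        # No baseline to divide by: any non-zero target counts as divergence.
        return target_total != 0
    return abs(target_total - source_total) / abs(source_total) > tolerance


def divergent_days(
    source: Dict[str, float], target: Dict[str, float], tolerance: float = 0.005
) -> List[str]:
    """List days that are missing on either side or exceed the tolerance."""
    days = set(source) | set(target)
    return sorted(
        day
        for day in days
        if day not in source
        or day not in target
        or diverges(source[day], target[day], tolerance)
    )


# Illustrative daily revenue totals (not real data).
postgres_totals = {"2024-01-01": 1000.0, "2024-01-02": 2000.0, "2024-01-03": 500.0}
snowflake_totals = {"2024-01-01": 1004.0, "2024-01-02": 2020.0}

flagged = divergent_days(postgres_totals, snowflake_totals)
```

Here the first day differs by 0.4% and passes, the second differs by 1% and is flagged, and the third is flagged because it is absent from the reporting layer.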