DriftReplay

DriftReplay is a web app (with an optional CLI) that records lightweight “execution replays” for data pipelines so teams can reproduce failures exactly, without requesting access to production systems or guessing which upstream state changed. On each run it captures metadata, configs, query plans, schema snapshots, sample rows (hashed or redacted), and dependency versions, then lets you spin up a deterministic replay in an isolated environment (Docker or Kubernetes) to debug. The product focuses on the painful middle: pipelines that mostly work but occasionally break because of upstream changes, late-arriving data, or subtle type/NULL shifts. It integrates with common orchestrators (Airflow, Dagster, Prefect) and warehouses (Snowflake, BigQuery, Databricks) via connectors. There are limits: a replay won’t capture everything (external API calls, for example), but it should drastically cut the time lost to “can’t reproduce” failures and reduce mean time to recovery.
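
To make the per-run capture more concrete, here is a minimal Python sketch of what a replay record might look like. It is an illustration only, not DriftReplay's actual format or API: the names `ReplayManifest`, `hash_value`, and `schema_drift`, the field layout, and the salted SHA-256 redaction policy are all assumptions introduced for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


def hash_value(value, salt: str) -> str:
    """Replace a raw value with a truncated, salted SHA-256 digest.

    Hypothetical redaction policy: equal inputs map to equal digests,
    so joins and duplicates remain visible without exposing the value.
    """
    return hashlib.sha256(f"{salt}:{value!r}".encode("utf-8")).hexdigest()[:16]


@dataclass
class ReplayManifest:
    """Hypothetical record of one pipeline run."""
    run_id: str
    config: dict
    schema: dict          # column name -> declared type
    dependencies: dict    # package name -> version
    sample_rows: list = field(default_factory=list)

    def add_sample(self, row: dict, sensitive: set, salt: str) -> None:
        # Hash sensitive columns; keep None as-is so NULL shifts stay visible.
        redacted = {
            key: hash_value(val, salt) if key in sensitive and val is not None else val
            for key, val in row.items()
        }
        self.sample_rows.append(redacted)

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that diffs cleanly between runs.
        return json.dumps(asdict(self), sort_keys=True)


def schema_drift(old: dict, new: dict) -> dict:
    """Compare two schema snapshots and report added, removed, and retyped columns."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(col for col in shared if old[col] != new[col]),
    }
```

Comparing the schema snapshots of a passing run and a failing run with `schema_drift` is one way such a record could surface the upstream type changes described above; a real implementation would also need to cover query plans and environment setup, which this sketch leaves out.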
