DriftGuard
DriftGuard is a web app (with an optional CLI) that sits between your data pipeline and training jobs to catch dataset drift, label leakage, and silent schema changes before they ruin a training run. It profiles every incoming training dataset snapshot, compares it to the last "known-good" version, and blocks or flags risky changes with clear, actionable reports rather than vague metrics.

It also generates a lightweight "training readiness" checklist covering:

- spikes in missing values
- target distribution shifts
- new categories
- feature correlation flips
- train/validation contamination signals

The goal is to cut wasted GPU spend and engineering time by stopping bad retrains before they start.

DriftGuard combines a traditional app with an AI layer: deterministic rules and statistics provide the reliable checks, while AI generates explanations and suggested fixes (e.g., which feature-engineering step likely caused a shift). It integrates with S3/GCS, Snowflake/BigQuery, and MLflow/Kubeflow through simple connectors.
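To make the checklist concrete, here is a minimal sketch of the kind of snapshot-vs-known-good comparisons described above. The function names, the None-based missing-value convention, and the thresholds are illustrative assumptions, not DriftGuard's actual API; the target-shift check uses the Population Stability Index (PSI), a common drift statistic.

```python
# Hypothetical sketch of DriftGuard-style checks; names and thresholds
# are illustrative, not the product's real API.
from collections import Counter
import math


def new_categories(baseline, current):
    """Categories present in the current snapshot but not in the known-good one."""
    return sorted(set(current) - set(baseline))


def missing_rate_spike(baseline, current, threshold=0.05):
    """Flag if the share of missing (None) values grew by more than `threshold`."""
    rate = lambda col: sum(v is None for v in col) / len(col)
    return (rate(current) - rate(baseline)) > threshold


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two categorical distributions.

    Values above roughly 0.25 are conventionally treated as a major shift.
    """
    b, c = Counter(baseline), Counter(current)
    nb, nc = len(baseline), len(current)
    total = 0.0
    for cat in set(b) | set(c):
        pb = max(b[cat] / nb, eps)  # clamp to avoid log(0)
        pc = max(c[cat] / nc, eps)
        total += (pc - pb) * math.log(pc / pb)
    return total


if __name__ == "__main__":
    known_good = ["US", "US", "EU", "EU", "EU", "APAC"]
    snapshot = ["US", "EU", "EU", "LATAM", "LATAM", "LATAM"]
    print("new categories:", new_categories(known_good, snapshot))  # ['LATAM']
    print("target PSI:", round(psi(known_good, snapshot), 3))
```

In a real deployment these checks would run per column over snapshots pulled through the storage connectors, with the report aggregating each flagged check into the readiness checklist.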