LeakLens

LeakLens is a web app, with a companion lightweight Python package, that audits ML datasets and training pipelines for silent data leakage and evaluation mistakes. You connect a Git repo or upload notebooks, along with a sample of your train/validation/test data. LeakLens then runs a battery of checks: target leakage indicators, time-based split violations, duplicate and near-duplicate rows shared across splits, feature leakage via fields recorded after the outcome, label encoding bleed, and train-test contamination through preprocessing (for example, a scaler fitted on the full dataset before splitting). The result is a plain-English report that gives each finding a severity, reproducible evidence (rows, columns, and code locations), and a suggested fix. It also generates a “leakage risk score” that you can track over time in CI, so teams stop shipping models that look great offline and fail in production. LeakLens is not a full MLOps platform; it is a focused guardrail that fits into existing workflows and catches costly mistakes early.
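
To make two of these checks concrete, here is a minimal sketch of what exact-duplicate detection and a time-based split check might look like. It is an illustration only, not the LeakLens implementation: the function names `duplicate_leaks` and `time_split_violations`, the tuple row format, and the toy data are all assumptions made for this example, and it covers exact duplicates only (near-duplicate matching would need a similarity measure).

```python
from datetime import date


def duplicate_leaks(train_rows, test_rows):
    """Return indices of test rows that also appear verbatim in train.

    Hypothetical helper: rows are compared as whole tuples, so this
    catches exact duplicates only.
    """
    seen = {tuple(row) for row in train_rows}
    return [i for i, row in enumerate(test_rows) if tuple(row) in seen]


def time_split_violations(train_times, test_times):
    """Return indices of training rows dated on or after the earliest test row.

    Hypothetical helper: assumes the intended split is strictly
    "train on the past, test on the future".
    """
    if not test_times:
        return []
    cutoff = min(test_times)
    return [i for i, t in enumerate(train_times) if t >= cutoff]


# Toy data, invented for illustration: (name, age, label)
train = [("alice", 34, 0), ("bob", 51, 1), ("carol", 29, 0)]
test = [("dave", 42, 1), ("bob", 51, 1)]

train_dates = [date(2023, 1, 5), date(2023, 2, 10), date(2023, 4, 1)]
test_dates = [date(2023, 3, 1), date(2023, 3, 15)]

print("Test rows duplicated in train:", duplicate_leaks(train, test))
print("Train rows inside the test period:",
      time_split_violations(train_dates, test_dates))
```

In this toy data, the second test row repeats a training row, and the third training row is dated after the test period begins, so each check flags one row. A real audit would attach findings like these to the specific rows and code locations that produced them.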
