TSGuard
TSGuard is a web app (with an optional CLI agent) that continuously audits your time-series database and ingestion pipeline for the problems that actually cause outages: cardinality explosions, runaway label sets, retention misconfigurations, hot partitions, slow queries, and silent ingestion drops. It connects to common TSDBs (Prometheus/Thanos, InfluxDB, TimescaleDB), builds a baseline of normal behavior, and then alerts on meaningful deviations with concrete remediation steps (e.g., “top 20 labels causing 78% of series growth; drop/relabel these metrics”). It also provides a “cost-to-metric” view so teams can delete low-value metrics and reduce storage and compute costs. TSGuard combines traditional and AI components: conventional collectors and rules catch hard failures, while AI summarizes incidents, proposes fixes, and generates PR-ready config changes (recording rules, relabeling, retention policies).
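To make the “top labels causing 78% of series growth” style of finding concrete, here is a minimal Python sketch of how such a number could be derived from two snapshots of per-label series counts. It is an illustration only, not TSGuard's actual implementation: the function name `top_growth_contributors`, the snapshot format (a dict mapping label name to series count), and the sample numbers are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def top_growth_contributors(
    before: Dict[str, int],
    after: Dict[str, int],
    top_n: int = 20,
) -> Tuple[List[Tuple[str, int]], float]:
    """Rank labels by series growth between two snapshots (hypothetical helper).

    Returns the top_n (label, growth) pairs and the fraction of total
    positive growth that those labels account for.
    """
    # Growth per label; labels missing from a snapshot count as zero series.
    growth = {
        label: after.get(label, 0) - before.get(label, 0)
        for label in set(before) | set(after)
    }
    # Only labels whose series count increased contribute to growth.
    positive = {label: g for label, g in growth.items() if g > 0}
    total = sum(positive.values())
    # Largest growth first; ties broken alphabetically for stable output.
    ranked = sorted(positive.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    share = sum(g for _, g in ranked) / total if total else 0.0
    return ranked, share


if __name__ == "__main__":
    # Made-up snapshot data for illustration.
    before = {"user_id": 1_000, "pod": 400, "region": 12}
    after = {"user_id": 8_800, "pod": 2_400, "region": 12, "request_id": 200}
    ranked, share = top_growth_contributors(before, after, top_n=1)
    print(ranked, f"{share:.0%}")  # [('user_id', 7800)] 78%
```

In this made-up data, a single label accounts for 78% of new series, which is the kind of signal that would justify a drop or relabel rule. A real audit would pull the snapshots from the database itself and compare them against a longer baseline rather than two points in time.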