TraceTriage
TraceTriage is a web app for LLM testing and evaluation that focuses on root-cause analysis instead of just scores. It captures full request/response traces (prompt, retrieved context, tool calls, model params, latency, cost) and automatically clusters failures into actionable buckets like “retrieval mismatch,” “tool misuse,” “format drift,” or “policy over-blocking.” It then generates minimal repro cases and suggests targeted regression tests you can add to your suite. The product is designed for teams shipping LLM features weekly who are drowning in flaky evals and vague “quality dropped” alerts. It integrates with common LLM gateways and observability pipelines, so you can link a production incident to the exact eval cases that would have caught it. This is not a magical quality oracle; it’s a practical debugging workflow that reduces time-to-fix and prevents repeat failures.