TraceTally
TraceTally is a web app (with an optional CLI) for LLM testing and governance that focuses on one thing: making model behavior auditable and repeatable across versions, providers, and prompts. Teams define “contracts” for key workflows (support replies, data extraction, policy compliance) and run them on a schedule or in CI. Each run stores the full trace (prompt template, tool calls, hashes of retrieved context, model and version, parameters, and outputs), then signs the artifact so results can’t be quietly altered. Regressions surface through diff views and risk scoring, with a clear pass/fail gate you can enforce before shipping. TraceTally is part AI app, part traditional app: it uses LLMs for auto-generated test cases and semantic diffing, but the core value is governance-grade logging, provenance, and controls. It’s built for teams that need evidence, not vibes.
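To make the "store the full trace, then sign it" idea concrete, here is a minimal sketch of how a signed trace artifact could work. It is not TraceTally's actual format or API: the field names, the `sign_trace` and `verify_trace` helpers, and the use of HMAC-SHA256 with a shared key are all assumptions chosen for illustration. A real deployment might well use asymmetric signatures instead.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    # Deterministic serialization (sorted keys, no extra whitespace)
    # so the same record always produces the same bytes to sign.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_trace(record: dict, key: bytes) -> dict:
    # Hypothetical helper: wrap the record together with its HMAC-SHA256 signature.
    signature = hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_trace(artifact: dict, key: bytes) -> bool:
    # Recompute the signature and compare in constant time.
    expected = hmac.new(key, canonical_bytes(artifact["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, artifact["signature"])


# Hypothetical trace for one contract run; every value here is made up.
trace = {
    "contract": "support-reply",
    "prompt_template": "Reply politely to: {ticket}",
    "tool_calls": [{"name": "lookup_order", "args": {"order_id": "A123"}}],
    # Retrieved context is stored as a hash, not as raw text.
    "context_hashes": [hashlib.sha256(b"refund policy v3").hexdigest()],
    "model": "example-model",
    "model_version": "2024-01",
    "parameters": {"temperature": 0.0},
    "output": "Thanks for reaching out. Your refund is on its way.",
}

key = b"demo-key-not-for-production"  # assumption: a shared secret held by the service
artifact = sign_trace(trace, key)
```

With an artifact like this, `verify_trace(artifact, key)` succeeds for the untouched record and fails if any field, such as the output, is changed after signing, which is the property that makes a quiet edit detectable.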