PromptDiff
PromptDiff is a web app for teams building LLM features who need reliable, repeatable prompt regression testing. Developers store prompts, system instructions, variables, and “golden” expected responses (or scoring rules) for a baseline model. When switching providers or upgrading models, they rerun the same test suite against a new model and instantly see diffs: semantic changes, missing facts, policy refusals, formatting drift, and tool-call schema breaks.

The app supports lightweight evaluation modes:
- exact match for structured outputs
- JSON schema validation
- embedding-based similarity
- LLM-as-judge scoring with configurable rubrics

Results are tracked over time with run history, pass/fail thresholds, and a simple CI integration so model changes don’t silently degrade production behavior. It’s built for web-first workflows, with an API to trigger runs from pipelines and store artifacts for auditability.
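To make the evaluation modes concrete, here is a minimal sketch of how exact match and embedding-based similarity could combine into a single pass/fail check. All names here (`exact_match`, `cosine_similarity`, `passes`, and the `embed` callback) are hypothetical illustrations, not PromptDiff's actual API; `embed` stands in for whatever embedding provider a team wires up.

```python
import json
import math


def exact_match(expected: str, actual: str) -> bool:
    """Exact-match mode: compare structured outputs after JSON
    normalization so whitespace and key ordering don't matter;
    fall back to a trimmed string comparison for plain text."""
    try:
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        return expected.strip() == actual.strip()


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Embedding-similarity mode: cosine similarity between the
    embedding vectors of the expected and actual responses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes(expected: str, actual: str, embed, threshold: float = 0.9) -> bool:
    """A test case passes if the output matches exactly, or if its
    embedding is semantically close enough to the golden response."""
    if exact_match(expected, actual):
        return True
    return cosine_similarity(embed(expected), embed(actual)) >= threshold
```

The similarity threshold is the key tuning knob: a strict value catches formatting drift but flags harmless paraphrases, while a loose one can let missing facts slip through, which is why a rubric-based LLM-as-judge mode exists as a complement.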
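The pass/fail threshold that backs the CI integration can be pictured as a simple gate over a run's per-case results. This is a sketch under assumptions, not the product's implementation; `gate` is a hypothetical helper.

```python
import sys


def gate(results: list[bool], threshold: float = 0.95) -> int:
    """CI gate: return exit code 0 when the suite's pass rate meets
    the threshold, 1 otherwise, so a pipeline step fails loudly
    instead of letting a model change degrade behavior silently."""
    rate = sum(results) / len(results) if results else 0.0
    return 0 if rate >= threshold else 1


if __name__ == "__main__":
    # Hypothetical run: 19 of 20 cases passed (0.95 pass rate).
    sys.exit(gate([True] * 19 + [False]))
```

A pipeline would fetch the run's results through the API, feed them to a gate like this, and let the nonzero exit code block the deploy.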