RLReplayLab

RLReplayLab is a web app for teams deploying reinforcement learning or bandit policies where online experimentation is risky or expensive. It ingests logged interaction data (states, actions, propensities, rewards, constraints) and runs offline policy evaluation (OPE) to estimate how a new policy would perform before you ship it. The product focuses on practical debugging: it flags logging-policy mismatch, propensity issues, reward leakage, and distribution shift, then uses slice analysis to show where the candidate policy wins or loses. It also generates counterfactual “replay” scenarios so product and ML stakeholders can understand why a policy changes outcomes. This is not a magic “train RL from logs” button; it’s a decision-support tool that reduces costly mistakes and helps teams decide when to run a limited online test. It fits recommendation, pricing, ads, and operations workflows.
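To make the OPE step concrete, here is a minimal sketch of one common estimator family, inverse propensity scoring (IPS) and its self-normalized variant (SNIPS), applied to logged bandit data. This is an illustration only, not RLReplayLab's implementation: `LoggedStep`, `ips_estimates`, and the `min_propensity` threshold are hypothetical names and choices, and the sample log is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class LoggedStep:
    """One logged interaction (hypothetical schema for illustration)."""
    state: str
    action: str
    propensity: float  # probability the logging policy assigned to `action`
    reward: float


def ips_estimates(
    logs: Sequence[LoggedStep],
    target_prob: Callable[[str, str], float],
    min_propensity: float = 1e-3,  # assumed threshold, not a product default
) -> Dict[str, float]:
    """Estimate a candidate policy's mean reward from logged data.

    `target_prob(state, action)` returns the probability that the candidate
    policy would take `action` in `state`.
    """
    if not logs:
        raise ValueError("need at least one logged step")

    weights = []
    flagged = 0
    for step in logs:
        # Very small propensities inflate variance, so count them as a
        # possible "propensity issue" and clip before dividing.
        if step.propensity < min_propensity:
            flagged += 1
        clipped = max(step.propensity, min_propensity)
        weights.append(target_prob(step.state, step.action) / clipped)

    weighted_reward = sum(w * s.reward for w, s in zip(weights, logs))
    total_weight = sum(weights)

    return {
        # IPS: importance-weighted rewards averaged over all logged steps.
        "ips": weighted_reward / len(logs),
        # SNIPS: normalize by the sum of weights instead of the step count.
        "snips": weighted_reward / total_weight if total_weight > 0 else float("nan"),
        "flagged": float(flagged),
    }


# Made-up log: a uniform logging policy over actions "a" and "b".
logs = [
    LoggedStep("s", "a", 0.5, 1.0),
    LoggedStep("s", "b", 0.5, 0.0),
    LoggedStep("s", "a", 0.5, 1.0),
    LoggedStep("s", "b", 0.5, 0.0),
]

# Candidate policy that always chooses "a".
result = ips_estimates(logs, lambda state, action: 1.0 if action == "a" else 0.0)
print(result)
```

In this toy log the candidate always picks the action that earned reward 1.0, so both estimators put its value at 1.0, while the logging policy's observed average was 0.5. Real logs are rarely this clean, which is why checks on propensities and distribution shift matter before trusting such an estimate.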
