Agent readiness from playbooks
Turn business playbooks into agent test suites.
Paste one prompt into your coding agent to install Wendell, create a repo skill, generate suites from playbooks, and run agent tests before production.
Start with CLI
Install Wendell in one command.
curl -fsSL https://www.wendellai.com/install | bash
Gives your agent one prompt to install the CLI and create a repo skill
Keeps credentials out of agent instructions and local skill files
Turns playbooks, SOPs, tool contracts, and tickets into executable suites
Runs locally, in CI, or through hosted workflows before production
Generated suite
See exactly what the suite will test.
Refund agent
Refund escalation playbook
24 scenarios
8 policies
5 critical gates
Playbook
Policies, SOPs, tool docs, real tickets, known failures, and expert review.
Test suite
Generated agent test cases with state, tools, scenarios, scoring, and traces.
Readiness
Run your agent, catch regressions, and see the evidence behind every failure.
Upload a playbook. Generate a test suite. Run your agent. See what breaks.
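As a rough illustration of what a test case with state, tools, and scenarios could look like, here is a minimal Python sketch. It is not Wendell's suite format: the `Scenario` class, its fields, the `summarize` helper, and the refund example values are all hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical shape of one generated test case (not Wendell's real schema)."""
    name: str
    initial_state: dict            # world state before the agent runs
    allowed_tools: list[str]       # tools the agent may call in this scenario
    expected_final_state: dict     # state the playbook says should result
    critical: bool = False         # a failure here would block release
    tags: list[str] = field(default_factory=list)


def summarize(scenarios: list[Scenario]) -> dict:
    """Count scenarios and critical gates, similar to a suite summary card."""
    return {
        "scenarios": len(scenarios),
        "critical_gates": sum(1 for s in scenarios if s.critical),
    }


# Two made-up scenarios derived from an imaginary refund escalation playbook.
suite = [
    Scenario(
        name="refund over limit requires escalation",
        initial_state={"order_total": 900, "refund_status": "none"},
        allowed_tools=["lookup_order", "escalate_to_human"],
        expected_final_state={"refund_status": "escalated"},
        critical=True,
    ),
    Scenario(
        name="refund within limit is issued",
        initial_state={"order_total": 40, "refund_status": "none"},
        allowed_tools=["lookup_order", "issue_refund"],
        expected_final_state={"refund_status": "refunded"},
    ),
]

print(summarize(suite))  # {'scenarios': 2, 'critical_gates': 1}
```

Keeping each scenario as plain data makes it easy to review a suite before anything runs, which is the point of seeing what will be tested up front.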
Why a skill
Agents need a repeatable operating manual.
Installable
The skill gives agents a known path for installing and verifying the Wendell CLI.
Repo-aware
It can point agents to playbooks, configs, scripts, and docs that already exist.
Repeatable
Every agent gets the same steps for generating, running, and inspecting suites.
Publishable
Once dogfooded, the same skill can become a public onboarding artifact.
FAQ
Where Wendell fits.
Wendell does not replace your agent stack. It gives agents a repeatable way to install the CLI, build suites from playbooks, and test the systems you are already shipping.
How is this different from an eval platform?
Eval platforms help teams manage prompts, datasets, traces, and scores. Wendell focuses on where the test cases come from: your business playbooks. It turns policies, SOPs, tool docs, and real examples into workflow-specific test suites.
How is this different from observability?
Observability tells you what happened after an agent ran. Wendell helps you create repeatable test suites before production, so you can test whether the agent follows the workflow before real users or systems are affected.
How is this different from an agent builder?
Agent builders help you construct the agent. Wendell tests the agent you already built. It can point to the prompt, tool, or policy behavior that needs improvement, but it does not require you to rebuild your stack inside Wendell.
Is this just an LLM judge?
No. Wendell should prefer objective checks where they exist: final state, required evidence, tool calls, forbidden actions, policy gates, and critical failures. LLM judgment is useful for subjective dimensions like tone or explanation quality, not as the only source of truth.
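To make the idea of objective checks concrete, here is a minimal Python sketch that scores one agent run on required tool calls, forbidden actions, and final state, with no LLM judge involved. It is an assumption-laden illustration, not Wendell's implementation: the `check_run` function, the trace format (a list of dicts with a `tool` key), and the `rules` keys are all hypothetical.

```python
def check_run(trace: list[dict], final_state: dict, rules: dict) -> list[str]:
    """Return a list of failures; an empty list means every objective check passed."""
    failures = []
    called = [step["tool"] for step in trace]

    # Required tool calls: each must appear at least once in the trace.
    for tool in rules.get("required_tools", []):
        if tool not in called:
            failures.append(f"missing required tool call: {tool}")

    # Forbidden actions: none may appear anywhere in the trace.
    for tool in rules.get("forbidden_tools", []):
        if tool in called:
            failures.append(f"forbidden action taken: {tool}")

    # Final state: every expected key must match exactly.
    for key, expected in rules.get("expected_final_state", {}).items():
        actual = final_state.get(key)
        if actual != expected:
            failures.append(f"final state {key}={actual!r}, expected {expected!r}")

    return failures


# Hypothetical rules for an over-limit refund that must be escalated.
rules = {
    "required_tools": ["lookup_order", "escalate_to_human"],
    "forbidden_tools": ["issue_refund"],
    "expected_final_state": {"refund_status": "escalated"},
}

good_trace = [{"tool": "lookup_order"}, {"tool": "escalate_to_human"}]
bad_trace = [{"tool": "lookup_order"}, {"tool": "issue_refund"}]

print(check_run(good_trace, {"refund_status": "escalated"}, rules))  # []
for failure in check_run(bad_trace, {"refund_status": "refunded"}, rules):
    print(failure)
```

Checks like these are deterministic and point at a specific step, so a failure comes with its own evidence. Subjective dimensions such as tone would still need a separate judgment step.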
Why not just hand-write eval cases?
Manual evals are useful, but they become brittle and incomplete as workflows change. Wendell starts from the playbook and keeps the suite tied to workflow rules, risky branches, tool behavior, and known failure cases.
What do you need from us?
A useful first suite can start with a support playbook, SOP, tool contract, policy doc, or set of representative tickets. The strongest suites combine approved policy, real examples, tool schemas, and known regressions.