Agent readiness from playbooks

Turn business playbooks into agent test suites.

Paste one prompt into your coding agent to install Wendell, create a repo skill, generate suites from playbooks, and run agent tests before production.

Start with CLI

Install Wendell in one command.

curl -fsSL https://www.wendellai.com/install | bash

Quickstart

Gives your agent one prompt to install the CLI and create a repo skill

Keeps credentials out of agent instructions and local skill files

Turns playbooks, SOPs, tool contracts, and tickets into executable suites

Runs locally, in CI, or through hosted workflows before production

Generated suite

See exactly what the suite will test.

Refund agent

Refund escalation playbook

ready

24 scenarios

8 policies

5 critical gates

Playbook

Policies, SOPs, tool docs, real tickets, known failures, and expert review.

Test suite

Generated agent test cases with state, tools, scenarios, scoring, and traces.

Readiness

Run your agent, catch regressions, and see the evidence behind every failure.

Upload a playbook. Generate a test suite. Run your agent. See what breaks.
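To make the playbook-to-readiness flow above concrete, here is a minimal Python sketch of what one generated test case and a regression check might look like. Every name in it (`Scenario`, `RunResult`, `find_regressions`, the field names, the refund tools) is an assumption made for illustration; it is not Wendell's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical shape of one generated test case (not Wendell's real schema)."""
    scenario_id: str
    policy: str                  # playbook rule the scenario exercises
    initial_state: dict          # world state before the agent runs
    tools: list[str]             # tools the agent is allowed to call
    expected_final_state: dict   # state that must hold after the run
    critical: bool = False       # True if a failure should block release


@dataclass
class RunResult:
    """Outcome of running the agent on one scenario, with its trace as evidence."""
    scenario_id: str
    passed: bool
    trace: list[dict] = field(default_factory=list)  # recorded tool calls


def find_regressions(baseline: list[RunResult], current: list[RunResult]) -> list[str]:
    """Return ids of scenarios that passed in the baseline run but fail now."""
    passed_before = {r.scenario_id for r in baseline if r.passed}
    return sorted(
        r.scenario_id
        for r in current
        if not r.passed and r.scenario_id in passed_before
    )


# Illustrative data only: a made-up refund scenario and two made-up runs.
scenario = Scenario(
    scenario_id="refund-over-limit",
    policy="Refunds above the agent limit must be escalated",
    initial_state={"order_total": 900, "refund_status": "none"},
    tools=["lookup_order", "issue_refund", "escalate_to_human"],
    expected_final_state={"refund_status": "escalated"},
    critical=True,
)

baseline = [RunResult("refund-over-limit", True), RunResult("refund-duplicate", True)]
current = [
    RunResult("refund-over-limit", False, [{"tool": "issue_refund", "amount": 900}]),
    RunResult("refund-duplicate", True),
]

print(find_regressions(baseline, current))  # ['refund-over-limit']
```

Keeping the trace on each result is what lets a failure point back to the specific tool call that broke the policy.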

Why a skill

Agents need a repeatable operating manual.

Installable

The skill gives agents a known path for installing and verifying the Wendell CLI.

Repo-aware

It can point agents to playbooks, configs, scripts, and docs that already exist.

Repeatable

Every agent gets the same steps for generating, running, and inspecting suites.

Publishable

Once dogfooded, the same skill can become a public onboarding artifact.

FAQ

Where Wendell fits.

Wendell does not replace your agent stack. It gives agents a repeatable way to install the CLI, build suites from playbooks, and test the systems you are already shipping.

How is this different from an eval platform?

Eval platforms help teams manage prompts, datasets, traces, and scores. Wendell focuses on where the test cases come from: your business playbooks. It turns policies, SOPs, tool docs, and real examples into workflow-specific test suites.

How is this different from observability?

Observability tells you what happened after an agent ran. Wendell helps you create repeatable test suites before production, so you can test whether the agent follows the workflow before real users or systems are affected.

How is this different from an agent builder?

Agent builders help you construct the agent. Wendell tests the agent you already built. It can point to the prompt, tool, or policy behavior that needs improvement, but it does not require you to rebuild your stack inside Wendell.

Is this just an LLM judge?

No. Wendell prefers objective checks where they exist: final state, required evidence, tool calls, forbidden actions, policy gates, and critical failures. LLM judgment is useful for subjective dimensions like tone or explanation quality, but not as the only source of truth.
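As a rough illustration of what objective checks can look like, the Python sketch below scores one recorded run using only deterministic comparisons: required tool calls, forbidden actions, and final state. The function `run_objective_checks`, the `CheckResult` type, the trace format, and the tool names are all hypothetical and are not taken from Wendell.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_objective_checks(trace, final_state, required_tools, forbidden_tools, expected_state):
    """Score one agent run with deterministic checks only (no LLM judge)."""
    called = [step["tool"] for step in trace]

    # Required tools the agent never called.
    missing = [t for t in required_tools if t not in called]
    # Calls the policy forbids for this scenario.
    forbidden_used = [t for t in called if t in forbidden_tools]
    # Keys whose final value differs from the expected value.
    mismatched = {
        key: final_state.get(key)
        for key, want in expected_state.items()
        if final_state.get(key) != want
    }

    return [
        CheckResult("required_tool_calls", not missing, f"missing: {missing}"),
        CheckResult("forbidden_actions", not forbidden_used, f"used: {forbidden_used}"),
        CheckResult("final_state", not mismatched, f"mismatched: {mismatched}"),
    ]


# Made-up run: the agent refunded directly instead of escalating.
trace = [{"tool": "lookup_order"}, {"tool": "issue_refund"}]
results = run_objective_checks(
    trace,
    final_state={"refund_status": "refunded"},
    required_tools=["lookup_order", "escalate_to_human"],
    forbidden_tools=["issue_refund"],
    expected_state={"refund_status": "escalated"},
)
for r in results:
    print(r.name, "PASS" if r.passed else "FAIL", r.detail)
```

In this made-up run all three checks fail, and each failure carries its own evidence. A subjective score for tone could sit alongside these results without overriding them.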

Why not just hand-write eval cases?

Manual evals are useful, but they become brittle and incomplete as workflows change. Wendell starts from the playbook and keeps the suite tied to workflow rules, risky branches, tool behavior, and known failure cases.
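One way to picture a suite that stays tied to the playbook is a coverage check that flags rules no scenario exercises. The Python sketch below assumes each scenario records the id of the rule it tests; the `rule_id` field, the rule ids, and the `uncovered_rules` helper are illustrative assumptions, not part of Wendell.

```python
def uncovered_rules(playbook_rules, scenarios):
    """Return ids of playbook rules that no scenario exercises."""
    covered = {s["rule_id"] for s in scenarios}
    return sorted(rule_id for rule_id in playbook_rules if rule_id not in covered)


# Made-up playbook rules and scenarios for a refund workflow.
playbook_rules = {
    "R1": "Verify the order before any refund",
    "R2": "Escalate refunds above the agent limit",
    "R3": "Never refund the same order twice",
}
scenarios = [
    {"id": "refund-over-limit", "rule_id": "R2"},
    {"id": "refund-happy-path", "rule_id": "R1"},
]

print(uncovered_rules(playbook_rules, scenarios))  # ['R3']
```

When the playbook gains or changes a rule, a check like this surfaces the gap that a hand-written eval set would miss silently.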

What do you need from us?

A useful first suite can start with a support playbook, SOP, tool contract, policy doc, or set of representative tickets. The strongest suites combine approved policy, real examples, tool schemas, and known regressions.