
Testing Agents

An agent that talks to your customers is production software. Test it like production software. Docana lets you write test cases for an agent, run them after every change, and see exactly which step broke.

To find it in your application, go to Agents, pick an agent, then open its Evals tab.

Create a Test Case

  1. Click Create Test Case
  2. Write the input: the message a user would send
  3. Add assertions: what must be true after the agent runs. Pick a node from your workflow, the field to check, and the condition it must meet
  4. Optionally add initial context (data the conversation starts with) and node mocks (canned responses for webhooks and tools, so tests don't hit real services)
Test cases run against an agent. Each one passes or fails its assertions, and auto-improve suggests a fix.
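
To make the four parts of a test case concrete, here is a minimal sketch of one as plain data. Docana's actual storage format is not shown on this page, so every class, field, node, and condition name below is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Assertion:
    node: str          # hypothetical: workflow node whose output is checked
    output_field: str  # hypothetical: field of that node's output
    condition: str     # hypothetical condition name, e.g. "contains"
    expected: Any      # value the condition is evaluated against


@dataclass
class AgentTestCase:
    name: str
    input: str                       # the message a user would send
    assertions: list[Assertion]      # what must be true after the run
    initial_context: dict[str, Any] = field(default_factory=dict)
    node_mocks: dict[str, Any] = field(default_factory=dict)


# Example: a refund request that arrives after the refund window closed.
refund_case = AgentTestCase(
    name="refund outside window",
    input="I bought this 45 days ago and want my money back.",
    assertions=[
        Assertion(
            node="reply",
            output_field="text",
            condition="contains",
            expected="refund deadline",
        ),
    ],
    # Data the conversation starts with.
    initial_context={"customer_tier": "standard"},
    # Canned response so the test never calls the real webhook.
    node_mocks={"order_lookup_webhook": {"days_since_purchase": 45}},
)

print(refund_case.name, "has", len(refund_case.assertions), "assertion(s)")
```

Keeping the mock next to the input makes the test self-describing: anyone reading it can see both what the user said and what the outside world was assumed to return.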

Run Your Tests

Run a single test case or all of them. Each run shows pass or fail per assertion, with the full execution result behind it: what the agent understood, which branches it took, what it answered.

Run your tests after every meaningful change to the agent, the same way you'd run a test suite after changing code.
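
The pass or fail per assertion can be pictured as a small loop over the outputs each node produced. This is a sketch under assumptions, not Docana's implementation: the condition names, the dictionary shapes, and the `check_assertions` helper are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical condition names; the real set of conditions may differ.
CONDITIONS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
    "not_empty": lambda actual, _expected: bool(actual),
}


def check_assertions(node_outputs, assertions):
    """Return one (assertion, passed) pair per assertion."""
    results = []
    for a in assertions:
        # A missing node or field counts as a failure rather than an error.
        actual = node_outputs.get(a["node"], {}).get(a["field"])
        check = CONDITIONS[a["condition"]]
        passed = actual is not None and check(actual, a.get("expected"))
        results.append((a, passed))
    return results


# Hypothetical outputs from one run of the agent.
node_outputs = {
    "classify_intent": {"label": "refund_request"},
    "reply": {"text": "You can request a refund within 30 days."},
}

assertions = [
    {"node": "classify_intent", "field": "label",
     "condition": "equals", "expected": "refund_request"},
    {"node": "reply", "field": "text",
     "condition": "contains", "expected": "14 days"},
]

results = check_assertions(node_outputs, assertions)
for a, passed in results:
    print("PASS" if passed else "FAIL", a["node"], a["field"], a["condition"])
```

Here the intent assertion passes and the reply assertion fails, which points at the reply node rather than at the classification step.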

Generate Test Cases Automatically

Writing tests is the part everyone skips. So Docana writes them for you: click Auto-generate, choose how many test cases you want, and Docana reads your agent's workflow and proposes tests that cover its branches. Review them before saving. They're suggestions, not gospel.
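
How Auto-generate works internally is not described here. To illustrate what "cover its branches" means when you review the proposals, the sketch below enumerates every path through a small workflow graph, one candidate test per path. The graph, the node names, and `enumerate_paths` are hypothetical, and the graph is assumed to have no cycles.

```python
def enumerate_paths(graph, start):
    """Yield every path from start to a node with no outgoing edges.

    Assumes the graph is acyclic; a cyclic workflow would loop forever.
    """
    stack = [[start]]
    while stack:
        path = stack.pop()
        successors = graph.get(path[-1], [])
        if not successors:
            yield path
            continue
        for nxt in successors:
            stack.append(path + [nxt])


# Hypothetical workflow: node name -> nodes it can branch to.
workflow = {
    "classify_intent": ["refund_policy", "faq_lookup", "handoff_to_human"],
    "refund_policy": ["reply"],
    "faq_lookup": ["reply"],
    "handoff_to_human": [],
    "reply": [],
}

paths = list(enumerate_paths(workflow, "classify_intent"))
for path in paths:
    print(" -> ".join(path))
```

If the suggested test cases leave one of these paths unexercised, that is a gap worth filling by hand before saving.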

Let Docana Suggest Fixes

When tests fail, Auto-improve analyzes the failures and suggests changes to your agent. Review each suggestion and apply the ones that make sense. It's a code review from someone who just read all your failing tests.

Debugging a Failure

The Logs tab keeps every execution, step by step. Open a run and you can trace the exact path: which evaluation fired, what it scored, which branch the agent took, and what each node produced. Most "the agent answered wrong" mysteries resolve in the first thirty seconds of reading the trace.
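
Reading a trace usually comes down to finding the first step where the agent left the path you expected, then looking at the decision made just before it. The sketch below assumes a hypothetical trace shape (a list of step records with `node`, `score`, `branch`, and `output`); the real log format may differ.

```python
def first_divergence(trace, expected_path):
    """Return the index of the first step that differs from expected_path.

    Returns None when the trace follows the expected path exactly.
    """
    for i, (step, expected_node) in enumerate(zip(trace, expected_path)):
        if step["node"] != expected_node:
            return i
    if len(trace) != len(expected_path):
        # One sequence is a prefix of the other: the run stopped early
        # or kept going past the expected end.
        return min(len(trace), len(expected_path))
    return None


# Hypothetical trace of a run where a refund request was misrouted.
trace = [
    {"node": "classify_intent", "score": 0.41, "branch": "faq_lookup",
     "output": {"label": "general_question"}},
    {"node": "faq_lookup", "score": None, "branch": "reply",
     "output": {"article": "shipping-times"}},
    {"node": "reply", "score": None, "branch": None,
     "output": {"text": "Standard shipping takes 3 to 5 days."}},
]
expected_path = ["classify_intent", "refund_policy", "reply"]

idx = first_divergence(trace, expected_path)
if idx is not None and idx > 0:
    # The step before the divergence is the one that chose the wrong branch.
    culprit = trace[idx - 1]
    print(f"Path diverged at step {idx}.")
    print(f"Last expected node: {culprit['node']}, "
          f"branch taken: {culprit['branch']}, score: {culprit['score']}")
```

In this example the wrong answer is a symptom. The cause is the low-scoring classification one step earlier, which is where the fix belongs.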

Best Practices

  1. Test the unhappy paths: angry users, missing information, questions outside the agent's scope. The happy path was never the problem.
  2. Mock external calls: use node mocks for webhooks and emails so tests are fast and don't spam real systems.
  3. Make assertions specific: "response mentions the refund deadline" catches more bugs than "response is not empty".
  4. Run before you publish: a failing test case caught in the editor costs nothing. The same failure in production costs a customer.
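
The difference between a specific and a vague assertion is easiest to see on an answer that is polite but wrong. The two checks below are hypothetical helpers written for this illustration, and the 30-day deadline is an assumed policy.

```python
def is_not_empty(response: str) -> bool:
    """Vague check: passes for any non-blank answer."""
    return bool(response.strip())


def mentions_refund_deadline(response: str, deadline: str = "30 days") -> bool:
    """Specific check: the answer must mention a refund and the deadline."""
    text = response.lower()
    return "refund" in text and deadline in text


wrong_answer = "Thanks for reaching out! Is there anything else I can help with?"
right_answer = "You can request a refund within 30 days of purchase."

for label, answer in [("wrong", wrong_answer), ("right", right_answer)]:
    print(
        label,
        "| not empty:", is_not_empty(answer),
        "| mentions deadline:", mentions_refund_deadline(answer),
    )
```

The vague check passes both answers, so it would never catch the regression. Only the specific check tells them apart.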

Next Steps