Test & Evaluate Agents
Write tests to validate agent behavior, benchmark different models, and enforce quality standards.
Vibe Check (@dao/vibe-check) is a dual-purpose TypeScript testing framework that enables you to test and evaluate agents and to automate multi-stage workflows.
Built on Vitest v3, vibe-check provides a simple, powerful API while automatically capturing execution context—git state, file changes, tool calls, and more.
Automate Workflows
Build multi-stage pipelines that chain agents together to accomplish complex automation tasks.
Benchmark & Compare
Use matrix testing and LLM-based judges to compare models, prompts, and configurations.
Explore the API
Dive into the complete API reference to discover all available features and types.
import { vibeTest } from '@dao/vibe-check';

vibeTest('agent can refactor code', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: '/refactor src/auth.ts --add-tests'
  });

  // Automatic context capture
  expect(result).toCompleteAllTodos();
  expect(result).toHaveChangedFiles(['src/auth.ts', 'src/auth.test.ts']);
  expect(result).toStayUnderCost(2.00);
});
vibeTest - For testing and evaluation with assertions
vibeWorkflow - For multi-stage automation pipelines
Both share the same primitives (runAgent, defineAgent, judge) but optimize for different use cases.
When you run an agent, vibe-check automatically captures its execution context, including git state, file changes, and tool calls. No manual artifact management is required.
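As a rough illustration of what capturing file changes involves, the sketch below compares two snapshots that map file paths to content hashes. It is not vibe-check's implementation: `Snapshot`, `FileChanges`, `diffSnapshots`, and the hash values are all hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: not the vibe-check implementation.
// A snapshot maps a file path to a hash of that file's content.
type Snapshot = Record<string, string>;

interface FileChanges {
  added: string[];
  modified: string[];
  deleted: string[];
}

// Compares a snapshot taken before a run with one taken after it.
function diffSnapshots(before: Snapshot, after: Snapshot): FileChanges {
  const changes: FileChanges = { added: [], modified: [], deleted: [] };

  for (const path of Object.keys(after)) {
    if (!(path in before)) {
      changes.added.push(path); // present only after the run
    } else if (before[path] !== after[path]) {
      changes.modified.push(path); // same path, different content hash
    }
  }

  for (const path of Object.keys(before)) {
    if (!(path in after)) {
      changes.deleted.push(path); // present only before the run
    }
  }

  return changes;
}

// Example snapshots with made-up hashes.
const before: Snapshot = { 'src/auth.ts': 'a1', 'src/legacy.ts': 'b2' };
const after: Snapshot = { 'src/auth.ts': 'c3', 'src/auth.test.ts': 'd4' };

const changes = diffSnapshots(before, after);
console.log(changes);
```

Here `src/auth.ts` is reported as modified, `src/auth.test.ts` as added, and `src/legacy.ts` as deleted.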
Fail fast with assertions that run during agent execution:
const execution = runAgent({ prompt: '/refactor' });

execution.watch(({ files }) => {
  expect(files.changed()).not.toContain('database/');
});

await execution; // Aborts if database files touched
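The fail-fast idea can be pictured as a loop that hands every progress update to each watcher and stops at the first watcher that throws. The sketch below is only a mental model under that assumption, not how vibe-check implements `watch`; `Update`, `Watcher`, `WatchResult`, and `runWithWatchers` are hypothetical names.

```typescript
// Hypothetical sketch of fail-fast watching: not the vibe-check implementation.
interface Update {
  changedFiles: string[];
}

type Watcher = (update: Update) => void;

interface WatchResult {
  aborted: boolean;
  processed: number; // updates fully processed before any abort
  reason?: string;
}

// Feeds each update to every watcher and stops at the first one that throws.
function runWithWatchers(updates: Update[], watchers: Watcher[]): WatchResult {
  let processed = 0;
  for (const update of updates) {
    try {
      for (const watcher of watchers) {
        watcher(update);
      }
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      return { aborted: true, processed, reason };
    }
    processed++;
  }
  return { aborted: false, processed };
}

// Watcher that rejects any changed path under database/ (prefix match).
const noDatabaseChanges: Watcher = ({ changedFiles }) => {
  const offending = changedFiles.filter((path) => path.indexOf('database/') === 0);
  if (offending.length > 0) {
    throw new Error(`Forbidden change: ${offending.join(', ')}`);
  }
};

const updates: Update[] = [
  { changedFiles: ['src/auth.ts'] },
  { changedFiles: ['src/auth.ts', 'database/schema.sql'] },
  { changedFiles: ['src/auth.ts', 'database/schema.sql', 'src/auth.test.ts'] },
];

const outcome = runWithWatchers(updates, [noDatabaseChanges]);
console.log(outcome);
```

In this run the second update trips the watcher, so the third update is never examined. Stopping early like this avoids spending further work on a run that has already failed.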
Use Claude as a judge to evaluate code quality:
const judgment = await judge(result, {
  rubric: {
    name: 'Code Quality',
    criteria: [
      { name: 'has_tests', description: 'Added comprehensive test coverage' },
      { name: 'no_todos', description: 'No TODO comments left behind' }
    ]
  }
});

expect(judgment.passed).toBe(true);
Compare models, prompts, and configurations with Cartesian product generation:
defineTestSuite({
  matrix: {
    model: ['claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest'],
    maxTurns: [5, 10]
  },
  test: ({ model, maxTurns }) => {
    vibeTest(`${model} in ${maxTurns} turns`, async ({ runAgent }) => {
      // Test runs 4 times (2 models × 2 maxTurns)
    });
  }
});
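To make the Cartesian product concrete, here is a minimal, self-contained sketch of how a matrix of options can be expanded into every combination. It only illustrates the general technique. `expandMatrix`, `Matrix`, and `Combination` are hypothetical names, and nothing here is taken from vibe-check's internals.

```typescript
// Hypothetical helper: not part of @dao/vibe-check.
type Matrix = Record<string, readonly unknown[]>;

// Maps each matrix key to a single value drawn from its array.
type Combination<M extends Matrix> = { [K in keyof M]: M[K][number] };

function expandMatrix<M extends Matrix>(matrix: M): Combination<M>[] {
  const entries: [string, readonly unknown[]][] = Object.entries(matrix);

  // Start with one empty combination, then extend it once per key.
  let combos: Record<string, unknown>[] = [{}];
  for (const [key, values] of entries) {
    const next: Record<string, unknown>[] = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [key]: value });
      }
    }
    combos = next;
  }
  return combos as unknown as Combination<M>[];
}

const combos = expandMatrix({
  model: ['claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest'],
  maxTurns: [5, 10],
});

// One line per combination: 2 models × 2 maxTurns = 4 lines.
for (const { model, maxTurns } of combos) {
  console.log(`${model} in ${maxTurns} turns`);
}
```

The number of combinations is the product of the array lengths, so each extra dimension multiplies the number of test runs and the associated cost.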
Installation
Install vibe-check and configure your first project.
5-Minute Quickstart
Write your first test in under 5 minutes.
This documentation follows the Diátaxis framework, which organizes content into tutorials, how-to guides, reference, and explanation.
Can’t find what you’re looking for? Use the search bar above or open an issue.