Vibe Check

Automation and Evaluation framework for Claude Code agents, built on Vitest v3

Vibe Check (@dao/vibe-check) is a dual-purpose TypeScript testing framework that enables you to:

  1. Test & Evaluate - Benchmark models, validate agent behavior, and enforce quality gates
  2. Automate Workflows - Build multi-agent pipelines that orchestrate complex tasks

Built on Vitest v3, vibe-check provides a simple, powerful API while automatically capturing execution context: git state, file changes, tool calls, and more.


Test & Evaluate Agents

Write tests to validate agent behavior, benchmark different models, and enforce quality standards.

Start with Testing →

Automate Workflows

Build multi-stage pipelines that chain agents together to accomplish complex automation tasks.

Start with Automation →

Benchmark & Compare

Use matrix testing and LLM-based judges to compare models, prompts, and configurations.

Start with Evaluation →

Explore the API

Dive into the complete API reference to discover all available features and types.

Browse API Reference →


import { vibeTest } from '@dao/vibe-check';

vibeTest('agent can refactor code', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: '/refactor src/auth.ts --add-tests'
  });

  // Automatic context capture
  expect(result).toCompleteAllTodos();
  expect(result).toHaveChangedFiles(['src/auth.ts', 'src/auth.test.ts']);
  expect(result).toStayUnderCost(2.00);
});

Vibe-check exposes two entry points:

  • vibeTest - For testing and evaluation with assertions
  • vibeWorkflow - For multi-stage automation pipelines

Both share the same primitives (runAgent, defineAgent, judge) but optimize for different use cases.

When you run an agent, vibe-check automatically captures:

  • Git state (commits, branches, diffs)
  • File changes (before/after content)
  • Tool calls (correlated from hooks)
  • Timeline events (TODOs, notifications)
  • Metrics (cost, tokens, duration)

No manual artifact management required.

Fail fast with assertions that run during agent execution:

const execution = runAgent({ prompt: '/refactor' });

execution.watch(({ files }) => {
  expect(files.changed()).not.toContain('database/');
});

await execution; // Aborts if database files touched
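One detail worth checking in a watcher like the one above: if `files.changed()` returns a list of file paths (an assumption; its return type is not shown on this page), `toContain('database/')` only matches an element equal to that exact string, so a prefix check may be closer to the intent. The sketch below is a self-contained illustration of that idea and of the fail-fast behavior. `touchedUnder` and `runWithWatcher` are hypothetical helpers, not vibe-check APIs.

```typescript
// Hypothetical helper: the paths in `changed` that fall under a protected directory.
function touchedUnder(changed: readonly string[], dir: string): string[] {
  const prefix = dir.endsWith('/') ? dir : `${dir}/`;
  return changed.filter((path) => path.startsWith(prefix));
}

// Minimal stand-in for a watched execution (not the real runAgent):
// after each batch of file changes the watcher runs, and the first
// error it throws stops the run.
function runWithWatcher(
  batches: readonly string[][],
  watcher: (changed: readonly string[]) => void,
): { aborted: boolean; processed: number; reason?: string } {
  const changed: string[] = [];
  let processed = 0;
  for (const batch of batches) {
    changed.push(...batch);
    processed += 1;
    try {
      watcher(changed);
    } catch (error) {
      return { aborted: true, processed, reason: (error as Error).message };
    }
  }
  return { aborted: false, processed };
}

const outcome = runWithWatcher(
  [['src/auth.ts'], ['database/schema.sql'], ['src/auth.test.ts']],
  (changed) => {
    const hits = touchedUnder(changed, 'database');
    if (hits.length > 0) {
      throw new Error(`protected files changed: ${hits.join(', ')}`);
    }
  },
);
```

In this simulation the run stops at the second batch, so the third batch is never processed. That is the point of asserting during execution rather than after it.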

Use Claude as a judge to evaluate code quality:

const judgment = await judge(result, {
  rubric: {
    name: 'Code Quality',
    criteria: [
      { name: 'has_tests', description: 'Added comprehensive test coverage' },
      { name: 'no_todos', description: 'No TODO comments left behind' }
    ]
  }
});

expect(judgment.passed).toBe(true);
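How `judgment.passed` is derived from the rubric is not stated on this page. One plausible reading is that the rubric passes only when every criterion passes. The sketch below works under that assumption; the `Criterion` and `CriterionVerdict` types and the `summarize` function are hypothetical and do not describe vibe-check's actual judgment shape.

```typescript
// Hypothetical types for illustration only.
interface Criterion {
  name: string;
  description: string;
}

interface CriterionVerdict {
  name: string;
  passed: boolean;
  reason: string;
}

// Assumed rule: the rubric passes only if every criterion has a passing
// verdict. A criterion with no verdict counts as failed.
function summarize(
  criteria: readonly Criterion[],
  verdicts: readonly CriterionVerdict[],
): { passed: boolean; failed: string[] } {
  const byName = new Map<string, CriterionVerdict>(
    verdicts.map((v): [string, CriterionVerdict] => [v.name, v]),
  );
  const failed = criteria
    .filter((c) => byName.get(c.name)?.passed !== true)
    .map((c) => c.name);
  return { passed: failed.length === 0, failed };
}

const summary = summarize(
  [
    { name: 'has_tests', description: 'Added comprehensive test coverage' },
    { name: 'no_todos', description: 'No TODO comments left behind' },
  ],
  [
    { name: 'has_tests', passed: true, reason: 'New test file covers the refactor' },
    { name: 'no_todos', passed: false, reason: 'A TODO remains in src/auth.ts' },
  ],
);
```

Keeping the list of failed criteria alongside the boolean makes a failing assertion easier to diagnose than a bare `false`.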

Reporters and matchers ship with the framework:

  • Terminal Reporter - Real-time cost tracking and summaries
  • HTML Reporter - Interactive reports with transcripts, timelines, and diffs
  • Custom Matchers - Domain-specific assertions for agent behavior
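In Vitest, custom matchers are plain functions that return a `{ pass, message }` object and are registered with `expect.extend`. The sketch below shows what a cost-budget matcher in that style could look like. `RunMetrics`, `totalCostUsd`, and `toStayUnderBudget` are hypothetical names chosen for illustration, not vibe-check's implementation.

```typescript
// Hypothetical result shape; the real vibe-check result type may differ.
interface RunMetrics {
  totalCostUsd: number;
}

// The shape Vitest expects a matcher function to return.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Passes when the recorded cost is at or below the budget.
function toStayUnderBudget(received: RunMetrics, maxUsd: number): MatcherResult {
  const cost = received.totalCostUsd.toFixed(2);
  const budget = maxUsd.toFixed(2);
  const pass = received.totalCostUsd <= maxUsd;
  return {
    pass,
    // When `pass` is true, the message is only shown for a negated
    // assertion (`.not`), so it describes the opposite expectation.
    message: () =>
      pass
        ? `expected cost $${cost} to exceed budget $${budget}`
        : `expected cost $${cost} to stay at or under budget $${budget}`,
  };
}

const withinBudget = toStayUnderBudget({ totalCostUsd: 1.25 }, 2.0);
const overBudget = toStayUnderBudget({ totalCostUsd: 2.5 }, 2.0);
```

In a Vitest setup file, a function like this would be registered with `expect.extend({ toStayUnderBudget })`, after which `expect(metrics).toStayUnderBudget(2.0)` becomes available.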

Compare models, prompts, and configurations with Cartesian product generation:

defineTestSuite({
  matrix: {
    model: ['claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest'],
    maxTurns: [5, 10]
  },
  test: ({ model, maxTurns }) => {
    vibeTest(`${model} in ${maxTurns} turns`, async ({ runAgent }) => {
      // Test runs 4 times (2 models × 2 maxTurns)
    });
  }
});
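To make the "Cartesian product" idea concrete, here is a minimal sketch of how a matrix of options expands into one parameter set per combination. `expandMatrix` is a hypothetical helper written for this illustration. It is not how `defineTestSuite` is necessarily implemented.

```typescript
// Hypothetical helper, not part of vibe-check: expands a matrix of
// options into every combination (the Cartesian product).
type Matrix = Record<string, readonly unknown[]>;
type Combination<M extends Matrix> = { [K in keyof M]: M[K][number] };

function expandMatrix<M extends Matrix>(matrix: M): Combination<M>[] {
  // Start with one empty combination and extend it one key at a time.
  let combos: Record<string, unknown>[] = [{}];
  for (const [key, values] of Object.entries(matrix)) {
    const next: Record<string, unknown>[] = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [key]: value });
      }
    }
    combos = next;
  }
  return combos as unknown as Combination<M>[];
}

const combos = expandMatrix({
  model: ['claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest'],
  maxTurns: [5, 10],
});
// combos.length is 2 × 2 = 4
```

The number of combinations is the product of the option counts, so each added dimension multiplies the number of agent runs, and with it the cost.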

  • Vitest-native - No custom test runners; pure Vitest infrastructure
  • DX-first - Simple user-facing API hiding Vitest complexity
  • Type-safe - Full TypeScript support with strict typing
  • Production-ready - Built for both testing and automation
  • Well-documented - Comprehensive guides following Diátaxis framework

Installation

Install vibe-check and configure your first project.

Install Now →

5-Minute Quickstart

Write your first test in under 5 minutes.

Quickstart →


This documentation follows the Diátaxis framework:

  • Tutorials (Getting Started) - Step-by-step learning paths
  • How-To Guides (Guides) - Task-oriented recipes for specific problems
  • Reference (API Reference) - Dry, factual technical documentation
  • Explanation (Explanation) - Conceptual deep-dives and design decisions

Can’t find what you’re looking for? Use the search bar above or open an issue.