Vibe Check

Automation and Evaluation framework for Claude Code agents, built on Vitest v3

What is Vibe Check?

Vibe Check (@dao/vibe-check) is a dual-purpose TypeScript framework for testing and automating Claude Code agents. It lets you:

Test & Evaluate - Benchmark models, validate agent behavior, and enforce quality gates
Automate Workflows - Build multi-agent pipelines that orchestrate complex tasks

Because it runs on Vitest v3, vibe-check keeps its API small while automatically capturing execution context: git state, file changes, tool calls, timeline events, and metrics.

Choose Your Path

Test & Evaluate Agents

Write tests to validate agent behavior, benchmark different models, and enforce quality standards.

Start with Testing →

Automate Workflows

Build multi-stage pipelines that chain agents together to accomplish complex automation tasks.

Start with Automation →

Benchmark & Compare

Use matrix testing and LLM-based judges to compare models, prompts, and configurations.

Start with Evaluation →

Explore the API

Dive into the complete API reference to discover all available features and types.

Browse API Reference →

Quick Example

import { vibeTest } from '@dao/vibe-check';

vibeTest('agent can refactor code', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: '/refactor src/auth.ts --add-tests'
  });

  // Automatic context capture
  expect(result).toCompleteAllTodos();
  expect(result).toHaveChangedFiles(['src/auth.ts', 'src/auth.test.ts']);
  expect(result).toStayUnderCost(2.00);
});

Key Features

🎯 Dual API Design

vibeTest - For testing and evaluation with assertions
vibeWorkflow - For multi-stage automation pipelines

Both share the same primitives (runAgent, defineAgent, judge) but optimize for different use cases.

📸 Auto-Captured Context

When you run an agent, vibe-check automatically captures:

Git state (commits, branches, diffs)
File changes (before/after content)
Tool calls (correlated from hooks)
Timeline events (TODOs, notifications)
Metrics (cost, tokens, duration)

No manual artifact management required.
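To make "before/after content" concrete, here is a minimal sketch of how file changes could be derived from two snapshots of a workspace. It is an illustration only: the Snapshot and FileChange shapes and the diffSnapshots helper are hypothetical names, not part of the vibe-check API, and the library's actual capture mechanism may work differently.

```typescript
// Hypothetical shapes for illustration; they are not taken from the vibe-check API.
type Snapshot = Map<string, string>; // file path -> file content

interface FileChange {
  path: string;
  before: string | null; // null when the file did not exist before the run
  after: string | null; // null when the file was deleted during the run
}

function diffSnapshots(before: Snapshot, after: Snapshot): FileChange[] {
  // Union of the paths from both snapshots, sorted for stable output.
  const paths = Array.from(
    new Set([...Array.from(before.keys()), ...Array.from(after.keys())]),
  ).sort();

  const changes: FileChange[] = [];
  for (const filePath of paths) {
    const prev = before.get(filePath) ?? null;
    const next = after.get(filePath) ?? null;
    // Keep only files that were added, deleted, or modified.
    if (prev !== next) {
      changes.push({ path: filePath, before: prev, after: next });
    }
  }
  return changes;
}

// Made-up sample data standing in for a workspace before and after an agent run.
const beforeRun: Snapshot = new Map([
  ['src/auth.ts', 'export const login = () => {};'],
  ['README.md', '# Demo'],
]);
const afterRun: Snapshot = new Map([
  ['src/auth.ts', 'export const login = async () => {};'],
  ['src/auth.test.ts', "import { login } from './auth';"],
  ['README.md', '# Demo'],
]);

const fileChanges = diffSnapshots(beforeRun, afterRun);
console.log(fileChanges.map((change) => change.path));
```

Keeping both the before and after content per file is what would let a reporter render diffs later without re-reading the working tree.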

⚡ Reactive Watchers

Fail fast with assertions that run during agent execution:

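import { vibeTest } from '@dao/vibe-check';

vibeTest('refactor leaves the database alone', async ({ runAgent, expect }) => {
  // Start the agent without awaiting it, so a watcher can be attached first
  const execution = runAgent({ prompt: '/refactor' });

  execution.watch(({ files }) => {
    expect(files.changed()).not.toContain('database/');
  });

  await execution; // Aborts if database files are touched
});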
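The fail-fast idea can also be shown in isolation. The sketch below is a simplified, synchronous simulation and not vibe-check's implementation: Watcher, simulatedAgent, runWithWatcher, and forbidDatabaseChanges are all hypothetical names. A watcher runs after every simulated file change, and the first thrown error stops the run before later steps execute. The check uses a path-prefix test, which is one way to match every file under a directory.

```typescript
// Hypothetical stand-ins for illustration only; none of these names come from vibe-check.
type Watcher = (changedPaths: readonly string[]) => void;

// Simulated agent: records each step it executes, then reports the changed file.
function* simulatedAgent(executedSteps: string[]): Generator<string> {
  for (const step of ['src/auth.ts', 'database/schema.sql', 'src/auth.test.ts']) {
    executedSteps.push(step);
    yield step;
  }
}

function runWithWatcher(events: Iterable<string>, watcher: Watcher): string[] {
  const changedPaths: string[] = [];
  for (const changedPath of events) {
    changedPaths.push(changedPath);
    // If the watcher throws, the loop exits and no further events are consumed.
    watcher(changedPaths);
  }
  return changedPaths;
}

const forbidDatabaseChanges: Watcher = (changedPaths) => {
  const offending = changedPaths.find((p) => p.startsWith('database/'));
  if (offending !== undefined) {
    throw new Error(`Forbidden change: ${offending}`);
  }
};

const stepsExecuted: string[] = [];
let abortReason = '';
try {
  runWithWatcher(simulatedAgent(stepsExecuted), forbidDatabaseChanges);
} catch (error) {
  abortReason = error instanceof Error ? error.message : String(error);
}

console.log(abortReason);
console.log(stepsExecuted);
```

In this simulation the third step never runs, which is the point of checking during execution rather than after it: a violation costs only the work done up to that moment.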

🧪 LLM-Based Evaluation

Use Claude as a judge to evaluate code quality:

const judgment = await judge(result, {
  rubric: {
    name: 'Code Quality',
    criteria: [
      { name: 'has_tests', description: 'Added comprehensive test coverage' },
      { name: 'no_todos', description: 'No TODO comments left behind' }
    ]
  }
});

expect(judgment.passed).toBe(true);
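As a rough sketch of how per-criterion results might roll up into a single passed flag, the example below aggregates verdicts against the rubric shown above. Everything beyond the rubric's name and criteria fields is an assumption: the CriterionVerdict and JudgmentSummary shapes, the summarizeJudgment helper, and the rule that every criterion must pass are hypothetical and may not match how judge actually computes its result.

```typescript
// Assumed shapes for illustration only; the real vibe-check types may differ.
interface Criterion {
  name: string;
  description: string;
}

interface Rubric {
  name: string;
  criteria: Criterion[];
}

interface CriterionVerdict {
  criterion: string;
  passed: boolean;
  reason: string;
}

interface JudgmentSummary {
  rubric: string;
  passed: boolean;
  failed: string[];
}

// Assumption: the overall judgment passes only when every criterion passes.
// A criterion with no matching verdict counts as failed.
function summarizeJudgment(rubric: Rubric, verdicts: CriterionVerdict[]): JudgmentSummary {
  const failed = rubric.criteria
    .filter((c) => !verdicts.some((v) => v.criterion === c.name && v.passed))
    .map((c) => c.name);
  return { rubric: rubric.name, passed: failed.length === 0, failed };
}

const codeQuality: Rubric = {
  name: 'Code Quality',
  criteria: [
    { name: 'has_tests', description: 'Added comprehensive test coverage' },
    { name: 'no_todos', description: 'No TODO comments left behind' },
  ],
};

// Hand-written sample verdicts standing in for a judge model's response.
const sampleVerdicts: CriterionVerdict[] = [
  { criterion: 'has_tests', passed: true, reason: 'New test file covers the refactored code' },
  { criterion: 'no_todos', passed: false, reason: 'A TODO comment remains in the source' },
];

const summary = summarizeJudgment(codeQuality, sampleVerdicts);
console.log(summary.passed, summary.failed);
```

Reporting the failed criterion names alongside the boolean makes a failing assertion easier to act on than a bare false.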

📊 Rich Reporting

Terminal Reporter - Real-time cost tracking and summaries
HTML Reporter - Interactive reports with transcripts, timelines, and diffs
Custom Matchers - Domain-specific assertions for agent behavior

🔄 Matrix Testing

Compare models, prompts, and configurations with Cartesian product generation:

defineTestSuite({
  matrix: {
    model: ['claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest'],
    maxTurns: [5, 10]
  },
  test: ({ model, maxTurns }) => {
    vibeTest(`${model} in ${maxTurns} turns`, async ({ runAgent }) => {
      // Runs 4 times: 2 models × 2 maxTurns values
    });
  }
});
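For readers unfamiliar with Cartesian product generation, the following sketch shows how a matrix like the one above expands into individual combinations. The expandMatrix helper and its types are hypothetical and written for this illustration; they are not the library's implementation.

```typescript
// Hypothetical helper for illustration; not part of the vibe-check API.
type Matrix = Record<string, readonly unknown[]>;

// Maps each matrix key to the element type of its value list.
type Combination<M extends Matrix> = { [K in keyof M]: M[K][number] };

function expandMatrix<M extends Matrix>(matrix: M): Combination<M>[] {
  const source: Matrix = matrix;
  // Start with one empty combination, then extend every combination once per key.
  let combos: Record<string, unknown>[] = [{}];
  for (const [key, values] of Object.entries(source)) {
    const next: Record<string, unknown>[] = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [key]: value });
      }
    }
    combos = next;
  }
  return combos as unknown as Combination<M>[];
}

const combinations = expandMatrix({
  model: ['claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest'],
  maxTurns: [5, 10],
});

for (const { model, maxTurns } of combinations) {
  console.log(`${model} in ${maxTurns} turns`);
}
```

The number of combinations is the product of the list lengths, so each extra matrix dimension multiplies the number of agent runs and their cost.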

Why Vibe Check?

Vitest-native - No custom test runners; pure Vitest infrastructure
DX-first - Simple user-facing API that hides Vitest complexity
Type-safe - Full TypeScript support with strict typing
Production-ready - Built for both testing and automation
Well-documented - Comprehensive guides following the Diátaxis framework

Ready to Start?

Installation

Install vibe-check and configure your first project.

Install Now →

5-Minute Quickstart

Write your first test in under 5 minutes.

Quickstart →

Documentation Structure

This documentation follows the Diátaxis framework:

Tutorials (Getting Started) - Step-by-step learning paths
How-To Guides (Guides) - Task-oriented recipes for specific problems
Reference (API Reference) - Dry, factual technical documentation
Explanation (Explanation) - Conceptual deep-dives and design decisions

Can’t find what you’re looking for? Use the search bar above or open an issue.