
vibeTest

The vibeTest function is the primary entry point for writing agent tests with vibe-check. It wraps Vitest’s test() function and injects agent-specific fixtures into the test context.
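Fixture injection works the same way as Vitest's test.extend() API: fixtures are set up per test and handed to the test function through its context argument. The snippet below only illustrates that mechanism with a made-up workspaceDir fixture; it is not the actual vibeTest implementation.

```ts
// Illustration of Vitest fixture injection only; not the real vibeTest implementation.
// workspaceDir is a hypothetical fixture invented for this sketch.
import { test as base } from 'vitest';

const exampleTest = base.extend<{ workspaceDir: string }>({
  workspaceDir: async ({}, use) => {
    await use('/tmp/example-workspace'); // provide the fixture value, tear down after the test
  },
});

exampleTest('receives the injected fixture', ({ workspaceDir, expect }) => {
  expect(workspaceDir).toContain('example-workspace');
});
```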

```ts
function vibeTest(
  name: string,
  fn: (ctx: VibeTestContext) => Promise<void> | void,
  timeoutOrOpts?: number | { timeout?: number }
): void;
```
| Parameter | Type | Description |
| --- | --- | --- |
| `name` | `string` | Test name (displayed in reports) |
| `fn` | `(ctx: VibeTestContext) => Promise<void> \| void` | Test function receiving the context |
| `timeoutOrOpts` | `number \| { timeout?: number }` | Optional timeout in ms (default: 300000 = 5 min) |

The test function receives a context object with the following fixtures:

| Fixture | Type | Description |
| --- | --- | --- |
| `runAgent` | `(opts: RunAgentOptions) => AgentExecution` | Execute an agent |
| `judge` | `<T>(result, opts) => Promise<T>` | Evaluate with LLM judge |
| `expect` | `typeof expect` | Vitest expect (context-bound) |
| `annotate` | `(message, type?, attachment?) => Promise<void>` | Stream annotations to reporters |
| `task` | `Task` | Vitest task metadata |
| `files` | `CumulativeFileAccess` | Cumulative file changes |
| `tools` | `CumulativeToolAccess` | Cumulative tool calls |
| `timeline` | `CumulativeTimeline` | Cumulative event timeline |

See VibeTestContext → for complete interface documentation.
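As a quick orientation, the fixtures above correspond to a context shaped roughly like the sketch below. This is assembled from the table on this page, not the authoritative definition, and the type imports assume @dao/vibe-check exports these names.

```ts
// Sketch assembled from the fixtures table above; the VibeTestContext page has the
// authoritative interface. Type imports assume @dao/vibe-check exports these names.
import type {
  RunAgentOptions,
  AgentExecution,
  CumulativeFileAccess,
  CumulativeToolAccess,
  CumulativeTimeline,
} from '@dao/vibe-check';
import { expect, type Task } from 'vitest';

interface VibeTestContextSketch {
  runAgent: (opts: RunAgentOptions) => AgentExecution;
  judge: <T>(result: unknown, opts: unknown) => Promise<T>;
  expect: typeof expect;          // context-bound Vitest expect
  annotate: (message: string, type?: string, attachment?: unknown) => Promise<void>;
  task: Task;                     // Vitest task metadata
  files: CumulativeFileAccess;    // cumulative file changes
  tools: CumulativeToolAccess;    // cumulative tool calls
  timeline: CumulativeTimeline;   // cumulative event timeline
}
```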


```ts
import { vibeTest } from '@dao/vibe-check';

vibeTest('agent can execute task', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Create a README.md file'
  });

  expect(result).toHaveChangedFiles(['README.md']);
});
```

Like Vitest’s test, vibeTest supports modifiers:

Skip a test:

```ts
vibeTest.skip('not ready yet', async ({ runAgent }) => {
  // Test is skipped
});
```

Run only this test (exclusive):

```ts
vibeTest.only('focus on this one', async ({ runAgent }) => {
  // Only this test runs
});
```

Run test concurrently with others:

```ts
vibeTest.concurrent('parallel test 1', async ({ runAgent }) => {
  // Runs in parallel
});

vibeTest.concurrent('parallel test 2', async ({ runAgent }) => {
  // Runs in parallel
});
```

Force sequential execution (within same file):

```ts
vibeTest.sequential('must run first', async ({ runAgent }) => {
  // Runs first
});

vibeTest.sequential('must run second', async ({ runAgent }) => {
  // Runs after the first completes
});
```

Mark test as TODO (not implemented):

```ts
vibeTest.todo('implement later', async ({ runAgent }) => {
  // Test is marked as TODO
});
```

Expect test to fail:

```ts
vibeTest.fails('known bug', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: 'Buggy task' });
  expect(result).toCompleteAllTodos(); // Expected to fail
});
```

Use AgentExecution.watch() to fail early if conditions are violated during execution:

```ts
vibeTest('fail fast on violations', async ({ runAgent, expect }) => {
  const execution = runAgent({
    prompt: '/refactor src/',
    maxTurns: 20
  });

  // Register watchers before awaiting
  execution.watch(({ tools, metrics, files }) => {
    // Fail if too many tool failures
    expect(tools.failed().length).toBeLessThan(3);

    // Fail if cost exceeds budget
    expect(metrics.totalCostUsd).toBeLessThan(5.0);

    // Fail if protected files are touched
    const changedPaths = files.changed().map(f => f.path);
    expect(changedPaths.some(p => p.startsWith('database/'))).toBe(false);
  });

  // Await completion (aborts if a watcher fails)
  const result = await execution;

  // Final assertions (only run if watchers passed)
  expect(result).toCompleteAllTodos();
});
```

When running multiple agents in a single test, use cumulative state from context:

```ts
vibeTest('multi-agent workflow', async ({ runAgent, files, tools, expect }) => {
  // First agent: analyze
  const analyze = await runAgent({
    prompt: '/analyze src/'
  });

  // Second agent: fix
  const fix = await runAgent({
    prompt: '/fix issues based on analysis'
  });

  // Access cumulative state across both runs
  expect(files.changed()).toHaveLength(10);
  expect(files.filter('**/*.ts')).toHaveLength(8);
  expect(tools.used('Edit')).toBeGreaterThan(5);

  // Or use matchers on individual results
  expect(analyze).toHaveUsedTool('Read');
  expect(fix).toHaveChangedFiles(['src/**/*.ts']);
});
```

Use the judge fixture to evaluate code quality with an LLM:

```ts
vibeTest('quality evaluation', async ({ runAgent, judge, expect }) => {
  const result = await runAgent({
    prompt: 'Refactor src/auth.ts --add-tests'
  });

  const judgment = await judge(result, {
    rubric: {
      name: 'Code Quality',
      criteria: [
        {
          name: 'has_tests',
          description: 'Added comprehensive unit tests',
          weight: 0.4
        },
        {
          name: 'type_safety',
          description: 'Uses TypeScript strict types',
          weight: 0.3
        },
        {
          name: 'clean_code',
          description: 'Code is readable and maintainable',
          weight: 0.3
        }
      ],
      passThreshold: 0.75
    }
  });

  expect(judgment.passed).toBe(true);
  expect(judgment.overallScore).toBeGreaterThan(0.8);
});
```

See judge() → for complete documentation.
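For orientation, the rubric and judgment fields used in the example above imply roughly the following shapes. This is inferred from this page's examples; the judge() page has the authoritative types.

```ts
// Inferred from the examples on this page; see the judge() documentation for the real types.
interface RubricCriterion {
  name: string;
  description: string;
  weight?: number;          // relative weight, e.g. 0.4
}

interface Rubric {
  name: string;
  criteria: RubricCriterion[];
  passThreshold?: number;   // overall score required to pass, e.g. 0.75
}

// Fields read from the default judgment result above
interface DefaultJudgment {
  passed: boolean;
  overallScore: number;     // 0..1
}
```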

Define custom judgment structure with Zod:

```ts
import { vibeTest } from '@dao/vibe-check';
import { z } from 'zod';

vibeTest('custom judgment format', async ({ runAgent, judge, expect }) => {
  const result = await runAgent({
    prompt: 'Implement feature from PRD'
  });

  // Define a custom result schema
  const CustomJudgment = z.object({
    meetsRequirements: z.boolean(),
    missingFeatures: z.array(z.string()),
    codeQualityScore: z.number().min(0).max(1),
    feedback: z.string()
  });

  // Type-safe judgment result
  const judgment = await judge<z.infer<typeof CustomJudgment>>(result, {
    rubric: {
      name: 'Feature Implementation',
      criteria: [
        { name: 'complete', description: 'All requirements met' },
        { name: 'quality', description: 'High code quality' }
      ]
    },
    resultFormat: CustomJudgment
  });

  // TypeScript knows the shape of judgment
  expect(judgment.meetsRequirements).toBe(true);
  expect(judgment.missingFeatures).toHaveLength(0);
  expect(judgment.codeQualityScore).toBeGreaterThan(0.8);
});
```

Override the default 5-minute timeout:

```ts
// Inline timeout (10 minutes)
vibeTest('long task', async ({ runAgent }) => {
  const result = await runAgent({
    prompt: 'Complex refactoring',
    timeoutMs: 600000 // 10 min for the agent
  });
}, 610000); // 10 min 10 sec for the test

// Options object
vibeTest('long task', async ({ runAgent }) => {
  // ...
}, { timeout: 600000 });
```

Stream custom annotations to reporters:

```ts
vibeTest('with annotations', async ({ runAgent, annotate }) => {
  await annotate('Starting phase 1: Analysis');

  const analyze = await runAgent({
    prompt: '/analyze src/'
  });

  await annotate('Phase 1 complete', 'milestone', {
    body: JSON.stringify({ filesAnalyzed: analyze.files.stats().total }),
    contentType: 'application/json'
  });

  await annotate('Starting phase 2: Refactoring');

  const refactor = await runAgent({
    prompt: '/refactor based on analysis'
  });

  await annotate('All phases complete', 'success');
});
```

Annotations appear in:

  • Terminal output (during test run)
  • HTML report (timeline view)
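The calls above suggest roughly this shape for annotate and its optional attachment. This is inferred from the fixtures table and the example, not the definitive API.

```ts
// Inferred from the fixtures table and the example above; not the definitive API.
interface AnnotationAttachment {
  body: string;             // attachment payload, e.g. JSON.stringify({...})
  contentType?: string;     // e.g. 'application/json'
}

type Annotate = (
  message: string,
  type?: string,            // e.g. 'milestone', 'success'
  attachment?: AnnotationAttachment
) => Promise<void>;
```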

All custom matchers work on RunResult objects:

```ts
vibeTest('using matchers', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });

  // File matchers
  expect(result).toHaveChangedFiles(['src/**/*.ts']);
  expect(result).toHaveNoDeletedFiles();

  // Tool matchers
  expect(result).toHaveUsedTool('Edit');
  expect(result).toUseOnlyTools(['Edit', 'Read', 'Write']);

  // Quality matchers
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();

  // Cost matchers
  expect(result).toStayUnderCost(2.00);

  // Rubric matcher (uses judge internally)
  await expect(result).toPassRubric({
    name: 'Quality',
    criteria: [
      { name: 'correct', description: 'Works correctly' }
    ]
  });
});
```

See Custom Matchers → for complete reference.
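If the matchers need to type-check in your own test files and the package does not already ship declarations, Vitest's standard module-augmentation pattern can be used. The signatures below are a sketch inferred from the calls on this page.

```ts
// Sketch only: signatures are inferred from the usage above, and @dao/vibe-check may
// already provide equivalent declarations.
import 'vitest';

declare module 'vitest' {
  interface Assertion<T = any> {
    toHaveChangedFiles(globs: string[]): T;
    toHaveNoDeletedFiles(): T;
    toHaveUsedTool(name: string): T;
    toUseOnlyTools(names: string[]): T;
    toCompleteAllTodos(): T;
    toHaveNoErrorsInLogs(): T;
    toStayUnderCost(maxUsd: number): T;
    toPassRubric(rubric: {
      name: string;
      criteria: { name: string; description: string; weight?: number }[];
    }): Promise<T>;
  }
}
```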


Create a file and verify its contents:

```ts
vibeTest('create file', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Create a file called hello.txt with "Hello World"'
  });

  expect(result).toHaveChangedFiles(['hello.txt']);

  const file = result.files.get('hello.txt');
  const content = await file?.after?.text();
  expect(content).toContain('Hello World');
});
```
Keep a simple task under a cost budget by using a cheaper model:

```ts
vibeTest('stay under budget', async ({ runAgent, expect }) => {
  const result = await runAgent({
    model: 'claude-3-5-haiku-latest', // Cheaper model
    prompt: 'Simple task'
  });

  expect(result).toStayUnderCost(0.10);
  console.log('Cost:', result.metrics.totalCostUsd);
});
```
Compare two models on the same prompt:

```ts
vibeTest('compare models', async ({ runAgent, expect }) => {
  const sonnet = await runAgent({
    model: 'claude-3-5-sonnet-latest',
    prompt: 'Refactor auth.ts'
  });

  const haiku = await runAgent({
    model: 'claude-3-5-haiku-latest',
    prompt: 'Refactor auth.ts'
  });

  console.log('Sonnet cost:', sonnet.metrics.totalCostUsd);
  console.log('Haiku cost:', haiku.metrics.totalCostUsd);

  // Both should complete
  expect(sonnet).toCompleteAllTodos();
  expect(haiku).toCompleteAllTodos();
});
```
Restrict the tools an agent may use:

```ts
vibeTest('restrict tools', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Read and summarize README.md',
    allowedTools: ['Read'] // Only allow the Read tool
  });

  expect(result).toUseOnlyTools(['Read']);
  expect(result.tools.used('Write')).toBe(0);
  expect(result.tools.used('Edit')).toBe(0);
});
```