VibeTestContext

The VibeTestContext interface is injected into vibeTest functions via Vitest fixtures. It provides methods and properties for agent execution, evaluation, assertions, and cumulative state tracking.

interface VibeTestContext {
  runAgent(opts: RunAgentOptions): AgentExecution;
  judge<T = DefaultJudgmentResult>(
    result: RunResult,
    options: JudgeOptions<T>
  ): Promise<T>;
  expect: typeof import('vitest')['expect'];
  annotate(message: string, type?: string, attachment?: TestAttachment): Promise<void>;
  task: import('vitest').TestContext['task'];
  files: FileAccessor;
  tools: ToolAccessor;
  timeline: TimelineAccessor;
}
runAgent(opts: RunAgentOptions): AgentExecution

Execute an agent with the given options. Automatically captures hooks, git state, file changes, and tool calls.

Parameters:

  • opts - Agent execution configuration

Returns:

  • AgentExecution - Thenable execution handle with reactive watch capabilities

Behavior:

  • State accumulates across multiple runAgent() calls in the same test
  • Access cumulative state via context.files, context.tools, context.timeline

Example:

vibeTest('multi-run test', async ({ runAgent, expect, files }) => {
  // First run
  const result1 = await runAgent({
    prompt: '/implement feature A'
  });
  // Second run (state accumulates)
  const result2 = await runAgent({
    prompt: '/implement feature B'
  });
  // Access cumulative state
  expect(files.changed().length).toBeGreaterThan(0);
});


judge<T = DefaultJudgmentResult>(
  result: RunResult,
  options: {
    rubric: Rubric;
    instructions?: string;
    resultFormat?: z.ZodType<T>;
    throwOnFail?: boolean;
  }
): Promise<T>

Evaluate a RunResult using LLM-based judgment. The judge is a specialized agent that formats the rubric into a prompt internally.

Type Parameters:

  • T - Type of judgment result (default: DefaultJudgmentResult)

Parameters:

  • result - The RunResult to evaluate
  • options.rubric - Evaluation criteria
  • options.instructions - Optional custom instructions for evaluation
  • options.resultFormat - Optional Zod schema for type-safe results
  • options.throwOnFail - If true, throws error when judgment fails

Returns:

  • Promise<T> - Structured judgment result

Example:

vibeTest('quality evaluation', async ({ runAgent, judge, expect }) => {
  const result = await runAgent({
    prompt: '/refactor code'
  });
  const judgment = await judge(result, {
    rubric: {
      name: 'Code Quality',
      criteria: [
        { name: 'readability', description: 'Code is easy to read' }
      ]
    }
  });
  expect(judgment.passed).toBe(true);
});
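For type-safe results, pass a Zod schema as `resultFormat`. The sketch below assumes the `zod` package is installed; the `QualityJudgment` shape is hypothetical (the fields your judge returns depend on your rubric and instructions):

```typescript
import { z } from 'zod';

// Hypothetical judgment shape; z.infer<typeof QualityJudgment> becomes T.
const QualityJudgment = z.object({
  passed: z.boolean(),
  score: z.number().min(0).max(10),
  notes: z.string(),
});

vibeTest('typed judgment', async ({ runAgent, judge, expect }) => {
  const result = await runAgent({ prompt: '/refactor code' });
  const judgment = await judge(result, {
    rubric: {
      name: 'Code Quality',
      criteria: [
        { name: 'readability', description: 'Code is easy to read' }
      ]
    },
    resultFormat: QualityJudgment, // judgment is typed, not DefaultJudgmentResult
    throwOnFail: true              // throw instead of returning a failing judgment
  });
  expect(judgment.score).toBeGreaterThanOrEqual(7);
});
```

With `throwOnFail: true`, a failing judgment rejects the promise, so the test fails without an explicit assertion on `judgment.passed`.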


expect: typeof import('vitest')['expect']

Context-bound expect function for snapshot concurrency safety.

Purpose:

  • Use this instead of global expect in concurrent tests
  • Ensures snapshots are properly isolated

Example:

vibeTest.concurrent('concurrent test', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });
  // Use context.expect, not global expect
  expect(result.files.changed()).toMatchSnapshot();
});

annotate(
  message: string,
  type?: string,
  attachment?: TestAttachment
): Promise<void>

Stream annotations to reporters in real-time.

Parameters:

  • message - Annotation message
  • type - Optional annotation type (e.g., 'info', 'warning', 'error')
  • attachment - Optional file attachment (Vitest moves to attachmentsDir automatically)

Example:

vibeTest('annotated test', async ({ runAgent, annotate }) => {
  await annotate('Starting feature implementation');
  const result = await runAgent({
    prompt: '/implement feature'
  });
  await annotate('Implementation complete', 'success');
});
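To attach a file alongside an annotation, pass a TestAttachment as the third argument. The attachment fields below (path, contentType) are illustrative assumptions; check the TestAttachment type in your version for the exact shape:

```typescript
vibeTest('annotated with attachment', async ({ runAgent, annotate }) => {
  const result = await runAgent({ prompt: '/implement feature' });
  // Hypothetical attachment shape; Vitest moves the referenced file
  // into attachmentsDir automatically once the annotation is recorded.
  await annotate('Build log captured', 'info', {
    path: './build.log',
    contentType: 'text/plain',
  });
});
```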

task: import('vitest').TestContext['task']

Access to Vitest task metadata for custom meta storage.

Use Cases:

  • Store custom metadata for reporters
  • Access test name, file path, etc.

Example:

vibeTest('metadata test', async ({ runAgent, task }) => {
  const result = await runAgent({ prompt: '/task' });
  // Store custom metadata
  task.meta.customData = { feature: 'auth' };
});
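Metadata written to `task.meta` is surfaced to reporters. A rough sketch of reading it back in a custom reporter (the import path and the `File`/`Task` type names follow current Vitest and may vary by version; `customData` is the field from the example above):

```typescript
import type { File, Reporter, Task } from 'vitest';

// Reporter sketch: walks the finished task tree and logs any
// customData written by tests via task.meta.
class MetaReporter implements Reporter {
  onFinished(files: File[] = []) {
    const walk = (task: Task): void => {
      const meta = task.meta as Record<string, unknown>;
      if (meta && 'customData' in meta) {
        console.log(`${task.name}:`, meta.customData);
      }
      if ('tasks' in task) {
        task.tasks.forEach(walk);
      }
    };
    files.forEach(walk);
  }
}
```

Register it via `test.reporters` in your Vitest config, e.g. `reporters: ['default', new MetaReporter()]`.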

files: {
  changed(): FileChange[];
  get(path: string): FileChange | undefined;
  filter(glob: string | string[]): FileChange[];
  stats(): {
    added: number;
    modified: number;
    deleted: number;
    renamed: number;
    total: number;
  };
}

Access cumulative file changes across all runAgent() calls in this test.

Methods:

changed()

Returns all files changed across all agent runs.

const allFiles = context.files.changed();
console.log(`Total files changed: ${allFiles.length}`);

get(path)

Get a specific file by path.

const file = context.files.get('src/index.ts');
if (file) {
  const content = await file.after?.text();
}

filter(glob)

Filter files by glob pattern.

const tsFiles = context.files.filter('**/*.ts');
const testFiles = context.files.filter(['**/*.test.ts', '**/*.spec.ts']);

stats()

Get change statistics.

const stats = context.files.stats();
console.log(`Added: ${stats.added}`);
console.log(`Modified: ${stats.modified}`);
console.log(`Deleted: ${stats.deleted}`);
console.log(`Total: ${stats.total}`);


tools: {
  all(): ToolCall[];
  used(name: string): number;
  findFirst(name: string): ToolCall | undefined;
  filter(name: string): ToolCall[];
  failed(): ToolCall[];
  succeeded(): ToolCall[];
}

Access cumulative tool calls across all runAgent() calls in this test.

Methods:

all()

Get all tool calls from all runs.

const allTools = context.tools.all();
console.log(`Total tools used: ${allTools.length}`);

used(name)

Count uses of a specific tool.

const editCount = context.tools.used('Edit');
console.log(`Edit used ${editCount} times`);

findFirst(name)

Find the first use of a specific tool.

const firstWrite = context.tools.findFirst('Write');
if (firstWrite) {
  console.log('First file written:', firstWrite.input);
}

filter(name)

Get all calls to a specific tool.

const bashCalls = context.tools.filter('Bash');
bashCalls.forEach(call => {
  console.log('Command:', call.input);
});

failed()

Get all failed tool calls.

const failures = context.tools.failed();
if (failures.length > 0) {
  console.error('Failed tools:', failures.map(t => t.name));
}

succeeded()

Get all successful tool calls.

const successful = context.tools.succeeded();
console.log(`${successful.length} tools succeeded`);


timeline: {
  events(): AsyncIterable<TimelineEvent>;
}

Access unified timeline of events across all runAgent() calls.

Methods:

events()

Returns an async iterable over all timeline events.

for await (const event of context.timeline.events()) {
  console.log(`${event.type} at ${event.timestamp}`);
}

Event Types:

  • tool_use - Tool invocation
  • tool_result - Tool completion
  • todo_update - TODO status change
  • notification - Agent notification
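The event types above can be tallied once the async iterable is collected into an array. A minimal sketch (countByType is a hypothetical helper, not part of the API):

```typescript
// Tally events by their `type` field; works on any array of
// objects shaped like { type: string }.
function countByType(events: { type: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    counts[event.type] = (counts[event.type] ?? 0) + 1;
  }
  return counts;
}

// Inside a vibeTest:
//   const events: TimelineEvent[] = [];
//   for await (const e of context.timeline.events()) events.push(e);
//   const counts = countByType(events);
```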

Vibe-check augments Vitest’s TestContext to include VibeTestContext:

declare module 'vitest' {
  export interface TestContext extends VibeTestContext {}
}

This allows TypeScript to recognize vibe-check properties in test functions.


Access VibeTestContext through destructuring in vibeTest functions:

import { vibeTest } from '@dao/vibe-check';

vibeTest('example', async ({ runAgent, judge, expect, files, tools }) => {
  // All properties available via destructuring
  const result = await runAgent({ prompt: '/task' });
  expect(result).toBeDefined();
  expect(files.changed().length).toBeGreaterThan(0);
  expect(tools.used('Edit')).toBeGreaterThan(0);
});