Custom Matchers

Vibe-check provides custom Vitest matchers tailored for testing agent behavior. These matchers work on RunResult objects and provide expressive, readable assertions.


toHaveChangedFiles asserts that specific files were changed. It accepts exact paths and glob patterns.

vibeTest('changes expected files', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });
  // Exact paths
  expect(result).toHaveChangedFiles(['src/auth.ts', 'tests/auth.test.ts']);
  // Glob patterns
  expect(result).toHaveChangedFiles(['src/**/*.ts']);
  // Single path
  expect(result).toHaveChangedFiles('src/main.ts');
});

Parameters:

  • paths - string | string[] - Exact paths or glob patterns

Passes when: Every specified path or pattern matches at least one changed file.

Fails when: Any specified path or pattern matches no changed file.
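
To make the pass condition concrete, here is a minimal sketch of how a list of changed paths could be checked against exact paths and simple globs. It is not vibe-check's implementation: globToRegExp and unmatchedPatterns are hypothetical helpers, and the glob support is limited to `*` and `**`.

```typescript
// Hypothetical helper, not part of vibe-check's API.
// Converts a simple glob ('*', '**', literal text) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let source = '';
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (ch === '*' && glob.charAt(i + 1) === '*') {
      if (glob.charAt(i + 2) === '/') {
        // '**/' matches zero or more leading directories.
        source += '(?:.*/)?';
        i += 3;
      } else {
        // A bare '**' matches anything, including '/'.
        source += '.*';
        i += 2;
      }
    } else if (ch === '*') {
      // '*' matches within a single path segment.
      source += '[^/]*';
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns the patterns that match no changed file; an empty result means "pass".
function unmatchedPatterns(changedPaths: string[], patterns: string | string[]): string[] {
  const list = Array.isArray(patterns) ? patterns : [patterns];
  return list.filter((pattern) => {
    const re = globToRegExp(pattern);
    return !changedPaths.some((p) => re.test(p));
  });
}

const changed = ['src/auth.ts', 'src/utils/hash.ts', 'tests/auth.test.ts'];
console.log(unmatchedPatterns(changed, ['src/**/*.ts', 'tests/auth.test.ts'])); // nothing unmatched
console.log(unmatchedPatterns(changed, 'config/**')); // 'config/**' is unmatched
```

Returning the unmatched patterns, rather than a plain boolean, makes it easy to build a failure message that names exactly which pattern found nothing.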

toHaveNoDeletedFiles asserts that no files were deleted during execution.

vibeTest('never deletes files', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });
  expect(result).toHaveNoDeletedFiles();
});

Passes when: No files have changeType === 'deleted'.

Fails when: Any file was deleted.

Use case: Protect against destructive refactorings.


toHaveUsedTool asserts that a specific tool was used, optionally with a minimum call count.

vibeTest('uses Edit tool', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });
  // Basic usage (at least once)
  expect(result).toHaveUsedTool('Edit');
  // With minimum count
  expect(result).toHaveUsedTool('Edit', { min: 3 });
});

Parameters:

  • name - string - Tool name (e.g., 'Edit', 'Read', 'Bash')
  • opts.min - number (optional) - Minimum usage count (default: 1)

Passes when: Tool was used at least min times.

Fails when: Tool was not used, or used fewer than min times.
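
The counting rule can be sketched as a small pure function. This is an illustration only: ToolCall and checkToolUsage are hypothetical names, and the only field assumed on a tool call is `name`, which the examples on this page use. The return value follows the `{ pass, message }` shape that Vitest custom matchers return.

```typescript
// Assumed shape: only `name` is taken from the examples on this page.
interface ToolCall {
  name: string;
}

// Hypothetical helper returning a Vitest-style matcher result.
function checkToolUsage(calls: ToolCall[], name: string, opts: { min?: number } = {}) {
  const min = opts.min ?? 1; // default documented above: at least once
  const count = calls.filter((call) => call.name === name).length;
  return {
    pass: count >= min,
    message: () =>
      `expected ${name} to be used at least ${min} time(s), but it was used ${count} time(s)`,
  };
}

const calls: ToolCall[] = [{ name: 'Read' }, { name: 'Edit' }, { name: 'Edit' }];
console.log(checkToolUsage(calls, 'Edit').pass); // true
console.log(checkToolUsage(calls, 'Edit', { min: 3 }).message());
```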

toUseOnlyTools asserts that only allowed tools were used (allowlist pattern).

vibeTest('only uses safe tools', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/analyze code' });
  // Only allow reading/grepping, no modifications
  expect(result).toUseOnlyTools(['Read', 'Grep', 'Glob']);
});

Parameters:

  • allowlist - string[] - List of allowed tool names

Passes when: All tool calls are in the allowlist.

Fails when: Any tool not in allowlist was used.

Use case: Enforce read-only operations, prevent destructive tools.
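
As a rough sketch of the allowlist check, the function below collects every distinct tool name that falls outside the allowlist. ToolCall and findDisallowedTools are hypothetical names used for illustration, not the library's internals.

```typescript
// Assumed shape: only `name` is taken from the examples on this page.
interface ToolCall {
  name: string;
}

// Hypothetical helper: returns each disallowed tool name once, in first-seen order.
// An empty result corresponds to the matcher passing.
function findDisallowedTools(calls: ToolCall[], allowlist: string[]): string[] {
  const allowed = new Set(allowlist);
  const offenders = new Set<string>();
  for (const call of calls) {
    if (!allowed.has(call.name)) {
      offenders.add(call.name);
    }
  }
  return Array.from(offenders);
}

const calls: ToolCall[] = [
  { name: 'Read' },
  { name: 'Bash' },
  { name: 'Grep' },
  { name: 'Bash' },
];
console.log(findDisallowedTools(calls, ['Read', 'Grep', 'Glob'])); // only 'Bash' is reported
```

Listing the offending tools is what makes a failure actionable: the fix is either to tighten the prompt or to add the reported tool to the allowlist.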


toCompleteAllTodos asserts that all TODOs were completed (none pending or in_progress).

vibeTest('completes all TODOs', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/implement feature X' });
  expect(result).toCompleteAllTodos();
});

Passes when: All TODOs have status === 'completed'.

Fails when: Any TODO has status === 'pending' or status === 'in_progress'.

Use case: Verify agent finished all planned work.
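
The status rule above can be sketched as follows. The three status values come from this page; the Todo shape (in particular the `text` field) and the incompleteTodos helper are assumptions made for illustration.

```typescript
type TodoStatus = 'pending' | 'in_progress' | 'completed';

// Assumed shape: `status` values are from this page, `text` is hypothetical.
interface Todo {
  text: string;
  status: TodoStatus;
}

// Hypothetical helper: returns the TODOs that would cause the matcher to fail.
function incompleteTodos(todos: Todo[]): Todo[] {
  return todos.filter((todo) => todo.status !== 'completed');
}

const todos: Todo[] = [
  { text: 'Add login route', status: 'completed' },
  { text: 'Write tests', status: 'in_progress' },
  { text: 'Update docs', status: 'pending' },
];

const remaining = incompleteTodos(todos);
console.log(remaining.map((todo) => `${todo.status}: ${todo.text}`));
```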

toHaveNoErrorsInLogs asserts that no errors occurred during execution (checks logs and timeline).

vibeTest('runs without errors', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/fix type errors' });
  expect(result).toHaveNoErrorsInLogs();
});

Passes when: No error events in timeline, no failed tool calls.

Fails when: Errors found in logs or tool failures detected.

Use case: Ensure clean execution without failures.


toStayUnderCost asserts that total cost stayed within budget.

vibeTest('stays under budget', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/add feature' });
  // Budget: $2.00
  expect(result).toStayUnderCost(2.00);
});

Parameters:

  • maxUsd - number - Maximum allowed cost in USD

Passes when: result.metrics.totalCostUsd <= maxUsd.

Fails when: Cost exceeds budget.

Use case: Enforce cost constraints for expensive operations.
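
A minimal sketch of the budget comparison is shown below. Note that the documented condition is `<=`, so a run that costs exactly the budget passes. Metrics and checkCost are hypothetical names, and treating a missing cost as a failure is an assumption of this sketch, not documented behavior.

```typescript
// Assumed shape: `totalCostUsd` is the only field taken from this page.
interface Metrics {
  totalCostUsd?: number;
}

// Hypothetical helper returning a Vitest-style matcher result.
function checkCost(metrics: Metrics, maxUsd: number) {
  const cost = metrics.totalCostUsd;
  if (cost === undefined) {
    // Assumption: a run with no recorded cost cannot be verified, so it fails.
    return { pass: false, message: () => 'no cost was recorded for this run' };
  }
  return {
    pass: cost <= maxUsd,
    message: () =>
      `expected cost $${cost.toFixed(4)} to stay within $${maxUsd.toFixed(2)}`,
  };
}

console.log(checkCost({ totalCostUsd: 2.0 }, 2.0).pass); // true: the boundary is inclusive
console.log(checkCost({ totalCostUsd: 2.01 }, 2.0).pass); // false
```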


toPassRubric asserts that the result passes an LLM-based quality evaluation. The matcher is asynchronous and must be awaited.

vibeTest('meets quality standards', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor codebase' });
  // Async matcher - uses judge internally
  await expect(result).toPassRubric({
    name: 'Code Quality',
    criteria: [
      { name: 'has_tests', description: 'Added comprehensive test coverage' },
      { name: 'no_todos', description: 'No TODO comments left in code' },
      { name: 'type_safe', description: 'All code is properly typed' }
    ]
  });
});

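Parameters:

  • rubric - object - Rubric with a name and a criteria array; each criterion has a name and a description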

Passes when: Judge evaluation returns passed: true.

Fails when: Judge evaluation fails or rubric criteria not met.

Use case: Quality gates that require semantic understanding.
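
Because each judge call costs time and money, it can help to sanity-check a rubric before sending it. The sketch below types the rubric shape used in the example above and runs a few local checks. Rubric, RubricCriterion, and validateRubric are hypothetical names, and the specific rules (non-empty fields, unique criterion names) are conventions chosen for this sketch rather than requirements stated by the library.

```typescript
// Shape inferred from the example above; the type names are hypothetical.
interface RubricCriterion {
  name: string;
  description: string;
}

interface Rubric {
  name: string;
  criteria: RubricCriterion[];
}

// Hypothetical pre-flight check: returns a list of problems,
// which is empty when the rubric looks usable.
function validateRubric(rubric: Rubric): string[] {
  const problems: string[] = [];
  if (rubric.name.trim() === '') problems.push('rubric name is empty');
  if (rubric.criteria.length === 0) problems.push('rubric has no criteria');

  const seen = new Set<string>();
  for (const criterion of rubric.criteria) {
    if (criterion.description.trim() === '') {
      problems.push(`criterion "${criterion.name}" has no description`);
    }
    if (seen.has(criterion.name)) {
      problems.push(`duplicate criterion name "${criterion.name}"`);
    }
    seen.add(criterion.name);
  }
  return problems;
}

const rubric: Rubric = {
  name: 'Code Quality',
  criteria: [
    { name: 'has_tests', description: 'Added comprehensive test coverage' },
    { name: 'has_tests', description: '' },
  ],
};
console.log(validateRubric(rubric));
```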


toHaveCompleteHookData asserts that all hook events were captured successfully.

vibeTest('has complete hook data', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });
  expect(result).toHaveCompleteHookData();
});

Passes when: result.hookCaptureStatus.complete === true.

Fails when: Hook capture was incomplete or failed.

Use case: Debug hook capture issues, ensure data integrity.


Combine multiple matchers for thorough validation:

vibeTest('full validation', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: '/implement auth with tests'
  });
  // Files
  expect(result).toHaveChangedFiles(['src/auth.ts', 'tests/auth.test.ts']);
  expect(result).toHaveNoDeletedFiles();
  // Quality
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();
  // Cost
  expect(result).toStayUnderCost(2.00);
  // Tools
  expect(result).toHaveUsedTool('Edit', { min: 2 });
  // LLM evaluation
  await expect(result).toPassRubric({
    name: 'Implementation Quality',
    criteria: [
      { name: 'tests', description: 'Has comprehensive test coverage' },
      { name: 'types', description: 'Properly typed' }
    ]
  });
});

Verify specific file patterns:

vibeTest('modifies only src/', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });
  // At least one TypeScript file under src/ changed
  expect(result).toHaveChangedFiles(['src/**/*.ts']);
  // No config/database changes
  const configChanged = result.files.changed().some(f =>
    f.path.startsWith('config/') || f.path.startsWith('database/')
  );
  expect(configChanged).toBe(false);
  // No deletions
  expect(result).toHaveNoDeletedFiles();
});

Restrict tool usage:

vibeTest('read-only analysis', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: '/analyze codebase for security issues'
  });
  // Allow read tools plus Bash (note: Bash can still run commands that modify files)
  expect(result).toUseOnlyTools(['Read', 'Grep', 'Glob', 'Bash']);
  // Explicitly verify no Edit/Write calls
  const destructiveTools = result.tools.all().filter(t =>
    ['Edit', 'Write', 'NotebookEdit'].includes(t.name)
  );
  expect(destructiveTools).toHaveLength(0);
});

Enforce budgets per operation:

vibeTest('budget per complexity', async ({ runAgent, expect }) => {
  // Simple task: $1 budget
  const simple = await runAgent({ prompt: '/fix typo in README' });
  expect(simple).toStayUnderCost(1.00);
  // Medium task: $3 budget
  const medium = await runAgent({ prompt: '/add tests for one module' });
  expect(medium).toStayUnderCost(3.00);
  // Complex task: $10 budget
  const complex = await runAgent({ prompt: '/refactor entire codebase' });
  expect(complex).toStayUnderCost(10.00);
});

Combine matchers with LLM evaluation:

vibeTest('quality gate', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/implement feature' });
  // Basic quality checks
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();
  expect(result).toHaveChangedFiles(['src/**', 'tests/**']);
  // LLM-based quality evaluation
  await expect(result).toPassRubric({
    name: 'Feature Quality',
    criteria: [
      { name: 'tests', description: 'Has comprehensive test coverage' },
      { name: 'docs', description: 'Added user-facing documentation' },
      { name: 'types', description: 'All code is type-safe' },
      { name: 'errors', description: 'Proper error handling' }
    ]
  });
});

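Custom matchers operate on the final RunResult. Inside reactive watchers, which see only partial state, use standard expect() assertions and apply the custom matchers once the run completes: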

vibeTest('matchers in watchers', async ({ runAgent, expect }) => {
  const execution = runAgent({ prompt: '/refactor' });
  execution.watch(({ files, tools, metrics }) => {
    // Note: Matchers work on RunResult, not PartialRunResult
    // Use standard expect() for partial state
    // Files check
    const deleted = files.changed().filter(f => f.changeType === 'deleted');
    expect(deleted).toHaveLength(0);
    // Tool check
    expect(tools.failed().length).toBeLessThan(3);
    // Cost check
    if (metrics.totalCostUsd) {
      expect(metrics.totalCostUsd).toBeLessThan(5.0);
    }
  });
  const result = await execution;
  // Custom matchers on final result
  expect(result).toCompleteAllTodos();
  expect(result).toHaveChangedFiles(['src/**']);
});

All matchers support .not for negation:

vibeTest('negation examples', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });
  // Should NOT change these files. One assertion per pattern: given the pass
  // condition above, negating a multi-pattern call would pass as soon as any
  // one pattern has no match.
  expect(result).not.toHaveChangedFiles('config/**');
  expect(result).not.toHaveChangedFiles('database/**');
  // Should use more than just Read
  expect(result).not.toUseOnlyTools(['Read']);
  // Should NOT stay under $0.50 (expect it to cost more)
  expect(result).not.toStayUnderCost(0.50);
});

| Matcher | Validates | Use Case |
| --- | --- | --- |
| toHaveChangedFiles | File changes match patterns | Verify expected modifications |
| toHaveNoDeletedFiles | No deletions | Protect against data loss |
| toHaveUsedTool | Tool was called | Verify specific tool usage |
| toUseOnlyTools | Only allowed tools used | Enforce read-only or safe tools |
| toCompleteAllTodos | All TODOs completed | Verify work finished |
| toHaveNoErrorsInLogs | Clean execution | Ensure no failures |
| toStayUnderCost | Within budget | Enforce cost constraints |
| toPassRubric | LLM quality check | Semantic validation |
| toHaveCompleteHookData | Hook capture complete | Debug data issues |

  1. Combine matchers - Use multiple for comprehensive validation
  2. Start simple - Basic matchers first, LLM evaluation last
  3. Use globs - File patterns are more maintainable than exact paths
  4. Budget per task - Set appropriate cost limits based on complexity
  5. Allowlist tools - Use toUseOnlyTools for read-only or safe operations
  6. Async matchers - Remember await for toPassRubric
  7. Meaningful failures - Matchers provide detailed error messages

Problem: TypeScript error: Property 'toXxx' does not exist.

Cause: setupFiles not configured or custom matchers not imported.

Solution: Ensure defineVibeConfig sets test.setupFiles: ['@dao/vibe-check/setup'].

Problem: Matcher fails even though files look correct.

Cause: Path mismatch (absolute vs relative, casing, separators).

Solution: Check result.files.changed().map(f => f.path) to see exact paths, adjust patterns accordingly.
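
When comparing paths by eye, it can help to normalize them first so that separator and prefix differences stand out. The helper below is a hypothetical debugging aid, not part of vibe-check: it converts backslashes to forward slashes, strips an optional root prefix, and drops a leading `./`. It does not change casing, so case mismatches remain visible.

```typescript
// Hypothetical debugging helper, not part of vibe-check's API.
function normalizePath(path: string, root = ''): string {
  // Use forward slashes everywhere so Windows and POSIX paths compare equal.
  let out = path.replace(/\\/g, '/');
  // Strip the root prefix (if given) to turn an absolute path into a relative one.
  const base = root.replace(/\\/g, '/').replace(/\/+$/, '');
  if (base !== '' && out.startsWith(base + '/')) {
    out = out.slice(base.length + 1);
  }
  // Drop a leading './' so './src/a.ts' and 'src/a.ts' look the same.
  return out.replace(/^\.\//, '');
}

console.log(normalizePath('C:\\repo\\src\\auth.ts', 'C:\\repo')); // src/auth.ts
console.log(normalizePath('/home/me/repo/src/auth.ts', '/home/me/repo/')); // src/auth.ts
console.log(normalizePath('./src/main.ts')); // src/main.ts
```

In a test, this could be applied to the output of result.files.changed().map(f => f.path) before printing, assuming the paths are plain strings as the examples on this page suggest.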

Problem: Test hangs on await expect(result).toPassRubric(...).

Cause: Judge call failing or API key missing.

Solution: Check ANTHROPIC_API_KEY env var, verify rubric is valid, check network connectivity.
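
To turn a silent hang into a fast, readable failure, the awaited call can be wrapped in a timeout. The sketch below is generic TypeScript and does not rely on any vibe-check API; withTimeout is a hypothetical helper. Vitest's own testTimeout setting is an alternative when a per-test limit is enough.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: PromiseLike<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // stop the timer so it cannot keep the process alive
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Usage inside a test (assumes the matcher returns a promise, as the awaited
// examples on this page suggest):
//   await withTimeout(expect(result).toPassRubric(rubric), 60000, 'toPassRubric');

withTimeout(Promise.resolve('judge ok'), 1000, 'demo').then((value) => console.log(value));
```

Note that the timeout only stops the test from waiting; it does not cancel the underlying judge request.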

Problem: Matcher fails because of system tools (like Bash for git).

Cause: Allowlist too narrow.

Solution: Add system tools to allowlist: ['Read', 'Edit', 'Bash', 'Grep'].



vibeTest('matcher quick reference', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });
  // Files
  expect(result).toHaveChangedFiles(['src/**/*.ts']);
  expect(result).toHaveNoDeletedFiles();
  // Tools
  expect(result).toHaveUsedTool('Edit', { min: 2 });
  expect(result).toUseOnlyTools(['Read', 'Edit', 'Bash']);
  // Quality
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();
  // Cost
  expect(result).toStayUnderCost(5.00);
  // LLM evaluation (async)
  await expect(result).toPassRubric({
    name: 'Quality',
    criteria: [{ name: 'test', description: 'Has tests' }]
  });
  // Hook capture
  expect(result).toHaveCompleteHookData();
});