Custom Matchers
Vibe-check provides custom Vitest matchers tailored for testing agent behavior. These matchers work on RunResult
objects and provide expressive, readable assertions.
Available Matchers
File Matchers
toHaveChangedFiles(paths)
Assert that specific files were changed (supports glob patterns).
```typescript
vibeTest('changes expected files', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });

  // Exact paths
  expect(result).toHaveChangedFiles(['src/auth.ts', 'tests/auth.test.ts']);

  // Glob patterns
  expect(result).toHaveChangedFiles(['src/**/*.ts']);

  // Single path
  expect(result).toHaveChangedFiles('src/main.ts');
});
```
Parameters:
- paths (string | string[]): Exact paths or glob patterns
Passes when: Every specified path or pattern matches at least one changed file.
Fails when: Any specified path or pattern matches none of the changed files.
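To make the pass/fail rule concrete, here is a minimal sketch of how such a check could evaluate patterns against changed paths. It is not vibe-check's implementation: the helpers globToRegExp and changedFilesMatch are hypothetical, and glob support is limited to *, ** and ?.

```typescript
// Hypothetical sketch, not vibe-check's actual implementation.
// Converts a glob (supporting **, * and ?) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          re += "(?:.*/)?"; // "**/" matches zero or more directories
          i += 2;
        } else {
          re += ".*"; // trailing "**" matches anything
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" stays within a single path segment
      }
    } else if (c === "?") {
      re += "[^/]"; // "?" matches one non-separator character
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Passes only if EVERY pattern matches at least one changed path.
function changedFilesMatch(changed: string[], patterns: string | string[]): boolean {
  const list = Array.isArray(patterns) ? patterns : [patterns];
  return list.every((pattern) => {
    const re = globToRegExp(pattern);
    return changed.some((path) => re.test(path));
  });
}
```

Under this rule, changing only src/auth.ts fails ['src/auth.ts', 'tests/auth.test.ts'], because the second path has no match.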
toHaveNoDeletedFiles()
Assert that no files were deleted during execution.
```typescript
vibeTest('never deletes files', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });

  expect(result).toHaveNoDeletedFiles();
});
```
Passes when: No file has changeType === 'deleted'.
Fails when: Any file was deleted.
Use case: Protect against destructive refactorings.
Tool Matchers
toHaveUsedTool(name, opts?)
Assert that a specific tool was used, optionally with a minimum count.
```typescript
vibeTest('uses Edit tool', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });

  // Basic usage (at least once)
  expect(result).toHaveUsedTool('Edit');

  // With minimum count
  expect(result).toHaveUsedTool('Edit', { min: 3 });
});
```
Parameters:
- name (string): Tool name (e.g., 'Edit', 'Read', 'Bash')
- opts.min (number, optional): Minimum usage count (default: 1)

Passes when: The tool was used at least min times.
Fails when: The tool was not used, or was used fewer than min times.
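A minimal sketch of the counting rule, assuming a hypothetical ToolCall record with a name field (the real RunResult shape may differ):

```typescript
// Hypothetical record shape; vibe-check's actual tool-call type may differ.
interface ToolCall {
  name: string;
}

// True when `name` appears at least `min` times (min defaults to 1).
function usedToolAtLeast(calls: ToolCall[], name: string, opts: { min?: number } = {}): boolean {
  const min = opts.min ?? 1;
  const count = calls.filter((call) => call.name === name).length;
  return count >= min;
}
```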
toUseOnlyTools(allowlist)
Assert that only allowed tools were used (whitelist pattern).
```typescript
vibeTest('only uses safe tools', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/analyze code' });

  // Only allow reading/grepping, no modifications
  expect(result).toUseOnlyTools(['Read', 'Grep', 'Glob']);
});
```
Parameters:
- allowlist (string[]): List of allowed tool names
Passes when: All tool calls are in the allowlist.
Fails when: Any tool not in allowlist was used.
Use case: Enforce read-only operations, prevent destructive tools.
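The allowlist rule can be sketched as a set-membership test. Returning the offending tool names rather than a bare boolean makes it easy to see which tool broke the rule. The ToolCall shape and both function names are hypothetical.

```typescript
// Hypothetical record shape; vibe-check's actual tool-call type may differ.
interface ToolCall {
  name: string;
}

// Distinct tool names that were used but are not in the allowlist.
function disallowedTools(calls: ToolCall[], allowlist: string[]): string[] {
  const allowed = new Set(allowlist);
  const outside = new Set<string>();
  for (const call of calls) {
    if (!allowed.has(call.name)) outside.add(call.name);
  }
  return Array.from(outside);
}

// Passes when no tool falls outside the allowlist.
// A run with no tool calls passes vacuously in this sketch.
function usesOnlyTools(calls: ToolCall[], allowlist: string[]): boolean {
  return disallowedTools(calls, allowlist).length === 0;
}
```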
Quality Matchers
toCompleteAllTodos()
Assert that all TODOs were completed (none pending or in_progress).
```typescript
vibeTest('completes all TODOs', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/implement feature X' });

  expect(result).toCompleteAllTodos();
});
```
Passes when: All TODOs have status === 'completed'.
Fails when: Any TODO has status === 'pending' or status === 'in_progress'.
Use case: Verify agent finished all planned work.
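A minimal sketch of the TODO rule using the three statuses named above. The Todo shape (including its content field) and the incompleteTodos helper are hypothetical; in this sketch a run with no TODOs at all passes vacuously.

```typescript
// Hypothetical TODO shape; the statuses come from the matcher description.
type TodoStatus = "pending" | "in_progress" | "completed";

interface Todo {
  content: string;
  status: TodoStatus;
}

// Any TODO not marked completed counts as unfinished work.
function incompleteTodos(todos: Todo[]): Todo[] {
  return todos.filter((todo) => todo.status !== "completed");
}
```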
toHaveNoErrorsInLogs()
Assert that no errors occurred during execution (checks logs and timeline).
```typescript
vibeTest('runs without errors', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/fix type errors' });

  expect(result).toHaveNoErrorsInLogs();
});
```
Passes when: No error events in timeline, no failed tool calls.
Fails when: Errors found in logs or tool failures detected.
Use case: Ensure clean execution without failures.
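A sketch of the two conditions named above (error events in the timeline, failed tool calls), assuming hypothetical TimelineEvent and ToolRecord shapes. Collecting messages instead of returning a boolean gives a readable failure report.

```typescript
// Hypothetical shapes; vibe-check's timeline and tool records may differ.
interface TimelineEvent {
  type: string;
  message?: string;
}

interface ToolRecord {
  name: string;
  ok: boolean;
}

// Returns one message per problem; an empty array means a clean run.
function collectErrors(timeline: TimelineEvent[], tools: ToolRecord[]): string[] {
  const problems: string[] = [];
  for (const event of timeline) {
    if (event.type === "error") {
      problems.push(`timeline error: ${event.message ?? "(no message)"}`);
    }
  }
  for (const tool of tools) {
    if (!tool.ok) problems.push(`tool failed: ${tool.name}`);
  }
  return problems;
}
```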
Cost Matchers
toStayUnderCost(maxUsd)
Assert that total cost stayed within budget.
```typescript
vibeTest('stays under budget', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/add feature' });

  // Budget: $2.00
  expect(result).toStayUnderCost(2.00);
});
```
Parameters:
- maxUsd (number): Maximum allowed cost in USD
Passes when: result.metrics.totalCostUsd <= maxUsd.
Fails when: Cost exceeds budget.
Use case: Enforce cost constraints for expensive operations.
LLM-Based Matchers
toPassRubric(rubric)
Assert that the result passes an LLM-based quality evaluation.
```typescript
vibeTest('meets quality standards', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor codebase' });

  // Async matcher: uses a judge internally, so it must be awaited
  await expect(result).toPassRubric({
    name: 'Code Quality',
    criteria: [
      { name: 'has_tests', description: 'Added comprehensive test coverage' },
      { name: 'no_todos', description: 'No TODO comments left in code' },
      { name: 'type_safe', description: 'All code is properly typed' },
    ],
  });
});
```
Parameters:
- rubric (Rubric): Evaluation criteria (see Rubrics Guide)
Passes when: The judge evaluation returns passed: true.
Fails when: The judge evaluation fails or the rubric criteria are not met.
Use case: Quality gates that require semantic understanding.
Hook Capture Matchers
toHaveCompleteHookData()
Assert that all hook events were captured successfully.
```typescript
vibeTest('has complete hook data', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });

  expect(result).toHaveCompleteHookData();
});
```
Passes when: result.hookCaptureStatus.complete === true.
Fails when: Hook capture was incomplete or failed.
Use case: Debug hook capture issues, ensure data integrity.
Usage Patterns
Pattern 1: Comprehensive Validation
Combine multiple matchers for thorough validation:
```typescript
vibeTest('full validation', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/implement auth with tests' });

  // Files
  expect(result).toHaveChangedFiles(['src/auth.ts', 'tests/auth.test.ts']);
  expect(result).toHaveNoDeletedFiles();

  // Quality
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();

  // Cost
  expect(result).toStayUnderCost(2.00);

  // Tools
  expect(result).toHaveUsedTool('Edit', { min: 2 });

  // LLM evaluation
  await expect(result).toPassRubric({
    name: 'Implementation Quality',
    criteria: [
      { name: 'tests', description: 'Has comprehensive test coverage' },
      { name: 'types', description: 'Properly typed' },
    ],
  });
});
```
Pattern 2: File Change Validation
Verify specific file patterns:
```typescript
vibeTest('modifies only src/', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/refactor' });

  // At least one .ts file under src/ changed
  // (this alone does not rule out changes elsewhere)
  expect(result).toHaveChangedFiles(['src/**/*.ts']);

  // No config/database changes
  const configChanged = result.files.changed().some(f =>
    f.path.startsWith('config/') || f.path.startsWith('database/')
  );
  expect(configChanged).toBe(false);

  // No deletions
  expect(result).toHaveNoDeletedFiles();
});
```
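toHaveChangedFiles(['src/**/*.ts']) only asserts that matching files changed; it does not forbid changes elsewhere. If a test really needs "only src/ changed", one option is to check every changed path against a set of allowed prefixes. A minimal sketch with a hypothetical filesOutside helper:

```typescript
// Hypothetical helper: returns every changed path that does not start
// with any of the allowed prefixes.
function filesOutside(changedPaths: string[], allowedPrefixes: string[]): string[] {
  return changedPaths.filter(
    (path) => !allowedPrefixes.some((prefix) => path.startsWith(prefix)),
  );
}
```

Inside a test, this could be used as expect(filesOutside(result.files.changed().map(f => f.path), ['src/'])).toEqual([]), assuming changed() returns objects with a path field as in the example above.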
Pattern 3: Tool Allowlist
Restrict tool usage:
```typescript
vibeTest('read-only analysis', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/analyze codebase for security issues' });

  // Allow read tools plus Bash (note: Bash itself can still modify files)
  expect(result).toUseOnlyTools(['Read', 'Grep', 'Glob', 'Bash']);

  // Verify no Edit/Write calls (redundant with the allowlist, but explicit)
  const destructiveTools = result.tools.all().filter(t =>
    ['Edit', 'Write', 'NotebookEdit'].includes(t.name)
  );
  expect(destructiveTools).toHaveLength(0);
});
```
Pattern 4: Cost-Aware Testing
Enforce budgets per operation:
```typescript
vibeTest('budget per complexity', async ({ runAgent, expect }) => {
  // Simple task: $1 budget
  const simple = await runAgent({ prompt: '/fix typo in README' });
  expect(simple).toStayUnderCost(1.00);

  // Medium task: $3 budget
  const medium = await runAgent({ prompt: '/add tests for one module' });
  expect(medium).toStayUnderCost(3.00);

  // Complex task: $10 budget
  const complex = await runAgent({ prompt: '/refactor entire codebase' });
  expect(complex).toStayUnderCost(10.00);
});
```
Pattern 5: Quality Gates
Combine matchers with LLM evaluation:
```typescript
vibeTest('quality gate', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/implement feature' });

  // Basic quality checks
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();
  expect(result).toHaveChangedFiles(['src/**', 'tests/**']);

  // LLM-based quality evaluation
  await expect(result).toPassRubric({
    name: 'Feature Quality',
    criteria: [
      { name: 'tests', description: 'Has comprehensive test coverage' },
      { name: 'docs', description: 'Added user-facing documentation' },
      { name: 'types', description: 'All code is type-safe' },
      { name: 'errors', description: 'Proper error handling' },
    ],
  });
});
```
Using Matchers in Watchers
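Custom matchers operate on the final RunResult, not on the partial state passed to a reactive watcher. Inside a watcher, use standard expect() assertions; apply the custom matchers once the execution resolves: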
```typescript
vibeTest('matchers in watchers', async ({ runAgent, expect }) => {
  const execution = runAgent({ prompt: '/refactor' });

  execution.watch(({ files, tools, metrics }) => {
    // Note: custom matchers work on RunResult, not PartialRunResult.
    // Use standard expect() for partial state.

    // Files check
    const deleted = files.changed().filter(f => f.changeType === 'deleted');
    expect(deleted).toHaveLength(0);

    // Tool check
    expect(tools.failed().length).toBeLessThan(3);

    // Cost check
    if (metrics.totalCostUsd) {
      expect(metrics.totalCostUsd).toBeLessThan(5.0);
    }
  });

  const result = await execution;

  // Custom matchers on the final result
  expect(result).toCompleteAllTodos();
  expect(result).toHaveChangedFiles(['src/**']);
});
```
Negation
All matchers support .not for negation:
```typescript
vibeTest('negation examples', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });

  // Should NOT change these files. Negate each pattern separately:
  // .not on a multi-pattern list only asserts that not ALL patterns matched.
  expect(result).not.toHaveChangedFiles('config/**');
  expect(result).not.toHaveChangedFiles('database/**');

  // Should use at least one tool other than Read
  expect(result).not.toUseOnlyTools(['Read']);

  // Should NOT stay under $0.50 (expect it to cost more)
  expect(result).not.toStayUnderCost(0.50);
});
```
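Vitest's .not inverts a matcher's pass result. Combined with the "every pattern matches some changed file" rule of toHaveChangedFiles, negating a multi-pattern list only asserts that at least one pattern had no match. The sketch below illustrates the difference, using prefix matching as a simplified stand-in for globs; the allPatternsChanged helper is hypothetical.

```typescript
// Hypothetical stand-in for the "every pattern matches some changed file"
// rule, using plain prefixes instead of globs to keep the sketch short.
function allPatternsChanged(changed: string[], prefixes: string[]): boolean {
  return prefixes.every((prefix) => changed.some((path) => path.startsWith(prefix)));
}

const changed = ["src/app.ts", "config/app.json"];

// Negating the combined check passes because database/ was untouched,
// even though config/ WAS changed.
const combinedNot = !allPatternsChanged(changed, ["config/", "database/"]);

// Negating each pattern separately fails as intended, because config/ changed.
const perPatternNot = ["config/", "database/"].every(
  (prefix) => !allPatternsChanged(changed, [prefix]),
);
```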
Matcher Comparison Table
Matcher | Validates | Use Case |
---|---|---|
toHaveChangedFiles | File changes match patterns | Verify expected modifications |
toHaveNoDeletedFiles | No deletions | Protect against data loss |
toHaveUsedTool | Tool was called | Verify specific tool usage |
toUseOnlyTools | Only allowed tools used | Enforce read-only or safe tools |
toCompleteAllTodos | All TODOs completed | Verify work finished |
toHaveNoErrorsInLogs | Clean execution | Ensure no failures |
toStayUnderCost | Within budget | Enforce cost constraints |
toPassRubric | LLM quality check | Semantic validation |
toHaveCompleteHookData | Hook capture complete | Debug data issues |
Best Practices
- Combine matchers - Use several matchers together for comprehensive validation
- Start simple - Run cheap basic matchers first and the LLM evaluation last
- Use globs - File patterns are more maintainable than exact paths
- Budget per task - Set cost limits that match each task's complexity
- Allowlist tools - Use toUseOnlyTools for read-only or safe operations
- Async matchers - Remember to await toPassRubric
- Meaningful failures - Matchers provide detailed error messages
Troubleshooting
Matcher Not Found
Section titled “Matcher Not Found”Problem: TypeScript error: Property ‘toXxx’ does not exist.
Cause: setupFiles not configured or custom matchers not imported.
Solution: Ensure defineVibeConfig sets test.setupFiles: ['@dao/vibe-check/setup'].
toHaveChangedFiles Always Fails
Problem: Matcher fails even though the files look correct.
Cause: Path mismatch (absolute vs relative, casing, separators).
Solution: Inspect result.files.changed().map(f => f.path) to see the exact paths, then adjust your patterns accordingly.
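When comparing those paths with your patterns, a small normalization step often explains the mismatch. This hypothetical normalizePath helper converts backslashes to forward slashes, strips a leading ./, and makes absolute paths relative to a given root. It deliberately leaves casing alone, since case sensitivity depends on the filesystem.

```typescript
// Hypothetical debugging helper, not part of vibe-check.
function normalizePath(path: string, root = ""): string {
  let p = path.replace(/\\/g, "/"); // Windows separators -> "/"
  const r = root.replace(/\\/g, "/").replace(/\/+$/, ""); // drop trailing slashes
  if (r && p.startsWith(r + "/")) p = p.slice(r.length + 1); // absolute -> relative
  if (p.startsWith("./")) p = p.slice(2); // strip leading "./"
  return p;
}
```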
toPassRubric Never Resolves
Problem: Test hangs on await expect(result).toPassRubric(...).
Cause: Judge call failing or API key missing.
Solution: Check the ANTHROPIC_API_KEY environment variable, verify that the rubric is valid, and check network connectivity.
toUseOnlyTools Too Strict
Problem: Matcher fails because of system tools (like Bash for git).
Cause: Allowlist too narrow.
Solution: Add system tools to the allowlist: ['Read', 'Edit', 'Bash', 'Grep'].
See Also
- API Reference: Custom Matchers - Full type definitions
- Reactive Watchers - Use matchers in watchers
- Rubrics - LLM-based evaluation patterns
- Your First Test - Matchers in action
Quick Reference
```typescript
vibeTest('matcher quick reference', async ({ runAgent, expect }) => {
  const result = await runAgent({ prompt: '/task' });

  // Files
  expect(result).toHaveChangedFiles(['src/**/*.ts']);
  expect(result).toHaveNoDeletedFiles();

  // Tools
  expect(result).toHaveUsedTool('Edit', { min: 2 });
  expect(result).toUseOnlyTools(['Read', 'Edit', 'Bash']);

  // Quality
  expect(result).toCompleteAllTodos();
  expect(result).toHaveNoErrorsInLogs();

  // Cost
  expect(result).toStayUnderCost(5.00);

  // LLM evaluation (async)
  await expect(result).toPassRubric({
    name: 'Quality',
    criteria: [{ name: 'test', description: 'Has tests' }],
  });

  // Hook capture
  expect(result).toHaveCompleteHookData();
});
```