Reactive Watchers
Reactive watchers let you run assertions during agent execution, so tests can fail fast when a constraint is violated instead of waiting for the agent to complete.
Why Use Watchers?
Problem: Agent tests can be slow (10-60 seconds). If an agent violates a constraint early (e.g., modifies a protected file), you don’t want to wait for full completion before failing.
Solution: Watchers run assertions during execution and abort on the first failure.
Without Watchers (Slow Failure)
vibeTest('should only modify src/', async ({ runAgent, expect }) => {
  // Agent runs for 45 seconds, modifies database/ at second 5
  const result = await runAgent({ prompt: '/refactor' });

  // Fails here (after 45 seconds) ❌
  expect(result).toHaveChangedFiles(['src/**']);
});
Issue: The test runs for the full 45 seconds even though the violation occurred at 5 seconds.
With Watchers (Fast Failure)
vibeTest('should only modify src/', async ({ runAgent, expect }) => {
  const execution = runAgent({ prompt: '/refactor' });

  // Watcher checks after each tool call
  execution.watch(({ files }) => {
    const changed = files.changed().map(f => f.path);
    const violation = changed.find(p => !p.startsWith('src/'));
    if (violation) {
      throw new Error(`Cannot modify ${violation}`);
    }
  });

  // Aborts at ~5 seconds when database/ is touched ✅
  await execution;
});
Benefit: The test fails after about 5 seconds, saving roughly 40 seconds per test run.
How Watchers Work
Execution Flow
- Call runAgent() - Returns AgentExecution (thenable, not a Promise)
- Register watcher(s) - Call .watch() to add assertion functions
- Agent runs - Watchers fire after significant hook events:
  - PostToolUse (after each tool completes)
  - TodoUpdate (when TODO status changes)
  - Notification (when agent sends notifications)
- Watcher runs - Receives PartialRunResult with current execution state
- Pass or fail:
  - Pass → Next watcher runs (if any), execution continues
  - Fail → Execution aborts immediately, test fails
- Await execution - Get final RunResult or catch watcher error
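The flow above can be sketched with a small stand-in class. This is only an illustration of the pattern, not the library's implementation: `SketchExecution`, `SketchState`, `emit`, and `aborted` are hypothetical names invented for this example, and the state is reduced to a single counter.

```typescript
// Hypothetical, reduced stand-in for the in-progress state (not the real PartialRunResult).
interface SketchState {
  toolCalls: number;
}

type SketchWatcher = (state: SketchState) => void;

// A minimal thenable: watchers run in registration order on every event,
// and the first watcher that throws marks the run as aborted.
class SketchExecution implements PromiseLike<SketchState> {
  private readonly watchers: SketchWatcher[] = [];
  private readonly state: SketchState = { toolCalls: 0 };
  private failure: Error | undefined;

  watch(fn: SketchWatcher): this {
    this.watchers.push(fn);
    return this; // returning `this` is what allows chaining
  }

  // Simulates one hook event, such as a tool call completing.
  emit(): void {
    if (this.failure) return; // already aborted, ignore further events
    this.state.toolCalls += 1;
    try {
      for (const fn of this.watchers) fn(this.state);
    } catch (err) {
      this.failure = err instanceof Error ? err : new Error(String(err));
    }
  }

  get aborted(): boolean {
    return this.failure !== undefined;
  }

  // Awaiting the object yields the state, or rejects with the watcher error.
  then<A = SketchState, B = never>(
    onOk?: ((value: SketchState) => A | PromiseLike<A>) | null,
    onErr?: ((reason: unknown) => B | PromiseLike<B>) | null,
  ): Promise<A | B> {
    const settled = this.failure
      ? Promise.reject<SketchState>(this.failure)
      : Promise.resolve(this.state);
    return settled.then(onOk, onErr);
  }
}

const run = new SketchExecution();
const seen: string[] = [];

run
  .watch((s) => {
    seen.push(`first:${s.toolCalls}`);
  })
  .watch((s) => {
    seen.push(`second:${s.toolCalls}`);
    if (s.toolCalls >= 2) throw new Error("too many tool calls");
  });

run.emit(); // both watchers pass
run.emit(); // second watcher throws, so the run is marked aborted
run.emit(); // ignored because the run has already aborted
```

After these three events, `seen` records each watcher twice (in registration order) and awaiting `run` rejects with the watcher's error. A real runner would also have to stop the agent process on failure, which this sketch leaves out.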
Watcher Execution Guarantees
const execution = runAgent({ prompt: '/refactor' });

execution
  .watch(({ tools }) => {
    // Watcher 1: Runs first
    console.log('Watcher 1 started');
    expect(tools.failed().length).toBeLessThan(3);
    console.log('Watcher 1 passed');
  })
  .watch(({ metrics }) => {
    // Watcher 2: Runs after watcher 1 completes
    console.log('Watcher 2 started');
    // totalCostUsd is optional and may be undefined early in the run; treat unknown as 0
    expect(metrics.totalCostUsd ?? 0).toBeLessThan(5.0);
    console.log('Watcher 2 passed');
  });

await execution;
Output:
Watcher 1 started
Watcher 1 passed
Watcher 2 started
Watcher 2 passed
If watcher 1 throws: Watcher 2 never runs, execution aborts immediately.
PartialRunResult Interface
Watchers receive a PartialRunResult object representing in-progress execution state.
interface PartialRunResult {
  /** Current metrics (incomplete until execution finishes) */
  readonly metrics: {
    totalTokens?: number;
    totalCostUsd?: number;
    durationMs?: number;
    toolCalls?: number;
    filesChanged?: number;
  };

  /** Tool calls completed or in-progress */
  readonly tools: {
    all(): ToolCall[];
    failed(): ToolCall[];
    succeeded(): ToolCall[];
    inProgress(): ToolCall[];
  };

  /** TODO items with current status */
  readonly todos: Array<{
    text: string;
    status: 'pending' | 'in_progress' | 'completed';
  }>;

  /** Files changed so far (may be incomplete) */
  readonly files: {
    changed(): FileChange[];
    get(path: string): FileChange | undefined;
  };

  /** Whether execution has completed */
  readonly isComplete: boolean;
}
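Because every metrics field is optional, a watcher has to decide what a missing value means. The sketch below shows one option: report unknown values as "n/a" rather than treating them as zero. `PartialMetrics` is a local copy of the metrics shape from the interface above, and `describeProgress` is a hypothetical helper, not part of the library.

```typescript
// Local copy of the metrics shape shown above, so the example is self-contained.
interface PartialMetrics {
  totalTokens?: number;
  totalCostUsd?: number;
  durationMs?: number;
  toolCalls?: number;
  filesChanged?: number;
}

// Formats whatever is known so far; missing fields are reported as "n/a"
// instead of being silently treated as zero.
function describeProgress(metrics: PartialMetrics): string {
  const cost =
    metrics.totalCostUsd !== undefined
      ? `$${metrics.totalCostUsd.toFixed(2)}`
      : "n/a";
  const tools = metrics.toolCalls ?? "n/a";
  const files = metrics.filesChanged ?? "n/a";
  return `cost=${cost} tools=${tools} files=${files}`;
}

// Early in a run only some fields may be populated.
const early = describeProgress({ toolCalls: 2 });
// early === "cost=n/a tools=2 files=n/a"

const later = describeProgress({ toolCalls: 7, filesChanged: 3, totalCostUsd: 0.4 });
// later === "cost=$0.40 tools=7 files=3"
```

The explicit `!== undefined` check matters for cost: a truthiness check would also skip a legitimate value of 0.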
Common Patterns
Pattern 1: File Change Constraints
Goal: Ensure agent only modifies allowed files/directories.
vibeTest('only modify src/ and tests/', async ({ runAgent, expect }) => {
  const execution = runAgent({
    prompt: '/refactor codebase with comprehensive tests'
  });

  execution.watch(({ files }) => {
    const changed = files.changed();

    for (const file of changed) {
      const allowed =
        file.path.startsWith('src/') || file.path.startsWith('tests/');

      if (!allowed) {
        throw new Error(
          `Agent tried to modify protected file: ${file.path}`
        );
      }
    }
  });

  const result = await execution;
  expect(result).toCompleteAllTodos();
});
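If the same allowlist check appears in several tests, it can be pulled into a small factory. The following is a sketch under stated assumptions: `onlyUnder`, `SketchFileChange`, and `SketchFilesView` are hypothetical names, and the local interfaces model only the `path` field and `changed()` method used in the pattern above.

```typescript
// Minimal local shapes; the real FileChange and PartialRunResult types have more fields.
interface SketchFileChange {
  path: string;
}
interface SketchFilesView {
  changed(): SketchFileChange[];
}

// Returns a watcher that throws when any changed path falls outside the given prefixes.
function onlyUnder(...prefixes: string[]) {
  return ({ files }: { files: SketchFilesView }): void => {
    const offending = files
      .changed()
      .map((f) => f.path)
      .filter((p) => !prefixes.some((prefix) => p.startsWith(prefix)));

    if (offending.length > 0) {
      throw new Error(`Agent modified protected paths: ${offending.join(", ")}`);
    }
  };
}

// Hand-built fake state for trying the watcher without running an agent.
const fakeFiles = (paths: string[]): { files: SketchFilesView } => ({
  files: { changed: () => paths.map((path) => ({ path })) },
});

const watcher = onlyUnder("src/", "tests/");
watcher(fakeFiles(["src/a.ts", "tests/a.test.ts"])); // passes silently

let message = "";
try {
  watcher(fakeFiles(["src/a.ts", "database/schema.sql"]));
} catch (e) {
  message = e instanceof Error ? e.message : String(e);
}
// message === "Agent modified protected paths: database/schema.sql"
```

Assuming .watch() accepts any function of the partial state, the factory result could be registered as `execution.watch(onlyUnder('src/', 'tests/'))`. Reporting all offending paths at once, rather than only the first, tends to make the failure message more useful.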
Pattern 2: Cost Budget Enforcement
Goal: Abort execution if cost exceeds budget mid-run.
vibeTest('stay within $5 budget', async ({ runAgent, expect }) => {
  const execution = runAgent({
    prompt: '/add feature with tests and docs'
  });

  execution.watch(({ metrics }) => {
    if (metrics.totalCostUsd && metrics.totalCostUsd > 5.0) {
      throw new Error(
        `Cost exceeded $5.00 budget (current: $${metrics.totalCostUsd.toFixed(2)})`
      );
    }
  });

  const result = await execution;
  expect(result).toCompleteAllTodos();
});
Pattern 3: Tool Failure Threshold
Goal: Abort if agent hits too many failed tool calls (indicates agent is stuck).
vibeTest('abort on third failed tool call', async ({ runAgent, expect }) => {
  const execution = runAgent({
    prompt: '/fix type errors in src/'
  });

  execution.watch(({ tools }) => {
    const failures = tools.failed();

    // Up to 2 failures are tolerated; the third one aborts the run
    if (failures.length >= 3) {
      const failedTools = failures.map(t => t.name).join(', ');
      throw new Error(
        `Agent failed ${failures.length} tools (${failedTools}). Likely stuck.`
      );
    }
  });

  const result = await execution;
  expect(result).toHaveNoErrorsInLogs();
});
Pattern 4: File Deletion Protection
Goal: Prevent agent from deleting any files.
vibeTest('never delete files', async ({ runAgent, expect }) => {
  const execution = runAgent({
    prompt: '/refactor authentication module'
  });

  execution.watch(({ files }) => {
    const deleted = files.changed().filter(f => f.changeType === 'deleted');

    if (deleted.length > 0) {
      const paths = deleted.map(f => f.path).join(', ');
      throw new Error(`Agent deleted files: ${paths}`);
    }
  });

  const result = await execution;
  expect(result).toHaveChangedFiles(['src/auth.ts', 'tests/auth.test.ts']);
});
Pattern 5: Combined Constraints
Goal: Enforce multiple constraints with one watcher.
vibeTest('enforce multiple constraints', async ({ runAgent, expect }) => {
  const execution = runAgent({
    prompt: '/implement new API endpoint with tests'
  });

  execution.watch(({ files, tools, metrics }) => {
    // Constraint 1: No deletions
    const deleted = files.changed().filter(f => f.changeType === 'deleted');
    expect(deleted).toHaveLength(0);

    // Constraint 2: Max 2 tool failures
    expect(tools.failed().length).toBeLessThan(3);

    // Constraint 3: Cost budget
    if (metrics.totalCostUsd) {
      expect(metrics.totalCostUsd).toBeLessThan(3.0);
    }

    // Constraint 4: File allowlist
    const changed = files.changed().map(f => f.path);
    const violations = changed.filter(p =>
      !p.startsWith('src/') && !p.startsWith('tests/')
    );
    expect(violations).toHaveLength(0);
  });

  const result = await execution;
  expect(result).toCompleteAllTodos();
});
Advanced Usage
Async Watchers
Watchers can be async for complex checks:
execution.watch(async ({ files }) => {
  const srcFiles = files.changed().filter(f => f.path.startsWith('src/'));

  for (const file of srcFiles) {
    const content = await file.after?.text();

    if (content && content.includes('console.log')) {
      throw new Error(
        `Found console.log in ${file.path}. Use proper logging.`
      );
    }
  }
});
Chaining Watchers
Watchers can be chained for readability:
const execution = runAgent({ prompt: '/refactor' });

execution
  .watch(checkFileConstraints)
  .watch(checkCostBudget)
  .watch(checkToolFailures);

await execution;

// Helper functions for reusability
function checkFileConstraints({ files }: PartialRunResult) {
  const changed = files.changed();
  // ... constraints
}

function checkCostBudget({ metrics }: PartialRunResult) {
  if (metrics.totalCostUsd && metrics.totalCostUsd > 5.0) {
    throw new Error('Budget exceeded');
  }
}

function checkToolFailures({ tools }: PartialRunResult) {
  expect(tools.failed().length).toBeLessThan(3);
}
Manual Abort
Abort execution programmatically (e.g., timeout):
const execution = runAgent({ prompt: '/long-running-task' });

// Abort after 60 seconds
const timeout = setTimeout(() => {
  execution.abort('Execution timeout');
}, 60000);

try {
  const result = await execution;
  expect(result).toCompleteAllTodos();
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
  if (error instanceof Error && error.message.includes('Execution timeout')) {
    console.warn('Agent timed out');
  }
  throw error;
} finally {
  // Runs on both success and failure, so the timer is always cleared
  clearTimeout(timeout);
}
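The timeout pattern above can be wrapped in a reusable helper. This is a minimal sketch, not a library feature: `Abortable` and `withTimeout` are hypothetical names, and the `abort(reason)` signature is inferred from the example above rather than taken from the API reference.

```typescript
// Hypothetical minimal shape: anything awaitable that also exposes abort(reason).
interface Abortable<T> extends PromiseLike<T> {
  abort(reason?: string): void;
}

// Awaits the execution, but calls abort() if it has not settled within `ms`.
async function withTimeout<T>(execution: Abortable<T>, ms: number): Promise<T> {
  const timer = setTimeout(
    () => execution.abort(`Execution timeout after ${ms}ms`),
    ms,
  );
  try {
    return await execution;
  } finally {
    // Clear the timer whether the execution resolved or rejected.
    clearTimeout(timer);
  }
}
```

Under those assumptions, a test body could read `const result = await withTimeout(runAgent({ prompt: '/long-running-task' }), 60000);`. The helper relies on abort() causing the awaited execution to reject; if it did not, the await would simply continue until the agent finished.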
When to Use Watchers
✅ Good Use Cases
- File protection - Prevent modifications to critical files (config, database, dependencies)
- Cost enforcement - Abort expensive operations early
- Tool failure detection - Stop agents that are stuck retrying failed tools
- Deletion protection - Prevent destructive operations
- Real-time validation - Check invariants during execution
❌ Avoid Watchers For
- Final assertions - Use standard expect() after await for post-execution checks
- Complex analysis - Watchers should be fast; use post-execution analysis for deep checks
- Every test - Only use when fail-fast behavior provides real value
- Debugging - Use console.log in watchers sparingly (can slow execution)
Comparison: Watchers vs Post-Execution
Feature | Watchers (Reactive) | Post-Execution (Standard) |
---|---|---|
When | During execution | After completion |
Speed | Fail fast (early abort) | Wait for full completion |
Use Case | Constraints/budgets | Final validation |
Data | PartialRunResult (incomplete) | RunResult (complete) |
Performance | Small overhead per hook event | No overhead |
Rule of thumb: Use watchers for constraints (things that should abort). Use standard assertions for validation (things to check at the end).
Best Practices
- Keep watchers fast - Aim for <10ms per watcher to avoid slowing execution
- Use descriptive errors - Include context (file paths, values) in error messages
- Combine related checks - One watcher with multiple assertions is better than many watchers
- Extract reusable functions - Define watcher functions for common patterns
- Test watcher logic - Ensure watcher assertions are correct (wrong logic = false negatives)
- Don’t overuse - Only use watchers when fail-fast provides real value
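One way to follow the "test watcher logic" advice is to call a watcher directly with hand-built state, without running an agent at all. The sketch below does this for a budget check like the one shown earlier. `BudgetState` and `throwsFor` are hypothetical names for this example, and the state models only the one field the watcher reads.

```typescript
// Reduced state shape: only the field this watcher reads.
interface BudgetState {
  metrics: { totalCostUsd?: number };
}

// Watcher under test: throws once the known cost exceeds $5.00.
function checkCostBudget({ metrics }: BudgetState): void {
  if (metrics.totalCostUsd !== undefined && metrics.totalCostUsd > 5.0) {
    throw new Error("Budget exceeded");
  }
}

// Runs a watcher against a fake state and reports whether it threw.
function throwsFor(watcher: (state: BudgetState) => void, state: BudgetState): boolean {
  try {
    watcher(state);
    return false;
  } catch {
    return true;
  }
}

// Probe the boundary explicitly: unknown cost, just under, exactly at, just over.
const results = {
  unknownCost: throwsFor(checkCostBudget, { metrics: {} }),
  underBudget: throwsFor(checkCostBudget, { metrics: { totalCostUsd: 4.99 } }),
  atBudget: throwsFor(checkCostBudget, { metrics: { totalCostUsd: 5.0 } }),
  overBudget: throwsFor(checkCostBudget, { metrics: { totalCostUsd: 5.01 } }),
};
// Only overBudget is true; the other three cases pass.
```

Checking the boundary values makes the intended semantics explicit (here, exactly $5.00 is still allowed) and catches off-by-one mistakes before they show up as false negatives in a real run.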
Troubleshooting
Watcher Not Firing
Problem: Watcher never runs even though agent is executing.
Cause: Watchers fire only after significant hook events (PostToolUse, TodoUpdate, Notification). If the agent isn’t using tools, updating TODOs, or sending notifications, watchers won’t fire.
Solution: Use standard post-execution assertions instead, or verify agent is actually using tools.
Watcher Slowing Execution
Problem: Tests run slower with watchers enabled.
Cause: Watcher logic is too slow (async operations, heavy computations).
Solution: Profile watcher timing, optimize slow operations, or move complex checks to post-execution.
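A simple way to profile watcher timing is to wrap each watcher in a function that records how long every call takes. The sketch below assumes synchronous watchers and a runtime that provides the global `performance.now()` (modern Node.js and browsers). `timed` is a hypothetical helper, not part of the library.

```typescript
// Wraps a synchronous watcher and appends the duration of each call (in ms) to `samples`.
function timed<S>(
  watcher: (state: S) => void,
  samples: number[],
): (state: S) => void {
  return (state: S): void => {
    const start = performance.now();
    try {
      watcher(state);
    } finally {
      // Record the timing even when the watcher throws, then let the error propagate.
      samples.push(performance.now() - start);
    }
  };
}

// Example: time a trivial watcher over two simulated events.
const samples: number[] = [];
const countCheck = timed((state: { toolCalls: number }) => {
  if (state.toolCalls > 100) throw new Error("too many tool calls");
}, samples);

countCheck({ toolCalls: 1 });
countCheck({ toolCalls: 2 });

const slowest = Math.max(...samples);
```

Comparing `slowest` against the 10ms guideline from the best practices gives a quick signal about which watcher to optimize or move to post-execution. An async watcher would need an `await` inside the wrapper for the measurement to cover its full duration.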
False Negatives
Problem: Watcher passes but constraint is violated.
Cause: Watcher logic has bugs or checks incomplete state.
Solution: Test watcher logic separately, add logging to verify watcher is seeing expected data.
See Also
- API Reference: AgentExecution - Full interface documentation
- API Reference: PartialRunResult - Partial state interface
- Custom Matchers - Use matchers in watchers with expect()
- Your First Test - Watchers in action
Quick Reference
// Basic watcher
const execution = runAgent({ prompt: '/task' });
execution.watch(({ files, tools, metrics }) => {
  // Assertions here
});
await execution;

// Chained watchers
execution
  .watch(watcher1)
  .watch(watcher2)
  .watch(watcher3);

// Async watcher
execution.watch(async ({ files }) => {
  const content = await files.get('src/main.ts')?.after?.text();
  expect(content).toBeDefined();
});

// Manual abort
execution.abort('Custom reason');

// Error handling
try {
  await execution;
} catch (error) {
  // Watcher threw or execution failed
}