
Cumulative State Tracking

Cumulative state allows you to access aggregated data from multiple runAgent() calls within a single test. This is essential for multi-step workflows where you need to validate the combined effect of several agent interactions.


Problem: Complex tasks require multiple agent runs. You need to validate the cumulative result, not just individual runs.

vibeTest('implement, test, and document feature', async ({ runAgent, expect }) => {
  // Step 1: Implement feature
  const step1 = await runAgent({
    prompt: '/implement user authentication'
  });

  // Step 2: Add tests
  const step2 = await runAgent({
    prompt: '/add tests for authentication'
  });

  // Step 3: Add documentation
  const step3 = await runAgent({
    prompt: '/document authentication API'
  });

  // ❌ Problem: How do we check ALL files changed across all 3 steps?
  // step1.files only has files from step 1
  // step2.files only has files from step 2
  // step3.files only has files from step 3
});

Solution: Use the test context’s cumulative state to access aggregated data:

vibeTest('implement, test, and document feature', async ({ runAgent, expect, files }) => {
  // Step 1: Implement
  await runAgent({ prompt: '/implement user authentication' });

  // Step 2: Test
  await runAgent({ prompt: '/add tests for authentication' });

  // Step 3: Document
  await runAgent({ prompt: '/document authentication API' });

  // ✅ Access cumulative files from ALL runs
  const allFiles = files.changed();

  expect(allFiles).toContainEqual(
    expect.objectContaining({ path: 'src/auth.ts' }) // From step 1
  );
  expect(allFiles).toContainEqual(
    expect.objectContaining({ path: 'tests/auth.test.ts' }) // From step 2
  );
  expect(allFiles).toContainEqual(
    expect.objectContaining({ path: 'docs/auth.md' }) // From step 3
  );
});

The test context provides three cumulative state accessors:

files: Access all file changes across all runAgent() calls in the test.

vibeTest('track all file changes', async ({ runAgent, files }) => {
  await runAgent({ prompt: '/refactor module A' });
  await runAgent({ prompt: '/refactor module B' });

  // Get all files changed across both runs
  const allChanges = files.changed();
  console.log(`Total files changed: ${allChanges.length}`);

  // Get a specific file
  const moduleA = files.get('src/moduleA.ts');
  console.log(`Module A status: ${moduleA?.changeType}`);

  // Filter by glob
  const srcFiles = files.filter('src/**/*.ts');
  console.log(`Changed TypeScript files: ${srcFiles.length}`);

  // Get statistics
  const stats = files.stats();
  console.log(`Added: ${stats.added}, Modified: ${stats.modified}`);
});

API:

interface CumulativeFiles {
  /** Get all files changed across all runs */
  changed(): FileChange[];

  /** Get a specific file by path */
  get(path: string): FileChange | undefined;

  /** Filter files by glob pattern(s) */
  filter(glob: string | string[]): FileChange[];

  /** Get aggregate statistics */
  stats(): {
    added: number;
    modified: number;
    deleted: number;
    renamed: number;
    total: number;
  };
}

tools: Access all tool calls across all runAgent() calls in the test.

vibeTest('track all tool usage', async ({ runAgent, tools }) => {
  await runAgent({ prompt: '/task 1' });
  await runAgent({ prompt: '/task 2' });

  // Get all tool calls
  const allCalls = tools.all();
  console.log(`Total tool calls: ${allCalls.length}`);

  // Get tools by name
  const editCalls = tools.byName('Edit');
  console.log(`Edit calls: ${editCalls.length}`);

  // Check for failures
  const failed = tools.failed();
  if (failed.length > 0) {
    console.warn(`${failed.length} tool calls failed`);
  }
});

API:

interface CumulativeTools {
  /** Get all tool calls across all runs */
  all(): ToolCall[];

  /** Get tool calls by name */
  byName(name: string): ToolCall[];

  /** Get failed tool calls */
  failed(): ToolCall[];

  /** Get successful tool calls */
  succeeded(): ToolCall[];
}

timeline: Access all timeline events (TODOs, notifications, errors) across all runs.

vibeTest('track timeline events', async ({ runAgent, timeline }) => {
  await runAgent({ prompt: '/task 1' });
  await runAgent({ prompt: '/task 2' });

  // Get all events
  const allEvents = timeline.all();

  // Get TODO events
  const todos = timeline.todos();
  const completedTodos = todos.filter(t => t.status === 'completed');
  console.log(`Completed TODOs: ${completedTodos.length}`);

  // Get notifications
  const notifications = timeline.notifications();
  console.log(`Notifications: ${notifications.length}`);

  // Get errors
  const errors = timeline.errors();
  if (errors.length > 0) {
    console.error('Errors occurred:', errors);
  }
});

API:

interface CumulativeTimeline {
  /** Get all timeline events */
  all(): TimelineEvent[];

  /** Get TODO events */
  todos(): Array<{
    text: string;
    status: 'pending' | 'in_progress' | 'completed';
    timestamp: number;
  }>;

  /** Get notification events */
  notifications(): Array<{
    message: string;
    type?: string;
    timestamp: number;
  }>;

  /** Get error events */
  errors(): Array<{
    message: string;
    timestamp: number;
  }>;
}

Goal: Verify the complete outcome of a multi-step process.

vibeTest('full feature development', async ({ runAgent, files, expect }) => {
  // Step 1: Implement
  await runAgent({
    prompt: 'Implement user authentication in src/auth.ts'
  });

  // Step 2: Add tests
  await runAgent({
    prompt: 'Add comprehensive tests for authentication'
  });

  // Step 3: Add documentation
  await runAgent({
    prompt: 'Document authentication API in docs/auth.md'
  });

  // Validate cumulative result
  const allFiles = files.changed();
  expect(allFiles).toHaveLength(3);
  expect(files.get('src/auth.ts')).toBeDefined();
  expect(files.get('tests/auth.test.ts')).toBeDefined();
  expect(files.get('docs/auth.md')).toBeDefined();

  // Check file statistics
  const stats = files.stats();
  expect(stats.added).toBe(3);
  expect(stats.deleted).toBe(0);
});

Goal: Track changes across multiple refinement iterations.

vibeTest('iterative code improvement', async ({ runAgent, files, tools, expect }) => {
  // Initial implementation
  await runAgent({ prompt: '/implement feature X' });

  // Refinement 1: Fix type errors
  await runAgent({ prompt: '/fix type errors' });

  // Refinement 2: Add error handling
  await runAgent({ prompt: '/add error handling' });

  // Refinement 3: Optimize performance
  await runAgent({ prompt: '/optimize performance' });

  // Validate cumulative changes
  const changedFiles = files.changed();
  console.log('Total iterations: 4');
  console.log(`Files modified: ${changedFiles.length}`);
  console.log(`Tool calls: ${tools.all().length}`);

  // Ensure no files were deleted during refinements
  const deletions = changedFiles.filter(f => f.changeType === 'deleted');
  expect(deletions).toHaveLength(0);
});

Goal: Validate changes across multiple modules/files.

vibeTest('refactor across modules', async ({ runAgent, files, expect }) => {
  // Refactor each module
  await runAgent({ prompt: '/refactor src/moduleA.ts' });
  await runAgent({ prompt: '/refactor src/moduleB.ts' });
  await runAgent({ prompt: '/refactor src/moduleC.ts' });

  // Validate all modules changed
  const modules = ['src/moduleA.ts', 'src/moduleB.ts', 'src/moduleC.ts'];
  for (const module of modules) {
    const file = files.get(module);
    expect(file).toBeDefined();
    expect(file?.changeType).toBe('modified');
  }

  // Validate no unexpected changes
  const srcFiles = files.filter('src/**/*.ts');
  expect(srcFiles).toHaveLength(3); // Only the 3 modules
});

Goal: Analyze tool usage patterns across multiple runs.

vibeTest('analyze tool usage', async ({ runAgent, tools, expect }) => {
  await runAgent({ prompt: '/implement feature with tests' });
  await runAgent({ prompt: '/add documentation' });

  // Get tool usage breakdown
  const allTools = tools.all();
  const toolCounts = new Map<string, number>();
  for (const tool of allTools) {
    toolCounts.set(tool.name, (toolCounts.get(tool.name) || 0) + 1);
  }

  console.log('Tool usage:');
  for (const [name, count] of toolCounts) {
    console.log(`  ${name}: ${count}`);
  }

  // Assertions on tool usage
  expect(tools.byName('Edit').length).toBeGreaterThan(0);
  expect(tools.byName('Read').length).toBeGreaterThan(0);
  expect(tools.failed().length).toBeLessThan(3);
});

Goal: Verify TODO completion and error-free execution.

vibeTest('validate timeline', async ({ runAgent, timeline, expect }) => {
  await runAgent({ prompt: '/implement feature X' });
  await runAgent({ prompt: '/add tests for feature X' });

  // Check TODOs
  const todos = timeline.todos();
  const completed = todos.filter(t => t.status === 'completed');
  const pending = todos.filter(t => t.status === 'pending');
  console.log(`Completed TODOs: ${completed.length}`);
  console.log(`Pending TODOs: ${pending.length}`);
  expect(pending).toHaveLength(0); // All TODOs should be completed

  // Check for errors
  const errors = timeline.errors();
  expect(errors).toHaveLength(0); // No errors should occur

  // Check notifications
  const notifications = timeline.notifications();
  console.log(`Notifications: ${notifications.length}`);
});

Understanding when to use cumulative state vs per-run state:

Per-run state (RunResult). When: validating an individual agent run.

const result = await runAgent({ prompt: '/task' });
// Per-run state
expect(result).toCompleteAllTodos();
expect(result.files.changed()).toHaveLength(2);
expect(result.metrics.totalCostUsd).toBeLessThan(1.0);

Use for:

  • Validating single agent execution
  • Checking specific run metrics (cost, tokens, duration)
  • Per-run assertions (matchers like toCompleteAllTodos())

Cumulative state (test context). When: validating the aggregate of multiple runs.

vibeTest('multi-step', async ({ runAgent, files, tools, expect }) => {
  await runAgent({ prompt: '/task 1' });
  await runAgent({ prompt: '/task 2' });

  // Cumulative state
  const allFiles = files.changed();
  const allTools = tools.all();
  expect(allFiles.length).toBeGreaterThan(0);
  expect(tools.failed().length).toBeLessThan(5);
});

Use for:

  • Multi-step workflows
  • Aggregate statistics (total files, total tool calls)
  • Cross-run validation

| Feature  | Per-Run (RunResult)       | Cumulative (Context)              |
| -------- | ------------------------- | --------------------------------- |
| Scope    | Single runAgent() call    | All runAgent() calls in the test  |
| Files    | result.files.changed()    | files.changed() (context)         |
| Tools    | result.tools.all()        | tools.all() (context)             |
| Timeline | result.timeline           | timeline.all() (context)          |
| Metrics  | result.metrics            | Sum across all runs               |
| Matchers | expect(result).toXxx()    | Use standard expect()             |

Use both for granular validation:

vibeTest('validate per-run and cumulative', async ({ runAgent, files, expect }) => {
  // Run 1: Implement
  const step1 = await runAgent({ prompt: '/implement auth' });
  expect(step1).toCompleteAllTodos();            // Per-run matcher
  expect(step1.files.changed()).toHaveLength(1); // Per-run state

  // Run 2: Add tests
  const step2 = await runAgent({ prompt: '/add tests' });
  expect(step2).toCompleteAllTodos();            // Per-run matcher
  expect(step2.files.changed()).toHaveLength(1); // Per-run state

  // Cumulative validation
  const allFiles = files.changed();
  expect(allFiles).toHaveLength(2);              // Cumulative state
  expect(files.stats().added).toBe(2);           // Cumulative statistics
});

Cumulative state is scoped to each test. It resets between tests automatically.

vibeTest('test 1', async ({ runAgent, files, expect }) => {
  await runAgent({ prompt: '/task' });
  expect(files.changed()).toHaveLength(1);
});

vibeTest('test 2', async ({ runAgent, files, expect }) => {
  // files.changed() is empty at start (state reset)
  await runAgent({ prompt: '/task' });
  expect(files.changed()).toHaveLength(1);
});

Note: Watchers receive a PartialRunResult (per-run state), not the cumulative context.

vibeTest('watcher with cumulative check', async ({ runAgent, files: contextFiles }) => {
  await runAgent({ prompt: '/task 1' });

  const execution = runAgent({ prompt: '/task 2' });
  execution.watch(({ files: runFiles }) => {
    // runFiles: current run only (PartialRunResult)
    // contextFiles: all runs (cumulative)
    const currentRunFiles = runFiles.changed();
    const allFiles = contextFiles.changed();
    console.log(`Current run: ${currentRunFiles.length} files`);
    console.log(`Cumulative: ${allFiles.length} files`);
  });
  await execution;
});

  1. Use cumulative for multi-step - Always use context state for multi-run tests
  2. Use per-run for single-step - Use RunResult for single runAgent() tests
  3. Combine when needed - Validate both per-run and cumulative as appropriate
  4. Check statistics - Use .stats() for quick summaries
  5. Filter with globs - Use files.filter('src/**') for targeted validation
  6. Verify no deletions - Check stats().deleted === 0 if files shouldn’t be removed (see the sketch after this list)
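
Tips 4-6 combined in a brief sketch, written inside a test body with the files and expect fixtures in scope (the glob is illustrative):

// Quick summary of all file changes (tip 4)
const stats = files.stats();
console.log(`Added: ${stats.added}, Modified: ${stats.modified}, Deleted: ${stats.deleted}`);

// Targeted validation with a glob (tip 5)
expect(files.filter('src/**/*.ts').length).toBeGreaterThan(0);

// Guard against accidental deletions (tip 6)
expect(stats.deleted).toBe(0);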

Problem: files.changed() doesn’t include the expected files.

Cause: The files were not actually changed by the agent, or the run failed before making any changes.

Solution:

  • Check per-run RunResult.files to see what each run changed (a diagnostic sketch follows this list)
  • Verify agent completed successfully (check logs)
  • Ensure workspace path is correct
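
A minimal diagnostic sketch along these lines, inside a test body with the runAgent and files fixtures (the prompt is a placeholder):

const result = await runAgent({ prompt: '/implement feature' });

// Per-run view: what this specific run reported as changed
console.log('This run:', result.files.changed().map(f => f.path));

// Cumulative view: everything changed so far in the test
console.log('All runs:', files.changed().map(f => f.path));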

Problem: tools.all().length doesn’t match the expected count.

Cause: Some tool calls may have been skipped or failed silently.

Solution:

  • Check tools.failed() for failed tool calls (a sketch follows this list)
  • Review per-run RunResult.tools to see exact tool sequence
  • Verify hook capture is complete (result.hookCaptureStatus.complete)
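
A sketch of those checks inside a test body, assuming the fields shown elsewhere on this page (tools.failed(), result.tools.all(), result.hookCaptureStatus.complete):

const result = await runAgent({ prompt: '/task' });

// Surface failed calls across all runs
for (const call of tools.failed()) {
  console.warn(`Failed tool call: ${call.name}`);
}

// Per-run view of the exact tool sequence
console.log(result.tools.all().map(t => t.name));

// Verify hook capture completed for this run
expect(result.hookCaptureStatus.complete).toBe(true);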

Problem: timeline.todos() or timeline.notifications() is empty.

Cause: The agent didn’t emit those events, or hook capture failed.

Solution:

  • Check per-run RunResult.timeline for individual run events
  • Verify agent actually created TODOs/notifications
  • Check hookCaptureStatus for capture issues (see the sketch below)
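
A sketch of those checks inside a test body, using only the cumulative accessors and hookCaptureStatus documented above (the per-run timeline shape is not assumed here):

const result = await runAgent({ prompt: '/task' });

// Cumulative events captured so far across all runs
console.log(`TODOs: ${timeline.todos().length}`);
console.log(`Notifications: ${timeline.notifications().length}`);

// If these are empty, confirm hook capture completed for the run
if (!result.hookCaptureStatus.complete) {
  console.warn('Hook capture incomplete; timeline events may be missing');
}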


Quick reference: all three cumulative accessors in a single test.

vibeTest('multi-step example', async ({ runAgent, files, tools, timeline }) => {
  // Multiple runs
  await runAgent({ prompt: '/step 1' });
  await runAgent({ prompt: '/step 2' });

  // Cumulative files
  const allFiles = files.changed();
  const srcFiles = files.filter('src/**');
  const stats = files.stats();
  const specific = files.get('src/main.ts');

  // Cumulative tools
  const allTools = tools.all();
  const editCalls = tools.byName('Edit');
  const failed = tools.failed();

  // Cumulative timeline
  const todos = timeline.todos();
  const notifications = timeline.notifications();
  const errors = timeline.errors();
});