5-Minute Quickstart
Welcome! This quickstart will get you from zero to running your first agent test in under 5 minutes.
What You’ll Build
A simple test that:
- Runs a Claude Code agent
- Captures execution metrics automatically
- Validates the agent completed successfully
- Generates a cost report
Create Your First Test
1. Create a test file

   Create tests/quickstart.vibe.test.ts:

       import { vibeTest } from '@dao/vibe-check';

       vibeTest('agent can execute simple task', async ({ runAgent, expect }) => {
         const result = await runAgent({
           prompt: 'Count from 1 to 5'
         });

         // Verify we got a result
         expect(result).toBeDefined();
         expect(result.metrics.totalCostUsd).toBeLessThan(1.00);
       });

2. Run the test

       bun run vitest
       # or: npm run vitest, pnpm vitest, yarn vitest

3. See the results

   You should see output like:

       ✓ tests/quickstart.vibe.test.ts (1)
         ✓ agent can execute simple task (2341ms)

       Test Files  1 passed (1)
       Tests  1 passed (1)
       Duration  2.34s

       Vibe Check Cost Summary
       ─────────────────────────────
       Total Cost: $0.0028
       Total Tokens: 847
       HTML Report: .vibe-artifacts/reports/index.html
🎉 Congratulations! You just ran your first vibe-check test!
Understanding the Test
Let’s break down what happened:
The vibeTest Function

    vibeTest('agent can execute simple task', async ({ runAgent, expect }) => {
      // Test body
    });
- vibeTest is like Vitest’s test() or it(), but with agent-specific context
- Test context provides: runAgent, expect, judge, annotate, task
- Async by default: all vibe tests are async (agents take time!)
The runAgent Fixture

    const result = await runAgent({
      prompt: 'Count from 1 to 5'
    });
- Executes a Claude Code agent with the given prompt
- Automatically captures:
- Conversation messages
- Git state (before/after)
- File changes
- Tool calls
- Metrics (cost, tokens, duration)
- Timeline events
- Returns a RunResult with all captured data
The RunResult Object

    expect(result).toBeDefined();
    expect(result.metrics.totalCostUsd).toBeLessThan(1.00);

The result contains everything from the agent run:
Field | Description |
---|---|
result.bundleDir | Path to on-disk artifacts (.vibe-artifacts/bundles/{test-id}/) |
result.metrics | Cost, tokens, duration, tool calls, files changed |
result.messages | Conversation history (lazy-loaded) |
result.files | File changes with before/after content |
result.tools | Tool calls (Edit, Bash, Read, etc.) |
result.git | Git state and diffs |
result.todos | TODO items and their completion status |
result.timeline | Unified event timeline |
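
To make the metrics row concrete, here is a minimal sketch of how those values could be turned into a one-line summary. The RunMetricsSketch interface and the formatMetrics helper are hypothetical names used only for illustration. They mirror the metric fields that appear in this guide (totalCostUsd, totalTokens, durationMs, toolCalls) and assume toolCalls is a count; the actual types exported by vibe-check may differ.

```typescript
// Hypothetical shape: mirrors only the metric fields used in this guide.
// The real vibe-check type may have more fields or different names.
interface RunMetricsSketch {
  totalCostUsd: number;
  totalTokens: number;
  durationMs: number;
  toolCalls: number; // assumed to be a count
}

// Format metrics as a single log-friendly line.
function formatMetrics(m: RunMetricsSketch): string {
  const seconds = (m.durationMs / 1000).toFixed(2);
  return `cost=$${m.totalCostUsd.toFixed(4)} tokens=${m.totalTokens} duration=${seconds}s tools=${m.toolCalls}`;
}

console.log(
  formatMetrics({ totalCostUsd: 0.0028, totalTokens: 847, durationMs: 2341, toolCalls: 0 })
);
```

In a real test you would pass result.metrics instead of the literal object, provided its fields match this assumed shape.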
Exploring the RunResult
Let’s write a more detailed test that explores what vibe-check captured:
    import { vibeTest } from '@dao/vibe-check';

    vibeTest('explore captured context', async ({ runAgent, expect }) => {
      const result = await runAgent({
        prompt: 'Create a file called greeting.txt with "Hello World"'
      });

      // 1. Check metrics
      console.log('Cost:', result.metrics.totalCostUsd);
      console.log('Tokens:', result.metrics.totalTokens);
      console.log('Duration:', result.metrics.durationMs, 'ms');
      console.log('Tool calls:', result.metrics.toolCalls);

      // 2. Check file changes
      const fileStats = result.files.stats();
      console.log('Files changed:', fileStats);
      // Output: { added: 1, modified: 0, deleted: 0, renamed: 0, total: 1 }

      // 3. Get specific file
      const greeting = result.files.get('greeting.txt');
      if (greeting) {
        console.log('File path:', greeting.path);
        console.log('Change type:', greeting.changeType); // 'added'

        // Load file content (lazy-loaded from disk)
        const content = await greeting.after?.text();
        expect(content).toContain('Hello World');
      }

      // 4. Check tool usage
      const toolsUsed = result.tools.all();
      console.log('Tools used:', toolsUsed.map(t => t.name));
      // Example: ['Write', 'Bash']

      const writeToolCount = result.tools.used('Write');
      expect(writeToolCount).toBeGreaterThan(0);

      // 5. Check git state (if workspace is a git repo)
      if (result.git.before) {
        console.log('Git commit before:', result.git.before.head);
        console.log('Git commit after:', result.git.after?.head);
        console.log('Files changed:', result.git.changedCount);
      }
    });
Using Custom Matchers
Vibe-check provides custom matchers for common assertions:
    import { vibeTest } from '@dao/vibe-check';

    vibeTest('using custom matchers', async ({ runAgent, expect }) => {
      const result = await runAgent({
        prompt: 'Create a README.md file'
      });

      // File-based matchers
      expect(result).toHaveChangedFiles(['README.md']);
      expect(result).toHaveNoDeletedFiles();

      // Tool-based matchers
      expect(result).toHaveUsedTool('Write');

      // Cost matchers
      expect(result).toStayUnderCost(0.50); // Fails if cost > $0.50

      // Quality matchers (requires TODOs in prompt)
      // expect(result).toCompleteAllTodos();
    });
See all available matchers in the Custom Matchers Guide →
Viewing the HTML Report
After running tests, open the HTML report to see:
- Conversation transcript - Full agent chat history
- Tool call timeline - Visual timeline of all tool invocations
- File diffs - Before/after views of changed files
- Cost breakdown - Per-test cost and token usage
- TODO tracking - See which TODOs were completed
open .vibe-artifacts/reports/index.html
Cost Tracking
Vibe-check automatically tracks costs:
Terminal Reporter:
    Vibe Check Cost Summary
    ─────────────────────────────
    Total Cost: $0.0028
    Total Tokens: 847
Per-Test Cost:
    vibeTest('check cost', async ({ runAgent, expect }) => {
      const result = await runAgent({
        prompt: 'Simple task',
        model: 'claude-3-5-haiku-latest' // Use cheaper model
      });

      console.log('This test cost:', result.metrics.totalCostUsd);
      expect(result).toStayUnderCost(0.10);
    });
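
If you want a budget across several runs rather than per test, one option is to sum the per-test costs yourself. The sketch below is a standalone illustration and not part of the vibe-check API: CostEntry and checkBudget are hypothetical names, and the cost values would come from result.metrics.totalCostUsd in each test.

```typescript
// Hypothetical helper, not part of vibe-check: sums per-test costs
// and reports whether the total stays within a budget.
interface CostEntry {
  test: string;
  costUsd: number;
}

function checkBudget(
  entries: CostEntry[],
  budgetUsd: number
): { totalUsd: number; withinBudget: boolean } {
  const totalUsd = entries.reduce((sum, e) => sum + e.costUsd, 0);
  return { totalUsd, withinBudget: totalUsd <= budgetUsd };
}

const report = checkBudget(
  [
    { test: 'check cost', costUsd: 0.25 },
    { test: 'using custom matchers', costUsd: 0.5 },
  ],
  1.0
);
console.log(report.totalUsd, report.withinBudget);
```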
What’s Auto-Captured?
When you run an agent, vibe-check automatically captures:
✅ Git State
- Commit hash before/after
- Dirty status
- File diffs (git diff --name-status)
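
For readers unfamiliar with the git diff --name-status format, the sketch below parses its tab-separated output into change records and counts them by type. This is an illustration of the format only, not vibe-check's actual implementation; parseNameStatus, countChanges, and the Change type are hypothetical names.

```typescript
// Illustrative parser for `git diff --name-status` output.
// Each line is "<status>\t<path>", or "R<score>\t<old>\t<new>" for renames.
type ChangeType = 'added' | 'modified' | 'deleted' | 'renamed' | 'other';

interface Change {
  type: ChangeType;
  path: string;
  oldPath?: string; // only set for renames
}

function parseNameStatus(output: string): Change[] {
  const changes: Change[] = [];
  for (const line of output.split('\n')) {
    if (line.trim() === '') continue;
    const [status, first, second] = line.split('\t');
    if (status === undefined || first === undefined) continue; // malformed line
    const code = status.charAt(0);
    if (code === 'R' && second !== undefined) {
      changes.push({ type: 'renamed', path: second, oldPath: first });
    } else {
      const type: ChangeType =
        code === 'A' ? 'added' : code === 'M' ? 'modified' : code === 'D' ? 'deleted' : 'other';
      changes.push({ type, path: first });
    }
  }
  return changes;
}

// Count changes per type, in the same spirit as the stats object shown earlier.
function countChanges(changes: Change[]) {
  const count = (t: ChangeType) => changes.filter(c => c.type === t).length;
  return {
    added: count('added'),
    modified: count('modified'),
    deleted: count('deleted'),
    renamed: count('renamed'),
    total: changes.length,
  };
}

console.log(countChanges(parseNameStatus('A\tgreeting.txt\nR100\told.ts\tnew.ts\n')));
```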
✅ File Changes
- All files added/modified/deleted/renamed
- Full before/after content (content-addressed storage)
- Diff statistics (lines added/deleted)
✅ Tool Calls
- Correlated from PreToolUse/PostToolUse hooks
- Input parameters
- Output/results
- Success/failure status
- Duration
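
Correlating pre and post hook events can be pictured roughly as follows. This is a minimal sketch under assumed event shapes, not the library's implementation: HookEvent, ToolCallRecord, correlate, and the toolUseId field are all hypothetical, and real hook payloads may look different.

```typescript
// Hypothetical event shape; real hook payloads may differ.
interface HookEvent {
  kind: 'PreToolUse' | 'PostToolUse';
  toolUseId: string; // assumed identifier that links a Pre event to its Post event
  toolName: string;
  timestampMs: number;
  ok?: boolean; // only meaningful on PostToolUse
}

interface ToolCallRecord {
  name: string;
  startedAt: number;
  durationMs: number | null; // null when no PostToolUse event was seen
  ok: boolean | null;
}

// Pair each PreToolUse with the PostToolUse that shares its id.
function correlate(events: HookEvent[]): ToolCallRecord[] {
  const open = new Map<string, ToolCallRecord>();
  const records: ToolCallRecord[] = [];
  for (const e of events) {
    if (e.kind === 'PreToolUse') {
      const rec: ToolCallRecord = {
        name: e.toolName,
        startedAt: e.timestampMs,
        durationMs: null,
        ok: null,
      };
      open.set(e.toolUseId, rec);
      records.push(rec);
    } else {
      const rec = open.get(e.toolUseId);
      if (!rec) continue; // Post without a matching Pre: ignored in this sketch
      rec.durationMs = e.timestampMs - rec.startedAt;
      rec.ok = e.ok ?? null;
      open.delete(e.toolUseId);
    }
  }
  return records;
}
```

Records keep the order in which the calls started, and a call that never received a post event stays in the result with a null duration, which makes interrupted calls easy to spot.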
✅ Metrics
- Total tokens (input + output)
- Total cost (USD)
- Wall clock duration
- Tool call count
- Files changed count
✅ Timeline
- Unified event stream (SDK events + hooks)
- Timestamps for all events
- Used by reporters for visualization
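
One simple way to picture a unified event stream is a timestamp-ordered merge of two sources. The sketch below assumes a hypothetical TimelineEvent shape and a mergeTimelines helper; it illustrates the idea and does not describe how vibe-check builds its timeline internally.

```typescript
// Hypothetical event shape for illustration only.
interface TimelineEvent {
  source: 'sdk' | 'hook';
  type: string;
  timestampMs: number;
}

// Merge two event lists into one list ordered by timestamp.
// Array.prototype.sort is stable (ES2019+), so events with equal
// timestamps keep their relative order, with sdk events listed first.
function mergeTimelines(sdk: TimelineEvent[], hooks: TimelineEvent[]): TimelineEvent[] {
  return [...sdk, ...hooks].sort((x, y) => x.timestampMs - y.timestampMs);
}

const merged = mergeTimelines(
  [
    { source: 'sdk', type: 'message', timestampMs: 10 },
    { source: 'sdk', type: 'result', timestampMs: 40 },
  ],
  [
    { source: 'hook', type: 'PreToolUse', timestampMs: 20 },
    { source: 'hook', type: 'PostToolUse', timestampMs: 30 },
  ]
);
console.log(merged.map(e => e.type));
```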
✅ TODOs
- TODO items from Claude Code
- Status (pending/in_progress/completed)
Next Steps
You’ve learned the basics! Now explore the common patterns below.
Common Patterns
Run Multiple Agents

    vibeTest('multi-agent workflow', async ({ runAgent, expect }) => {
      // First agent: analyze
      const analysis = await runAgent({
        prompt: 'Analyze the code structure'
      });

      // Second agent: refactor (using first result as context)
      const refactor = await runAgent({
        prompt: 'Refactor based on analysis',
        context: analysis // Pass previous result
      });

      expect(refactor).toHaveChangedFiles(['src/**/*.ts']);
    });
Use Different Models
    vibeTest('model comparison', async ({ runAgent, expect }) => {
      const sonnetResult = await runAgent({
        model: 'claude-3-5-sonnet-latest',
        prompt: 'Refactor this code'
      });

      console.log('Sonnet cost:', sonnetResult.metrics.totalCostUsd);
    });
Set Timeouts
    vibeTest('long-running task', async ({ runAgent, expect }) => {
      const result = await runAgent({
        prompt: 'Complex refactoring task',
        timeoutMs: 600000, // 10 minutes
        maxTurns: 20 // Allow more agent turns
      });

      expect(result).toCompleteAllTodos();
    }, { timeout: 650000 }); // Vitest timeout slightly higher
Troubleshooting
Test times out

Solution: Increase the timeout for both vibe-check and Vitest, keeping the Vitest timeout slightly higher than the agent timeout:

    vibeTest('task', async ({ runAgent }) => {
      const result = await runAgent({
        prompt: 'Task',
        timeoutMs: 300000 // 5 minutes for agent
      });
    }, { timeout: 310000 }); // 5min 10sec for test
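
To avoid keeping the two numbers in sync by hand, you could derive the Vitest timeout from the agent timeout. The helper below is a small sketch, not something vibe-check provides: vitestTimeoutFor is a hypothetical name, and the 10-second default buffer simply mirrors the example above.

```typescript
// Hypothetical helper: derive the Vitest timeout from the agent timeout
// so the test runner always outlives the agent by a small buffer.
function vitestTimeoutFor(agentTimeoutMs: number, bufferMs: number = 10_000): number {
  if (agentTimeoutMs <= 0 || bufferMs < 0) {
    throw new RangeError('agentTimeoutMs must be positive and bufferMs non-negative');
  }
  return agentTimeoutMs + bufferMs;
}

const agentTimeoutMs = 300_000; // 5 minutes for the agent
console.log(vitestTimeoutFor(agentTimeoutMs)); // 310000
```

You would then pass agentTimeoutMs as timeoutMs to runAgent and the derived value as the Vitest timeout option.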
No files captured
Cause: The workspace might not be a git repository.
Solution: Ensure your test runs in a git repository, or specify workspace:
    const result = await runAgent({
      workspace: '/path/to/git/repo',
      prompt: 'Task'
    });
Cost tracker shows $0.00
Cause: You are using a model without cost tracking, or the SDK didn’t report usage.
Solution: Check result.metrics.totalTokens. If it is also 0, the SDK may not have tracked usage properly.
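
The check described above can be written as a small decision helper. This is a sketch with hypothetical names (diagnoseCost, CostDiagnosis); it only encodes the two causes listed in this section and assumes the metrics fields totalCostUsd and totalTokens used elsewhere in this guide.

```typescript
// Hypothetical diagnosis labels for a $0.00 cost report.
type CostDiagnosis = 'ok' | 'usage-not-reported' | 'tokens-without-cost';

function diagnoseCost(m: { totalCostUsd: number; totalTokens: number }): CostDiagnosis {
  if (m.totalCostUsd > 0) return 'ok';
  // Cost is zero: zero tokens suggests the SDK reported no usage at all,
  // while non-zero tokens suggests the model has no cost tracking.
  return m.totalTokens === 0 ? 'usage-not-reported' : 'tokens-without-cost';
}

console.log(diagnoseCost({ totalCostUsd: 0, totalTokens: 0 }));
console.log(diagnoseCost({ totalCostUsd: 0, totalTokens: 847 }));
```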
Ready to write real tests? Continue to Your First Test →