
5-Minute Quickstart

Welcome! This quickstart will get you from zero to running your first agent test in under 5 minutes.

You will build a simple test that:

  1. Runs a Claude Code agent
  2. Captures execution metrics automatically
  3. Validates the agent completed successfully
  4. Generates a cost report

  1. Create a test file

    Create tests/quickstart.vibe.test.ts:

    import { vibeTest } from '@dao/vibe-check';

    vibeTest('agent can execute simple task', async ({ runAgent, expect }) => {
      const result = await runAgent({
        prompt: 'Count from 1 to 5'
      });

      // Verify we got a result
      expect(result).toBeDefined();
      expect(result.metrics.totalCostUsd).toBeLessThan(1.00);
    });
  2. Run the test

    bun run vitest
    # or: npx vitest, pnpm vitest, yarn vitest
  3. See the results

    You should see output like:

    ✓ tests/quickstart.vibe.test.ts (1)
    ✓ agent can execute simple task (2341ms)
    Test Files 1 passed (1)
    Tests 1 passed (1)
    Duration 2.34s
    Vibe Check Cost Summary
    ─────────────────────────────
    Total Cost: $0.0028
    Total Tokens: 847
    HTML Report: .vibe-artifacts/reports/index.html

🎉 Congratulations! You just ran your first vibe-check test!


Let’s break down what happened:

vibeTest('agent can execute simple task', async ({ runAgent, expect }) => {
  // Test body
});
  • vibeTest is like Vitest’s test() or it(), but with agent-specific context
  • Test context provides: runAgent, expect, judge, annotate, task
  • Async by default - all vibe tests are async (agents take time!)
const result = await runAgent({
  prompt: 'Count from 1 to 5'
});
  • Executes a Claude Code agent with the given prompt
  • Automatically captures:
    • Conversation messages
    • Git state (before/after)
    • File changes
    • Tool calls
    • Metrics (cost, tokens, duration)
    • Timeline events
  • Returns a RunResult with all captured data
expect(result).toBeDefined();
expect(result.metrics.totalCostUsd).toBeLessThan(1.00);

The result contains everything from the agent run:

Field             Description
result.bundleDir  Path to on-disk artifacts (.vibe-artifacts/bundles/{test-id}/)
result.metrics    Cost, tokens, duration, tool calls, files changed
result.messages   Conversation history (lazy-loaded)
result.files      File changes with before/after content
result.tools      Tool calls (Edit, Bash, Read, etc.)
result.git        Git state and diffs
result.todos      TODO items and their completion status
result.timeline   Unified event timeline
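
To make the shape of a result more concrete, here is a minimal sketch of a helper that turns the headline metrics into one log line. The `RunResultSketch` and `RunMetricsSketch` interfaces are hypothetical and cover only fields named in this guide; the real `RunResult` type may differ, and treating `toolCalls` as a number is an assumption.

```typescript
// Hypothetical, simplified shapes: only fields this guide mentions.
// The real RunResult exported by vibe-check may differ.
interface RunMetricsSketch {
  totalCostUsd: number;
  totalTokens: number;
  durationMs: number;
  toolCalls: number; // assumed to be a count
}

interface RunResultSketch {
  bundleDir: string;
  metrics: RunMetricsSketch;
}

// Format the headline numbers of a run as a single log line.
function summarizeRun(result: RunResultSketch): string {
  const { totalCostUsd, totalTokens, durationMs, toolCalls } = result.metrics;
  return [
    `cost=$${totalCostUsd.toFixed(4)}`,
    `tokens=${totalTokens}`,
    `duration=${(durationMs / 1000).toFixed(2)}s`,
    `toolCalls=${toolCalls}`,
  ].join(' ');
}

// Sample values taken from the example output earlier in this guide.
const exampleRun: RunResultSketch = {
  bundleDir: '.vibe-artifacts/bundles/example',
  metrics: { totalCostUsd: 0.0028, totalTokens: 847, durationMs: 2341, toolCalls: 0 },
};

console.log(summarizeRun(exampleRun));
// prints "cost=$0.0028 tokens=847 duration=2.34s toolCalls=0"
```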

Let’s write a more detailed test that explores what vibe-check captured:

import { vibeTest } from '@dao/vibe-check';

vibeTest('explore captured context', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Create a file called greeting.txt with "Hello World"'
  });

  // 1. Check metrics
  console.log('Cost:', result.metrics.totalCostUsd);
  console.log('Tokens:', result.metrics.totalTokens);
  console.log('Duration:', result.metrics.durationMs, 'ms');
  console.log('Tool calls:', result.metrics.toolCalls);

  // 2. Check file changes
  const fileStats = result.files.stats();
  console.log('Files changed:', fileStats);
  // Output: { added: 1, modified: 0, deleted: 0, renamed: 0, total: 1 }

  // 3. Get specific file
  const greeting = result.files.get('greeting.txt');
  if (greeting) {
    console.log('File path:', greeting.path);
    console.log('Change type:', greeting.changeType); // 'added'

    // Load file content (lazy-loaded from disk)
    const content = await greeting.after?.text();
    expect(content).toContain('Hello World');
  }

  // 4. Check tool usage
  const toolsUsed = result.tools.all();
  console.log('Tools used:', toolsUsed.map(t => t.name));
  // Example: ['Write', 'Bash']
  const writeToolCount = result.tools.used('Write');
  expect(writeToolCount).toBeGreaterThan(0);

  // 5. Check git state (if workspace is a git repo)
  if (result.git.before) {
    console.log('Git commit before:', result.git.before.head);
    console.log('Git commit after:', result.git.after?.head);
    console.log('Files changed:', result.git.changedCount);
  }
});
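
The stats object logged in step 2 can be read as a tally of change types. The sketch below shows one way such a tally could be computed; `FileChangeSketch`, `FileStatsSketch`, and `computeFileStats` are hypothetical names for illustration, not vibe-check's implementation.

```typescript
// Change types taken from the stats output shown in this guide.
type ChangeType = 'added' | 'modified' | 'deleted' | 'renamed';

// Hypothetical minimal shape of one file change.
interface FileChangeSketch {
  path: string;
  changeType: ChangeType;
}

interface FileStatsSketch {
  added: number;
  modified: number;
  deleted: number;
  renamed: number;
  total: number;
}

// Tally changes by type; `total` is the number of changed files.
function computeFileStats(changes: FileChangeSketch[]): FileStatsSketch {
  const stats: FileStatsSketch = { added: 0, modified: 0, deleted: 0, renamed: 0, total: 0 };
  for (const change of changes) {
    stats[change.changeType] += 1;
    stats.total += 1;
  }
  return stats;
}

const greetingStats = computeFileStats([{ path: 'greeting.txt', changeType: 'added' }]);
console.log(greetingStats);
```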

Vibe-check provides custom matchers for common assertions:

import { vibeTest } from '@dao/vibe-check';

vibeTest('using custom matchers', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Create a README.md file'
  });

  // File-based matchers
  expect(result).toHaveChangedFiles(['README.md']);
  expect(result).toHaveNoDeletedFiles();

  // Tool-based matchers
  expect(result).toHaveUsedTool('Write');

  // Cost matchers
  expect(result).toStayUnderCost(0.50); // Fails if cost > $0.50

  // Quality matchers (requires TODOs in prompt)
  // expect(result).toCompleteAllTodos();
});
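
For intuition about what a cost matcher checks, here is a sketch of a standalone function that returns the `{ pass, message }` object Vitest-style matchers produce. It is an illustration under stated assumptions, not vibe-check's source: `stayUnderCost` and `HasCost` are hypothetical names, and the pass condition (`cost <= max`) is inferred from the "Fails if cost > $0.50" comment above.

```typescript
// Shape of the object a Vitest/Jest-style matcher returns.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Hypothetical stand-in for the part of a run result the matcher reads.
interface HasCost {
  metrics: { totalCostUsd: number };
}

// Sketch of a cost-budget check: passes when the run cost is at or below maxUsd.
function stayUnderCost(received: HasCost, maxUsd: number): MatcherResult {
  const cost = received.metrics.totalCostUsd;
  const pass = cost <= maxUsd;
  return {
    pass,
    // The message for the passing case is the one shown when the check is negated.
    message: () =>
      pass
        ? `expected cost $${cost} to exceed $${maxUsd}`
        : `expected cost $${cost} to stay under $${maxUsd}`,
  };
}

const cheapRun: HasCost = { metrics: { totalCostUsd: 0.0028 } };
console.log(stayUnderCost(cheapRun, 0.5).pass); // prints true
```

A function of this shape could be registered through Vitest's `expect.extend`, which is how custom matchers are usually added; whether vibe-check does it this way is not stated in this guide.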

See all available matchers in the Custom Matchers Guide →


After running tests, open the HTML report to see:

  • Conversation transcript - Full agent chat history
  • Tool call timeline - Visual timeline of all tool invocations
  • File diffs - Before/after views of changed files
  • Cost breakdown - Per-test cost and token usage
  • TODO tracking - See which TODOs were completed

# macOS; on Linux use xdg-open instead of open
open .vibe-artifacts/reports/index.html

Vibe-check automatically tracks costs:

Terminal Reporter:

Vibe Check Cost Summary
─────────────────────────────
Total Cost: $0.0028
Total Tokens: 847

Per-Test Cost:

vibeTest('check cost', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Simple task',
    model: 'claude-3-5-haiku-latest' // Use cheaper model
  });

  console.log('This test cost:', result.metrics.totalCostUsd);
  expect(result).toStayUnderCost(0.10);
});
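
When a suite runs several agents, it can help to check a combined budget in addition to per-test limits. The following is a sketch only: `CostedRun` and `checkBudget` are hypothetical helpers that read the `metrics.totalCostUsd` field used above, and they are not part of vibe-check.

```typescript
// Hypothetical minimal shape: a label plus the cost field used in this guide.
interface CostedRun {
  name: string;
  metrics: { totalCostUsd: number };
}

interface BudgetReport {
  totalUsd: number;
  withinBudget: boolean;
  mostExpensive: string | undefined;
}

// Sum the cost of several runs and compare the total against a budget.
function checkBudget(runs: CostedRun[], budgetUsd: number): BudgetReport {
  let totalUsd = 0;
  let top: CostedRun | undefined;
  for (const run of runs) {
    totalUsd += run.metrics.totalCostUsd;
    if (top === undefined || run.metrics.totalCostUsd > top.metrics.totalCostUsd) {
      top = run;
    }
  }
  return { totalUsd, withinBudget: totalUsd <= budgetUsd, mostExpensive: top?.name };
}

const budgetReport = checkBudget(
  [
    { name: 'count', metrics: { totalCostUsd: 0.0028 } },
    { name: 'readme', metrics: { totalCostUsd: 0.05 } },
    { name: 'refactor', metrics: { totalCostUsd: 0.2 } },
  ],
  0.5,
);
console.log(budgetReport.withinBudget, budgetReport.mostExpensive);
```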

When you run an agent, vibe-check automatically captures:

Git state:

  • Commit hash before/after
  • Dirty status
  • File diffs (git diff --name-status)

File changes:

  • All files added/modified/deleted/renamed
  • Full before/after content (content-addressed storage)
  • Diff statistics (lines added/deleted)

Tool calls:

  • Correlated from PreToolUse/PostToolUse hooks
  • Input parameters
  • Output/results
  • Success/failure status
  • Duration

Metrics:

  • Total tokens (input + output)
  • Total cost (USD)
  • Wall clock duration
  • Tool call count
  • Files changed count

Timeline:

  • Unified event stream (SDK events + hooks)
  • Timestamps for all events
  • Used by reporters for visualization

TODOs:

  • TODO items from Claude Code
  • Status (pending/in_progress/completed)
You’ve learned the basics! Now explore:


vibeTest('multi-agent workflow', async ({ runAgent, expect }) => {
  // First agent: analyze
  const analysis = await runAgent({
    prompt: 'Analyze the code structure'
  });

  // Second agent: refactor (using first result as context)
  const refactor = await runAgent({
    prompt: 'Refactor based on analysis',
    context: analysis // Pass previous result
  });

  expect(refactor).toHaveChangedFiles(['src/**/*.ts']);
});

vibeTest('model comparison', async ({ runAgent, expect }) => {
  const sonnetResult = await runAgent({
    model: 'claude-3-5-sonnet-latest',
    prompt: 'Refactor this code'
  });

  console.log('Sonnet cost:', sonnetResult.metrics.totalCostUsd);
});

vibeTest('long-running task', async ({ runAgent, expect }) => {
  const result = await runAgent({
    prompt: 'Complex refactoring task',
    timeoutMs: 600000, // 10 minutes
    maxTurns: 20 // Allow more agent turns
  });

  expect(result).toCompleteAllTodos();
}, { timeout: 650000 }); // Vitest timeout slightly higher

Problem: The test times out before the agent finishes.

Solution: Increase the timeout for both vibe-check and Vitest:

vibeTest('task', async ({ runAgent }) => {
  const result = await runAgent({
    prompt: 'Task',
    timeoutMs: 300000 // 5 minutes for agent
  });
}, { timeout: 310000 }); // 5min 10sec for test
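
The two timeouts above have to stay in step: the test timeout should always be a little longer than the agent timeout. One way to avoid maintaining two magic numbers is to derive both from a single value, as in this sketch. `timeoutsFor` is a hypothetical helper, and the 10-second default margin is taken from the example above.

```typescript
interface TimeoutPair {
  timeoutMs: number; // value for runAgent's timeoutMs option
  testTimeoutMs: number; // value for the test-level timeout option
}

// Derive the agent timeout and a slightly longer test timeout from one number.
function timeoutsFor(agentMinutes: number, marginMs: number = 10000): TimeoutPair {
  const timeoutMs = Math.round(agentMinutes * 60000);
  return { timeoutMs, testTimeoutMs: timeoutMs + marginMs };
}

const fiveMinutes = timeoutsFor(5);
console.log(fiveMinutes.timeoutMs, fiveMinutes.testTimeoutMs); // prints 300000 310000
```

In a test, `fiveMinutes.timeoutMs` would go into the `runAgent` options and `fiveMinutes.testTimeoutMs` into the test's `timeout` option.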

Problem: Git state is missing from the result.

Cause: The workspace might not be a git repository.

Solution: Ensure your test runs in a git repository, or specify a workspace:

const result = await runAgent({
  workspace: '/path/to/git/repo',
  prompt: 'Task'
});

Problem: The reported cost is 0.

Cause: The model has no cost tracking, or the SDK didn't report usage.

Solution: Check result.metrics.totalTokens. If it is also 0, the SDK may not have tracked usage properly.
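
The check described above can be written as a small helper that separates the two causes. This is a sketch under the assumption that a zero cost with non-zero tokens points to a model without cost tracking; `diagnoseZeroCost` and its return labels are hypothetical.

```typescript
type CostDiagnosis = 'ok' | 'usage-not-reported' | 'cost-not-tracked';

// Classify a metrics object using the two fields mentioned in this guide.
function diagnoseZeroCost(metrics: { totalCostUsd: number; totalTokens: number }): CostDiagnosis {
  if (metrics.totalCostUsd > 0) return 'ok';
  // Cost is zero: tokens tell us whether any usage was reported at all.
  return metrics.totalTokens === 0 ? 'usage-not-reported' : 'cost-not-tracked';
}

console.log(diagnoseZeroCost({ totalCostUsd: 0, totalTokens: 0 })); // prints "usage-not-reported"
```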


Ready to write real tests? Continue to Your First Test →