Architecture Overview

Vibe-check is built on Vitest v3 with a clean layered architecture that hides complexity while exposing simple primitives. This page explains the high-level design and how components fit together.

┌─────────────────────────────────────────────────────────────┐
│ User Layer (What You Write) │
├─────────────────────────────────────────────────────────────┤
│ vibeTest() / vibeWorkflow() │
│ defineAgent() / prompt() / judge() │
│ expect().toHaveChangedFiles() / toPassRubric() │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Framework Layer (What We Build) │
├─────────────────────────────────────────────────────────────┤
│ VibeTestContext / WorkflowContext (Fixtures) │
│ AgentExecution (Reactive Watchers) │
│ ContextManager (Auto-Capture Orchestration) │
│ Custom Matchers (File/Tool/Cost/Judge) │
│ Reporters (Terminal Cost + HTML Reports) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Storage Layer (Hybrid Model) │
├─────────────────────────────────────────────────────────────┤
│ RunBundle (On-Disk) - events, hooks, files, summary │
│ task.meta (In-Memory) - metrics, bundleDir pointer │
│ Content-Addressed Files (SHA-256 + gzip) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Integration Layer (External Systems) │
├─────────────────────────────────────────────────────────────┤
│ Claude Code SDK (@anthropic-ai/sdk) │
│ Claude Code Hooks (PreToolUse, PostToolUse, etc.) │
│ Git (Capture diffs and file content) │
│ Vitest v3 (Test runner, reporters, lifecycle) │
└─────────────────────────────────────────────────────────────┘

vibeTest / vibeWorkflow

  • Thin wrappers over Vitest’s test.extend()
  • Inject fixtures (runAgent, judge, expect, annotate)
  • Provide dual APIs for different use cases
  • Location: src/test/vibeTest.ts, src/test/vibeWorkflow.ts

Fixtures (VibeTestContext / WorkflowContext)

  • Dependency injection for test functions
  • Manages state accumulation across runs
  • Provides ergonomic API surface
  • Location: src/test/context.ts

AgentRunner

  • Executes Claude Code agents via SDK
  • Starts hook capture before execution
  • Returns AgentExecution (thenable with watchers)
  • Delegates capture to ContextManager
  • Location: src/runner/agentRunner.ts

AgentExecution

  • Thenable class (not Promise subclass)
  • Supports reactive watchers for fail-fast testing
  • Wraps Promise with watch(), abort() methods
  • Location: src/runner/agentExecution.ts

ContextManager

  • Orchestrates capture lifecycle
  • Writes events/hooks to RunBundle during execution
  • Finalizes after execution (correlate, compute, persist)
  • Returns lazy RunResult interface
  • Location: src/runner/contextManager.ts

Responsibilities:

  1. Capture Phase - Write SDK events and hooks to disk
  2. Correlation Phase - Pair PreToolUse + PostToolUse → ToolCall
  3. Git Phase - Capture before/after state, compute diffs
  4. Finalization Phase - Generate summary.json, return RunResult
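The correlation phase above can be illustrated with a minimal sketch. The HookEvent and ToolCall shapes, the toolUseId correlation key, and the correlateToolCalls name are assumptions made for illustration; the real hook payloads and ContextManager internals may differ.

```typescript
// Hypothetical hook event shape; real hook payloads may carry more fields.
interface HookEvent {
  hook: "PreToolUse" | "PostToolUse";
  toolName: string;
  toolUseId: string; // assumed correlation key
  timestamp: number; // milliseconds
  payload?: unknown;
}

// Hypothetical correlated record produced from one Pre/Post pair.
interface ToolCall {
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number | null; // null when no PostToolUse was captured
}

function correlateToolCalls(events: HookEvent[]): ToolCall[] {
  const pending = new Map<string, HookEvent>();
  const calls: ToolCall[] = [];

  for (const ev of events) {
    if (ev.hook === "PreToolUse") {
      pending.set(ev.toolUseId, ev);
      continue;
    }
    const pre = pending.get(ev.toolUseId);
    if (!pre) continue; // PostToolUse without a matching PreToolUse: skip it
    pending.delete(ev.toolUseId);
    calls.push({
      name: pre.toolName,
      input: pre.payload,
      output: ev.payload,
      durationMs: ev.timestamp - pre.timestamp,
    });
  }

  // Keep unmatched PreToolUse events as incomplete calls instead of dropping them.
  pending.forEach((pre) => {
    calls.push({ name: pre.toolName, input: pre.payload, output: undefined, durationMs: null });
  });
  return calls;
}
```

Keeping unmatched PreToolUse events as incomplete calls is one possible choice; it preserves partial data when a hook goes missing.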

RunBundle (On-Disk)

  • Canonical source of truth for execution data
  • NDJSON files for events/hooks (streaming-friendly)
  • Content-addressed file storage (SHA-256 hashing)
  • Gzip compression for large files
  • Location: .vibe-artifacts/{testId}/
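To make the content-addressed storage idea concrete, here is a minimal sketch that keys blobs by their SHA-256 digest and stores them gzip-compressed. ContentStore is a hypothetical in-memory stand-in for the on-disk bundle, and it compresses every blob for simplicity, whereas the text above mentions compression only for large files.

```typescript
import { createHash } from "node:crypto";
import { gunzipSync, gzipSync } from "node:zlib";

// Hypothetical in-memory stand-in for the on-disk store.
class ContentStore {
  private readonly blobs = new Map<string, Buffer>();

  // Stores content under its SHA-256 digest and returns that digest.
  put(content: string): string {
    const hash = createHash("sha256").update(content, "utf8").digest("hex");
    if (!this.blobs.has(hash)) {
      // Identical content hashes to the same key, so it is stored only once.
      this.blobs.set(hash, gzipSync(Buffer.from(content, "utf8")));
    }
    return hash;
  }

  // Loads and decompresses content by digest.
  get(hash: string): string {
    const blob = this.blobs.get(hash);
    if (!blob) throw new Error(`unknown hash: ${hash}`);
    return gunzipSync(blob).toString("utf8");
  }

  get size(): number {
    return this.blobs.size;
  }
}
```

Two files with the same content produce the same digest, so comparing digests is enough to tell whether a file changed without reading either copy.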

task.meta (In-Memory)

  • Lightweight pointer to RunBundle
  • Cost metrics for terminal reporter
  • JSON-serializable (sent across Vitest workers)
  • Location: Vitest’s task.meta object

RunResult (Lazy Interface)

  • Provides ergonomic access to RunBundle data
  • File content loaded on-demand (text() / stream())
  • Minimizes memory usage for parallel tests
  • Location: src/runner/runResult.ts
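The on-demand loading described above could look roughly like the following sketch. LazyFile and LazyRunResult are hypothetical names, and the loader is synchronous here to keep the example short; the real text() and stream() accessors may well be asynchronous.

```typescript
// Hypothetical lazy file handle: content is read only when text() is first called.
class LazyFile {
  private cached: string | undefined;

  constructor(
    readonly path: string,
    private readonly load: () => string, // assumed loader, e.g. a read from the bundle
  ) {}

  text(): string {
    if (this.cached === undefined) {
      this.cached = this.load();
    }
    return this.cached;
  }
}

// Hypothetical result object that holds metadata only until content is requested.
class LazyRunResult {
  constructor(private readonly files: LazyFile[]) {}

  changedPaths(): string[] {
    return this.files.map((f) => f.path);
  }

  file(path: string): LazyFile | undefined {
    return this.files.find((f) => f.path === path);
  }
}
```

Listing changed paths never touches file content, which is what keeps the in-memory footprint small when many tests run in parallel.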

LLM Judge

  • LLM-based evaluation using rubrics
  • Formats rubric into prompt internally
  • Returns structured results (via Zod schemas)
  • Supports custom result types (generic parameter)
  • Location: src/judge/llmJudge.ts

Rubric

  • Standard interface for evaluation criteria
  • User-defined criteria with optional weights
  • Pass/fail thresholds configurable
  • Location: src/judge/rubric.ts
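As an illustration of weighted criteria with a configurable threshold, the sketch below computes a weighted pass fraction. The Criterion shape, the scoreRubric name, the default weight of 1, and the default threshold of 0.7 are all assumptions; the actual rubric schema is not shown on this page.

```typescript
// Hypothetical criterion shape; weight defaults to 1 when omitted.
interface Criterion {
  name: string;
  weight?: number;
}

interface RubricScore {
  score: number; // weighted fraction of criteria that passed, in [0, 1]
  passed: boolean; // true when score reaches the threshold
}

function scoreRubric(
  criteria: Criterion[],
  passedNames: string[],
  threshold = 0.7, // assumed default, not taken from the source
): RubricScore {
  let total = 0;
  let earned = 0;
  for (const criterion of criteria) {
    const weight = criterion.weight ?? 1;
    total += weight;
    // A criterion with no recorded pass counts as failed.
    if (passedNames.indexOf(criterion.name) !== -1) {
      earned += weight;
    }
  }
  const score = total === 0 ? 0 : earned / total;
  return { score, passed: score >= threshold };
}
```

For example, with weights 2, 1, and 1 and only the first and third criteria passing, the score is 3/4 = 0.75, which passes at a 0.7 threshold and fails at 0.8.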

Custom Matchers

  • File matchers: toHaveChangedFiles(), toHaveNoDeletedFiles()
  • Tool matchers: toHaveUsedTool(), toUseOnlyTools()
  • Quality matchers: toCompleteAllTodos(), toPassRubric()
  • Cost matchers: toStayUnderCost()
  • Location: src/test/matchers.ts
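A file matcher of this kind can be sketched as a plain function that returns the { pass, message } object Vitest expects from matchers registered through expect.extend(). The semantics chosen here (every pattern must match at least one changed file), the changedFiles field, and the simplified glob handling (exact paths and a trailing /** only) are assumptions, not the library's actual behavior.

```typescript
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Simplified glob check: supports exact paths and a trailing "/**" only.
function matchesPattern(file: string, pattern: string): boolean {
  if (pattern.endsWith("/**")) {
    return file.startsWith(pattern.slice(0, -2)); // "src/**" -> prefix "src/"
  }
  return file === pattern;
}

// Hypothetical matcher body: passes when every pattern matches some changed file.
function toHaveChangedFiles(
  received: { changedFiles: string[] },
  patterns: string[],
): MatcherResult {
  const unmatched = patterns.filter(
    (pattern) => !received.changedFiles.some((file) => matchesPattern(file, pattern)),
  );
  return {
    pass: unmatched.length === 0,
    message: () =>
      unmatched.length === 0
        ? `expected no changed file to match: ${patterns.join(", ")}`
        : `no changed file matched: ${unmatched.join(", ")}`,
  };
}
```

Registering a function like this with expect.extend({ toHaveChangedFiles }) is what would make expect(result).toHaveChangedFiles(['src/**']) available in tests.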

Cost Reporter

  • Aggregates cost across tests via task.meta
  • Prints summary at end of test run
  • Location: src/reporters/cost.ts
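The aggregation step can be sketched as a fold over per-test metadata. The VibeMeta shape, the vibe key on task.meta, and the summary line format are hypothetical; the sketch only shows how a reporter might sum JSON-serializable cost metrics without opening any bundle.

```typescript
// Hypothetical per-test metadata: a pointer to the bundle plus cost metrics.
interface VibeMeta {
  bundleDir: string;
  costUsd: number;
}

// Minimal stand-in for the parts of a Vitest task this sketch needs.
interface TaskLike {
  name: string;
  meta: { vibe?: VibeMeta };
}

interface CostSummary {
  tests: number;
  totalUsd: number;
  line: string; // text a terminal reporter could print
}

function summarizeCost(tasks: TaskLike[]): CostSummary {
  let tests = 0;
  let totalUsd = 0;
  for (const task of tasks) {
    const vibe = task.meta.vibe;
    if (!vibe) continue; // tests that never ran an agent carry no cost data
    tests += 1;
    totalUsd += vibe.costUsd;
  }
  return {
    tests,
    totalUsd,
    line: `Total cost: $${totalUsd.toFixed(4)} across ${tests} tests`,
  };
}
```

Because the metadata is plain JSON, it survives serialization between workers and the main process unchanged.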

HTML Reporter

  • Generates rich reports with timelines, diffs, costs
  • Lazy-loads data from RunBundle for scalability
  • Renders conversation transcripts and tool calls
  • Location: src/reporters/html.ts

Data Flow

1. User calls runAgent({ agent, prompt })
   ├─ ContextManager created (bundleDir allocated)
   ├─ Hook capture started (env var: VIBE_BUNDLE_DIR)
   └─ AgentExecution returned (thenable)
2. SDK executes agent
   ├─ SDK events → ContextManager.onSDKEvent() → events.ndjson
   ├─ Hooks fire → Hook script → hooks.ndjson
   └─ Watchers invoked after each PostToolUse/Notification
3. Agent completes
   ├─ ContextManager.finalize() called
   ├─ Process hooks.ndjson (correlate tool calls)
   ├─ Capture git state (before/after comparison)
   ├─ Read file content (before/after)
   ├─ Generate summary.json
   └─ Return RunResult (lazy accessors to bundle)
4. Test code uses RunResult
   ├─ Assertions: expect(result).toHaveChangedFiles(...)
   ├─ Judgments: await judge(result, { rubric })
   └─ Context accumulates for next runAgent()
5. Test ends
   ├─ Metrics stored in task.meta
   ├─ RunBundle persists on disk
   └─ Reporters read task.meta + bundles

Context accumulation in vibeTest:

vibeTest('multi-run', async ({ runAgent, files, tools }) => {
  // Run 1
  await runAgent({ agent: a1, prompt: 'task 1' });
  // → context.files accumulates changes
  // → context.tools accumulates tool calls

  // Run 2
  await runAgent({ agent: a2, prompt: 'task 2' });
  // → context.files merges new changes
  // → context.tools appends new calls

  // Access cumulative state
  files.changed(); // All files from both runs
  tools.all(); // All tools from both runs
});

Context accumulation in vibeWorkflow:

vibeWorkflow('pipeline', async (wf) => {
  // Stage 1
  await wf.stage('build', { agent: builder, prompt: '/build' });
  // → wf.files tracks stage attribution
  // → wf.timeline records stage context

  // Stage 2 (inherits cumulative context)
  await wf.stage('test', { agent: tester, prompt: '/test' });
  // → wf.files.byStage('build') - files from stage 1
  // → wf.files.allChanged() - all files

  // Access cross-stage data
  wf.tools.all(); // Tools with stage metadata
  wf.timeline.events(); // Unified timeline
});
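
The accumulation behavior in the two examples above can be approximated with a small sketch. CumulativeContext, StageRun, and AttributedTool are hypothetical names; the sketch only shows one way to merge changed files and append tool calls across runs while remembering which stage produced them.

```typescript
// Hypothetical summary of a single run or stage.
interface StageRun {
  changedFiles: string[];
  toolCalls: string[];
}

interface AttributedTool {
  name: string;
  stage: string;
}

class CumulativeContext {
  private readonly filesByStage: Record<string, string[]> = {};
  private readonly tools: AttributedTool[] = [];

  // Merges one run into the cumulative state, tagged with its stage name.
  record(stage: string, run: StageRun): void {
    const existing = this.filesByStage[stage] ?? [];
    this.filesByStage[stage] = existing.concat(run.changedFiles);
    for (const name of run.toolCalls) {
      this.tools.push({ name, stage });
    }
  }

  // Files changed by one stage only.
  byStage(stage: string): string[] {
    return this.filesByStage[stage] ?? [];
  }

  // Files changed by any stage, without duplicates, in first-seen order.
  allChanged(): string[] {
    const seen: string[] = [];
    for (const stage of Object.keys(this.filesByStage)) {
      for (const file of this.byStage(stage)) {
        if (seen.indexOf(file) === -1) seen.push(file);
      }
    }
    return seen;
  }

  // Every tool call so far, each carrying its stage.
  allTools(): AttributedTool[] {
    return this.tools.slice();
  }
}
```

Files are deduplicated across stages while tool calls are appended, which mirrors the "merges" versus "appends" wording in the comments above.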

Everything is Vitest underneath:

  • vibeTest → test.extend
  • vibeWorkflow → test.extend (same infrastructure)
  • Custom matchers → Vitest matcher API
  • Reporters → Vitest reporter hooks
  • Lifecycle → Vitest setup/teardown

Why: Leverage battle-tested infrastructure instead of reinventing the wheel.

Simple user-facing API:

// What users write
vibeTest('test', async ({ runAgent, judge, expect }) => {
  const result = await runAgent({ agent, prompt });
  expect(result).toHaveChangedFiles(['src/**']);
});

What users DON’T see:

  • Vitest fixtures
  • Hook capture scripts
  • NDJSON parsing
  • Bundle management
  • Worker communication

Why: Reduce cognitive load, make testing feel natural.

Zero artifact management:

  • No manual git diff collection
  • No manual hook setup
  • No manual cost tracking
  • No manual file content reading

Why: Eliminate boilerplate, ensure consistency.

Small data in memory, large data on disk:

  • RunResult = ~50 KB (metadata only)
  • RunBundle = 10+ MB (full content)
  • Lazy loading bridges the two

Why: Scale to 100+ parallel tests without memory bloat.

Hook failures don’t fail tests:

  • Agent execution continues normally
  • Partial data still returned
  • Warnings logged to stderr
  • hookCaptureStatus indicates missing data

Why: Infrastructure issues shouldn’t block productivity.
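
A minimal sketch of this kind of graceful degradation is a tolerant NDJSON reader that skips malformed lines instead of throwing. The parseHooks name and the three status values are assumptions; the source only states that hookCaptureStatus indicates missing data.

```typescript
// Hypothetical status values; the real hookCaptureStatus type may differ.
type HookCaptureStatus = "complete" | "partial" | "missing";

interface HookParseResult {
  events: unknown[];
  warnings: string[]; // a caller could log these to stderr with console.warn
  hookCaptureStatus: HookCaptureStatus;
}

// Parses hooks.ndjson content; pass undefined when the file could not be read.
function parseHooks(raw: string | undefined): HookParseResult {
  if (raw === undefined) {
    return { events: [], warnings: ["hooks.ndjson not found"], hookCaptureStatus: "missing" };
  }
  const events: unknown[] = [];
  const warnings: string[] = [];
  raw.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines, including the trailing one
    try {
      events.push(JSON.parse(line));
    } catch {
      // Keep going: one corrupt line should not discard the rest of the capture.
      warnings.push(`line ${index + 1}: invalid JSON, skipped`);
    }
  });
  return {
    events,
    warnings,
    hookCaptureStatus: warnings.length === 0 ? "complete" : "partial",
  };
}
```

The function never throws, so a broken hook script reduces the available data without failing the test that triggered it.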

src/
├── test/
│   ├── vibeTest.ts          # vibeTest() implementation
│   ├── vibeWorkflow.ts      # vibeWorkflow() implementation
│   ├── context.ts           # Fixture definitions
│   └── matchers.ts          # Custom matchers
├── runner/
│   ├── agentRunner.ts       # runAgent() / stage() execution
│   ├── agentExecution.ts    # AgentExecution class
│   ├── contextManager.ts    # Hook capture orchestration
│   └── runResult.ts         # RunResult lazy interface
├── judge/
│   ├── llmJudge.ts          # judge() implementation
│   └── rubric.ts            # Rubric validation
├── reporters/
│   ├── cost.ts              # Terminal cost reporter
│   └── html.ts              # HTML report generator
├── artifacts/
│   ├── bundle.ts            # RunBundle I/O utilities
│   └── cleanup.ts           # Bundle retention policy
├── sdk/
│   └── bridge.ts            # SDK type imports
└── config/
    └── index.ts             # defineVibeConfig() helper

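Why a thenable instead of a Promise subclass?

  • Subclassing Promise to add custom methods (watch, abort) is fragile: then() builds its return value through the subclass constructor, so the subclass must keep Promise’s executor-style constructor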
  • Thenable interface gives full control while staying awaitable

Benefits:

  • await execution works naturally
  • Promise.all([e1, e2]) works
  • execution.watch() enables reactive testing
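
The thenable approach can be sketched as a class that implements PromiseLike and delegates then() to an internal Promise. Execution, RunControls, and startDemo are hypothetical names, and the boolean abort flag is a simplification; the real AgentExecution class is not shown on this page.

```typescript
type Watcher<E> = (event: E) => void;

// Hypothetical controls handed to the running task.
interface RunControls<E> {
  emit: (event: E) => void; // forwards progress events to watchers
  isAborted: () => boolean; // lets the task stop early
}

class Execution<T, E> implements PromiseLike<T> {
  private readonly watchers: Watcher<E>[] = [];
  private aborted = false;
  private readonly promise: Promise<T>;

  constructor(run: (controls: RunControls<E>) => Promise<T>) {
    this.promise = run({
      emit: (event) => this.watchers.forEach((watcher) => watcher(event)),
      isAborted: () => this.aborted,
    });
  }

  // Registers a watcher that is called for every emitted event.
  watch(fn: Watcher<E>): this {
    this.watchers.push(fn);
    return this;
  }

  abort(): void {
    this.aborted = true;
  }

  // Implementing then() is what makes instances work with await and Promise.all.
  then<R1 = T, R2 = never>(
    onFulfilled?: ((value: T) => R1 | PromiseLike<R1>) | null,
    onRejected?: ((reason: unknown) => R2 | PromiseLike<R2>) | null,
  ): Promise<R1 | R2> {
    return this.promise.then(onFulfilled, onRejected);
  }
}

// Usage: a fake run that "uses" three tools and is aborted by a watcher after Bash.
function startDemo(): Execution<string[], string> {
  const execution = new Execution<string[], string>(async ({ emit, isAborted }) => {
    const used: string[] = [];
    for (const tool of ["Read", "Bash", "Edit"]) {
      await Promise.resolve(); // yield so watchers can be attached first
      if (isAborted()) break;
      used.push(tool);
      emit(tool);
    }
    return used;
  });
  execution.watch((tool) => {
    if (tool === "Bash") execution.abort(); // fail fast on an unwanted tool
  });
  return execution;
}
```

Because then() simply delegates to the wrapped Promise, the class needs no executor-style constructor and can expose whatever extra methods it likes.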

Why SHA-256 hashing?

  • Deduplicates unchanged file content
  • Integrity verification (detect corruption)
  • Fast equality comparison without reading content

Benefits:

  • Disk space efficiency
  • Lazy loading performance
  • Reliable content addressing

Why centralize capture logic?

  • Single source of truth for all capture
  • Consistent data across tests/workflows
  • Easy to debug capture issues

Benefits:

  • Predictable behavior
  • Testable in isolation
  • Clean separation of concerns