
Testing Guides

Learn how to write effective tests for your Claude Code agents using vibe-check’s powerful testing features.


Testing Claude Code agents requires specialized tools and patterns. These guides cover:

  • Real-time monitoring - Watch execution progress and fail fast on errors
  • Multi-run tracking - Aggregate state across multiple agent invocations
  • Assertion patterns - Use custom matchers for common validation scenarios
  • Benchmarking - Compare performance across models and configurations

Real-Time Monitoring

Monitor agent execution in real time and implement fail-fast assertions.

You’ll learn:

  • Using AgentExecution.watch() for real-time monitoring
  • Accessing partial results during execution
  • Implementing early termination conditions
  • Detecting errors and anomalies as they happen

Use cases:

  • Failing fast on file deletions
  • Monitoring tool usage patterns
  • Budget enforcement during execution
  • Real-time quality gates
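The fail-fast pattern above can be sketched without the library: consume a stream of execution events and throw the moment a failure condition appears, rather than waiting for the run to finish. This is only an illustration of the technique — the event shape (`tool`, `path`, `costUsd`) is hypothetical, and vibe-check's real `AgentExecution.watch()` API may expose events differently.

```typescript
// Hypothetical event shape -- vibe-check's real event type may differ.
interface ToolEvent {
  tool: string;    // e.g. "Read", "Edit", "Delete"
  path?: string;   // file affected, if any
  costUsd: number; // cumulative cost so far
}

// Watch an event stream and fail fast: abort on the first file deletion
// or as soon as the budget is exceeded, instead of after the full run.
async function failFast(
  events: AsyncIterable<ToolEvent>,
  maxCostUsd: number,
): Promise<ToolEvent[]> {
  const seen: ToolEvent[] = [];
  for await (const ev of events) {
    seen.push(ev);
    if (ev.tool === "Delete") {
      throw new Error(`File deleted during run: ${ev.path}`);
    }
    if (ev.costUsd > maxCostUsd) {
      throw new Error(`Budget exceeded: $${ev.costUsd} > $${maxCostUsd}`);
    }
  }
  return seen;
}
```

The same loop structure supports any real-time quality gate: add a condition, throw early, and the test fails minutes sooner than a post-hoc assertion would.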

Multi-Run Tracking

Track and aggregate state across multiple agent runs for comprehensive testing.

You’ll learn:

  • Building cumulative context objects
  • Aggregating results from multiple runs
  • Cross-run state analysis
  • Persistent state management

Use cases:

  • Multi-step test workflows
  • Progressive refinement testing
  • Long-running automation validation
  • State evolution tracking
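A cumulative context object can be as simple as a class that folds each run's result into shared totals, so cross-run assertions see the whole workflow. The `RunResult` fields below are illustrative stand-ins, not vibe-check's actual result type.

```typescript
// Hypothetical per-run result -- field names are illustrative only.
interface RunResult {
  filesChanged: string[];
  costUsd: number;
  todosCompleted: number;
}

// Cumulative context: aggregates state across multiple agent runs so
// tests can assert on the workflow as a whole, not just the last run.
class CumulativeContext {
  readonly filesChanged = new Set<string>(); // deduplicated across runs
  totalCostUsd = 0;
  totalTodos = 0;

  record(run: RunResult): void {
    run.filesChanged.forEach((f) => this.filesChanged.add(f));
    this.totalCostUsd += run.costUsd;
    this.totalTodos += run.todosCompleted;
  }
}
```

Calling `record()` after each invocation lets a final assertion check, for example, that the whole multi-step workflow stayed under budget even when every individual run did.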

Assertion Patterns

Master all available matchers for comprehensive test assertions.

You’ll learn:

  • File-based matchers (toHaveChangedFiles, toHaveNoDeletedFiles)
  • Tool-based matchers (toHaveUsedTool, toUseOnlyTools)
  • Quality matchers (toCompleteAllTodos, toPassRubric)
  • Cost matchers (toStayUnderCost)
  • Chaining matchers for complex assertions

Use cases:

  • Validating file operations
  • Enforcing tool usage policies
  • Quality gates with LLM judges
  • Budget enforcement
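To make the matcher categories concrete, here are plain-function stand-ins for the checks behind three of them. vibe-check's real matchers hook into the test runner's `expect()`; these functions only sketch the underlying logic, and the `ResultLike` shape is an assumption.

```typescript
// Hypothetical result shape -- stands in for an agent run's outcome.
interface ResultLike {
  deletedFiles: string[];
  toolsUsed: string[];
  costUsd: number;
}

// Logic behind a toHaveNoDeletedFiles-style matcher.
function hasNoDeletedFiles(r: ResultLike): boolean {
  return r.deletedFiles.length === 0;
}

// Logic behind a toUseOnlyTools-style matcher: every tool used must
// appear in the allowed list.
function usedOnlyTools(r: ResultLike, allowed: string[]): boolean {
  return r.toolsUsed.every((t) => allowed.includes(t));
}

// Logic behind a toStayUnderCost-style matcher.
function stayedUnderCost(r: ResultLike, maxUsd: number): boolean {
  return r.costUsd <= maxUsd;
}
```

Chaining matchers in a test is the equivalent of requiring all of these predicates to hold for the same run result.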

Benchmarking

Generate Cartesian-product tests for systematic model and configuration benchmarking.

You’ll learn:

  • Using defineTestSuite for test generation
  • Creating configuration matrices
  • Comparing model performance
  • Analyzing benchmark results

Use cases:

  • Model comparison (Sonnet vs Opus vs Haiku)
  • Configuration optimization
  • Regression testing across versions
  • Performance profiling
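The core of matrix benchmarking is Cartesian-product expansion: given named axes of options, emit one configuration per combination. This generic helper sketches what a `defineTestSuite`-style generator does under the hood; the axis names (`model`, `temperature`) are illustrative, not vibe-check's required keys.

```typescript
// Expand named axes into one config object per combination.
function cartesian<T extends Record<string, readonly unknown[]>>(
  axes: T,
): Array<{ [K in keyof T]: T[K][number] }> {
  return Object.entries(axes).reduce<Array<Record<string, unknown>>>(
    // For each axis, branch every partial config once per axis value.
    (acc, [key, values]) =>
      acc.flatMap((partial) => values.map((v) => ({ ...partial, [key]: v }))),
    [{}],
  ) as Array<{ [K in keyof T]: T[K][number] }>;
}

// 3 models x 2 temperature settings -> 6 benchmark configurations.
const matrix = cartesian({
  model: ["sonnet", "opus", "haiku"],
  temperature: [0, 0.7],
});
```

Generating one test per entry in `matrix` gives systematic coverage: adding a fourth model or a third temperature automatically multiplies out into the full set of new combinations.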