Guides
These guides provide practical solutions to specific problems. Each guide is task-oriented and focuses on achieving a particular goal.
Testing Guides
Learn testing patterns including reactive watchers, cumulative state, custom matchers, and matrix testing.
Automation Guides
Build robust workflows with multi-stage pipelines, loop patterns, and error handling.
Evaluation Guides
Evaluate agent quality using LLM judges, rubrics, and benchmarking strategies.
Advanced Guides
Master advanced features like MCP servers, cost optimization, bundle cleanup, and multi-modal prompts.
Learn patterns for writing effective tests and assertions.
Implement fail-fast assertions with AgentExecution.watch() to catch issues early during agent runs.
Topics: Real-time monitoring, partial results, early termination, error detection
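For illustration, here is a minimal sketch of the fail-fast idea: a watcher callback inspects each event as it arrives and raises to terminate the run early. The `run_agent`, `Event`, and `no_errors` names are hypothetical stand-ins, not the real `AgentExecution.watch()` API.

```python
# Minimal fail-fast sketch; the event shape and run loop are illustrative.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str          # e.g. "tool_call", "message", "error"
    payload: str

class EarlyTermination(Exception):
    """Raised by a watcher to stop the run as soon as a problem is seen."""

def run_agent(events, watchers):
    partial = []
    for event in events:
        for watcher in watchers:
            watcher(event)          # each watcher may raise to fail fast
        partial.append(event)       # partial results gathered so far
    return partial

def no_errors(event: Event):
    if event.kind == "error":
        raise EarlyTermination(f"agent emitted an error: {event.payload}")

# Usage: the run stops at the first error instead of waiting for completion.
events = [Event("message", "planning"), Event("error", "tool not found")]
try:
    run_agent(events, watchers=[no_errors])
except EarlyTermination as exc:
    print("failed fast:", exc)
```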
Track and aggregate state across multiple agent runs for comprehensive testing scenarios.
Topics: Multi-run tracking, state aggregation, cross-run analysis, data persistence
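A rough sketch of the pattern, assuming each run produces a result record; the `RunTracker` class and its field names are illustrative, not part of the framework.

```python
# Illustrative only: aggregate results from several runs into one summary.
from collections import defaultdict

class RunTracker:
    def __init__(self):
        self.runs = []                       # raw per-run records
        self.tool_counts = defaultdict(int)  # aggregated across runs

    def record(self, run_result: dict):
        self.runs.append(run_result)
        for tool in run_result.get("tools_used", []):
            self.tool_counts[tool] += 1

    def summary(self) -> dict:
        return {
            "total_runs": len(self.runs),
            "total_cost": sum(r.get("cost", 0.0) for r in self.runs),
            "tool_counts": dict(self.tool_counts),
        }

tracker = RunTracker()
tracker.record({"tools_used": ["bash", "read"], "cost": 0.02})
tracker.record({"tools_used": ["read"], "cost": 0.01})
print(tracker.summary())   # cross-run view for later assertions
```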
Use all available matchers for files, tools, quality checks, and cost constraints.
Topics: File matchers, tool matchers, quality matchers, cost matchers, matcher chaining
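To show what matcher chaining looks like in principle, here is a self-contained sketch; the matcher names (`created_file`, `used_tool`, `cost_below`) are hypothetical and the framework's real matchers may differ.

```python
# Sketch of chainable assertions over a run result; matcher names are illustrative.
class RunAssertions:
    def __init__(self, result: dict):
        self.result = result

    def created_file(self, path: str) -> "RunAssertions":
        assert path in self.result.get("files", []), f"missing file: {path}"
        return self                      # returning self enables chaining

    def used_tool(self, tool: str) -> "RunAssertions":
        assert tool in self.result.get("tools_used", []), f"tool not used: {tool}"
        return self

    def cost_below(self, limit: float) -> "RunAssertions":
        assert self.result.get("cost", 0.0) < limit, "cost limit exceeded"
        return self

result = {"files": ["report.md"], "tools_used": ["write"], "cost": 0.03}
RunAssertions(result).created_file("report.md").used_tool("write").cost_below(0.10)
```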
Generate Cartesian product tests to benchmark multiple models and configurations.
Topics: Test generation, model comparison, configuration matrices, performance analysis
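The core of matrix testing is a Cartesian product over configuration axes. A minimal sketch with `itertools.product`; the model names and the `run_case` helper are placeholders for whatever your harness provides.

```python
# Cartesian-product test matrix over models, temperatures, and prompts.
from itertools import product

models = ["model-small", "model-large"]
temperatures = [0.0, 0.7]
prompts = ["summarize", "classify"]

def run_case(model, temperature, prompt):
    # Placeholder: invoke the agent here and return a score/cost record.
    return {"model": model, "temperature": temperature, "prompt": prompt}

matrix = [run_case(m, t, p) for m, t, p in product(models, temperatures, prompts)]
print(len(matrix))   # 2 * 2 * 2 = 8 generated test cases
```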
Build production-ready agent workflows and pipelines.
Create multi-stage workflows that orchestrate complex agent interactions.
Topics: Stage definitions, cumulative context, cross-stage data sharing, pipeline composition
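As a sketch of cumulative context, each stage below receives everything earlier stages produced and adds its own output; the stage functions and context keys are illustrative.

```python
# Pipeline sketch: later stages see the merged output of earlier stages.
def research(context):
    return {"notes": "key findings"}

def draft(context):
    return {"draft": f"article based on {context['notes']}"}

def review(context):
    return {"approved": "article" in context["draft"]}

def run_pipeline(stages, initial=None):
    context = dict(initial or {})
    for stage in stages:
        context.update(stage(context))   # accumulate cross-stage data
    return context

final = run_pipeline([research, draft, review])
print(final["approved"])
```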
Implement retry logic and iterative workflows using until() helpers.
Topics: Retry strategies, convergence testing, iterative refinement, condition checking
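The framework's until() signature isn't reproduced here; the sketch below shows the general retry-until-condition pattern such a helper typically wraps.

```python
# Generic retry-until pattern; the real until() helper's signature may differ.
import time

def until(condition, attempt, max_attempts=5, delay=0.5):
    """Re-run `attempt` until `condition(result)` is true or attempts run out."""
    for _ in range(max_attempts):
        result = attempt()
        if condition(result):
            return result
        time.sleep(delay)                # back off before retrying
    raise RuntimeError(f"condition not met after {max_attempts} attempts")

counter = {"n": 0}
def flaky_attempt():
    counter["n"] += 1
    return {"ok": counter["n"] >= 3}     # succeeds on the third try

print(until(lambda r: r["ok"], flaky_attempt))
```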
Build resilient workflows with comprehensive error handling strategies.
Topics: Error recovery, graceful degradation, fallback strategies, error reporting
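A compact sketch of graceful degradation: try a primary strategy, fall back to a cheaper one, and report whichever errors occurred. The strategy functions here are illustrative.

```python
# Fallback sketch: degrade gracefully and keep an error report.
def run_with_fallback(primary, fallback):
    errors = []
    for name, strategy in (("primary", primary), ("fallback", fallback)):
        try:
            return {"result": strategy(), "via": name, "errors": errors}
        except Exception as exc:         # collect the error, then degrade
            errors.append(f"{name}: {exc}")
    return {"result": None, "via": None, "errors": errors}

def primary():
    raise TimeoutError("agent run timed out")

def fallback():
    return "cached summary"

print(run_with_fallback(primary, fallback))
```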
Evaluate and benchmark agent quality systematically.
Leverage LLM-based evaluation to assess agent output quality.
Topics: Judge configuration, rubric application, scoring systems, quality gates
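The shape of an LLM-judge check with a quality gate, as a sketch: `call_judge_model` is a placeholder for whatever client reaches your judge model, and the rubric text is only an example.

```python
# Judge sketch: score an answer against a rubric and gate on a threshold.
RUBRIC = """Score the answer from 1-5 for accuracy and clarity.
Respond with a single integer."""

def call_judge_model(prompt: str) -> str:
    return "4"          # placeholder response; replace with a real API call

def judge(answer: str, rubric: str = RUBRIC, threshold: int = 3) -> bool:
    prompt = f"{rubric}\n\nAnswer to evaluate:\n{answer}"
    score = int(call_judge_model(prompt).strip())
    return score >= threshold            # quality gate

assert judge("The capital of France is Paris.")
```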
Design effective rubrics for consistent and reliable evaluation.
Topics: Rubric structure, criterion design, scoring scales, best practices
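One way to keep criteria structured and weighted, purely as an illustration of rubric design; the `Criterion` type and the criteria themselves are examples.

```python
# Structured rubric sketch: named criteria, weights, and a fixed scale.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    description: str
    weight: float        # weights should sum to 1.0
    scale: tuple = (1, 5)

rubric = [
    Criterion("accuracy", "Facts are correct and verifiable.", 0.5),
    Criterion("clarity", "Answer is concise and well organized.", 0.3),
    Criterion("completeness", "All parts of the question are addressed.", 0.2),
]

def weighted_score(scores: dict) -> float:
    return round(sum(c.weight * scores[c.name] for c in rubric), 2)

print(weighted_score({"accuracy": 5, "clarity": 4, "completeness": 3}))  # 4.3
```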
Compare models, configurations, and prompts with systematic benchmarking.
Topics: Performance metrics, cost analysis, model comparison, regression detection
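A small sketch of aggregating benchmark records per model to compare score, cost, and latency; the record fields are assumptions about what your harness reports.

```python
# Benchmark aggregation sketch: group records by model and average the metrics.
from statistics import mean
from collections import defaultdict

records = [
    {"model": "model-small", "score": 3.8, "cost": 0.01, "latency_s": 2.1},
    {"model": "model-large", "score": 4.6, "cost": 0.08, "latency_s": 5.4},
    {"model": "model-small", "score": 4.0, "cost": 0.01, "latency_s": 1.9},
]

by_model = defaultdict(list)
for r in records:
    by_model[r["model"]].append(r)

for model, rs in by_model.items():
    print(model, {
        "avg_score": round(mean(r["score"] for r in rs), 2),
        "avg_cost": round(mean(r["cost"] for r in rs), 3),
        "avg_latency_s": round(mean(r["latency_s"] for r in rs), 1),
    })
```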
Master advanced features and optimization techniques.
Integrate Model Context Protocol servers for enhanced agent capabilities.
Topics: MCP configuration, server integration, tool availability, context management
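For orientation, this is the typical shape of an MCP stdio server entry (a command the client launches, plus arguments and environment). Exact keys, and how the configuration is passed to your agent harness, vary by client.

```python
# Typical MCP server entry, expressed as a Python dict; key names may vary.
mcp_config = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp/workspace"],
            "env": {},
        }
    }
}
# Once the server is registered, its tools become available to the agent
# alongside the built-in ones.
```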
Reduce costs while maintaining quality through strategic optimizations.
Topics: Token reduction, model selection, prompt optimization, caching strategies
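As one example of a caching strategy, a sketch of response caching keyed by a prompt hash, so identical calls are only paid for once; `call_model` is a placeholder for your client.

```python
# Prompt-hash response cache sketch: repeated identical calls cost nothing extra.
import hashlib

_cache: dict = {}

def call_model(prompt: str) -> str:
    return f"response to: {prompt}"      # placeholder for a real API call

def cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:                # only pay for the first identical call
        _cache[key] = call_model(prompt)
    return _cache[key]

cached_call("Summarize the release notes.")
cached_call("Summarize the release notes.")   # served from cache, no new cost
```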
Manage artifact storage and implement cleanup policies.
Topics: Retention policies, storage management, cleanup strategies, disk usage
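A sketch of an age-based retention policy over a local artifacts directory; the directory layout and cutoff are assumptions, not the framework's actual storage scheme.

```python
# Age-based cleanup sketch: remove artifact files older than the cutoff.
import time
from pathlib import Path

def cleanup(artifact_dir: str, max_age_days: int = 14) -> int:
    cutoff = time.time() - max_age_days * 86_400
    removed = 0
    for path in Path(artifact_dir).glob("**/*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()                # delete artifacts older than the cutoff
            removed += 1
    return removed

# cleanup("./artifacts", max_age_days=7)
```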
Use text, images, and files in your agent prompts effectively.
Topics: Image prompts, file attachments, mixed content, format handling
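A sketch of a mixed text-and-image prompt using base64-encoded image content blocks; the field names follow a widely used convention but may differ from your SDK's exact schema.

```python
# Multi-modal prompt sketch: text plus a base64-encoded image block.
import base64
from pathlib import Path

def image_block(path: str) -> dict:
    data = base64.standard_b64encode(Path(path).read_bytes()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": data}}

prompt_content = [
    {"type": "text", "text": "Describe the chart and extract the key numbers."},
    image_block("chart.png"),   # assumes chart.png exists in the working directory
]
```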