Guides
These guides provide practical solutions to specific problems. Each guide is task-oriented and focuses on achieving a particular goal.
Testing Guides
Learn testing patterns including reactive watchers, cumulative state, custom matchers, and matrix testing.
Automation Guides
Build robust workflows with multi-stage pipelines, loop patterns, and error handling.
Evaluation Guides
Evaluate agent quality using LLM judges, rubrics, and benchmarking strategies.
Advanced Guides
Master advanced features like MCP servers, cost optimization, bundle cleanup, and multi-modal prompts.
Learn patterns for writing effective tests and assertions.
Implement fail-fast assertions with AgentExecution.watch() to catch issues early during agent runs.
Topics: Real-time monitoring, partial results, early termination, error detection
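For illustration, here is a minimal sketch of the fail-fast idea: a watcher callback inspects each event as it arrives and raises to terminate the run early. The `run_agent`, `Event`, and `no_errors` names are hypothetical stand-ins, not the real `AgentExecution.watch()` API.

```python
# Minimal fail-fast sketch; the event shape and run loop are illustrative.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str          # e.g. "tool_call", "message", "error"
    payload: str

class EarlyTermination(Exception):
    """Raised by a watcher to stop the run as soon as a problem is seen."""

def run_agent(events, watchers):
    partial = []
    for event in events:
        for watcher in watchers:
            watcher(event)          # each watcher may raise to fail fast
        partial.append(event)       # partial results gathered so far
    return partial

def no_errors(event: Event):
    if event.kind == "error":
        raise EarlyTermination(f"agent emitted an error: {event.payload}")

# Usage: the run stops at the first error instead of waiting for completion.
events = [Event("message", "planning"), Event("error", "tool not found")]
try:
    run_agent(events, watchers=[no_errors])
except EarlyTermination as exc:
    print("failed fast:", exc)
```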
Track and aggregate state across multiple agent runs for comprehensive testing scenarios.
Topics: Multi-run tracking, state aggregation, cross-run analysis, data persistence
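A rough sketch of the pattern, assuming each run produces a result record; the `RunTracker` class and its field names are illustrative, not part of the framework.

```python
# Illustrative only: aggregate results from several runs into one summary.
from collections import defaultdict

class RunTracker:
    def __init__(self):
        self.runs = []                       # raw per-run records
        self.tool_counts = defaultdict(int)  # aggregated across runs

    def record(self, run_result: dict):
        self.runs.append(run_result)
        for tool in run_result.get("tools_used", []):
            self.tool_counts[tool] += 1

    def summary(self) -> dict:
        return {
            "total_runs": len(self.runs),
            "total_cost": sum(r.get("cost", 0.0) for r in self.runs),
            "tool_counts": dict(self.tool_counts),
        }

tracker = RunTracker()
tracker.record({"tools_used": ["bash", "read"], "cost": 0.02})
tracker.record({"tools_used": ["read"], "cost": 0.01})
print(tracker.summary())   # cross-run view for later assertions
```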
Use all available matchers for files, tools, quality checks, and cost constraints.
Topics: File matchers, tool matchers, quality matchers, cost matchers, matcher chaining
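To show what matcher chaining looks like in principle, here is a self-contained sketch; the matcher names (`created_file`, `used_tool`, `cost_below`) are hypothetical and the framework's real matchers may differ.

```python
# Sketch of chainable assertions over a run result; matcher names are illustrative.
class RunAssertions:
    def __init__(self, result: dict):
        self.result = result

    def created_file(self, path: str) -> "RunAssertions":
        assert path in self.result.get("files", []), f"missing file: {path}"
        return self                      # returning self enables chaining

    def used_tool(self, tool: str) -> "RunAssertions":
        assert tool in self.result.get("tools_used", []), f"tool not used: {tool}"
        return self

    def cost_below(self, limit: float) -> "RunAssertions":
        assert self.result.get("cost", 0.0) < limit, "cost limit exceeded"
        return self

result = {"files": ["report.md"], "tools_used": ["write"], "cost": 0.03}
RunAssertions(result).created_file("report.md").used_tool("write").cost_below(0.10)
```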
Generate Cartesian product tests to benchmark multiple models and configurations.
Topics: Test generation, model comparison, configuration matrices, performance analysis
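The core of matrix testing is a Cartesian product over configuration axes. A minimal sketch with `itertools.product`; the model names and the `run_case` helper are placeholders for whatever your harness provides.

```python
# Cartesian-product test matrix over models, temperatures, and prompts.
from itertools import product

models = ["model-small", "model-large"]
temperatures = [0.0, 0.7]
prompts = ["summarize", "classify"]

def run_case(model, temperature, prompt):
    # Placeholder: invoke the agent here and return a score/cost record.
    return {"model": model, "temperature": temperature, "prompt": prompt}

matrix = [run_case(m, t, p) for m, t, p in product(models, temperatures, prompts)]
print(len(matrix))   # 2 * 2 * 2 = 8 generated test cases
```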
Build production-ready agent workflows and pipelines.
Create multi-stage workflows that orchestrate complex agent interactions.
Topics: Stage definitions, cumulative context, cross-stage data sharing, pipeline composition
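As a sketch of cumulative context, each stage below receives everything earlier stages produced and adds its own output; the stage functions and context keys are illustrative.

```python
# Pipeline sketch: later stages see the merged output of earlier stages.
def research(context):
    return {"notes": "key findings"}

def draft(context):
    return {"draft": f"article based on {context['notes']}"}

def review(context):
    return {"approved": "article" in context["draft"]}

def run_pipeline(stages, initial=None):
    context = dict(initial or {})
    for stage in stages:
        context.update(stage(context))   # accumulate cross-stage data
    return context

final = run_pipeline([research, draft, review])
print(final["approved"])
```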
Implement retry logic and iterative workflows using until() helpers.
Topics: Retry strategies, convergence testing, iterative refinement, condition checking
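The framework's until() signature isn't reproduced here; the sketch below shows the general retry-until-condition pattern such a helper typically wraps.

```python
# Generic retry-until pattern; the real until() helper's signature may differ.
import time

def until(condition, attempt, max_attempts=5, delay=0.5):
    """Re-run `attempt` until `condition(result)` is true or attempts run out."""
    for _ in range(max_attempts):
        result = attempt()
        if condition(result):
            return result
        time.sleep(delay)                # back off before retrying
    raise RuntimeError(f"condition not met after {max_attempts} attempts")

counter = {"n": 0}
def flaky_attempt():
    counter["n"] += 1
    return {"ok": counter["n"] >= 3}     # succeeds on the third try

print(until(lambda r: r["ok"], flaky_attempt))
```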
Build resilient workflows with comprehensive error handling strategies.
Topics: Error recovery, graceful degradation, fallback strategies, error reporting
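A compact sketch of graceful degradation: try a primary strategy, fall back to a cheaper one, and report whichever errors occurred. The strategy functions here are illustrative.

```python
# Fallback sketch: degrade gracefully and keep an error report.
def run_with_fallback(primary, fallback):
    errors = []
    for name, strategy in (("primary", primary), ("fallback", fallback)):
        try:
            return {"result": strategy(), "via": name, "errors": errors}
        except Exception as exc:         # collect the error, then degrade
            errors.append(f"{name}: {exc}")
    return {"result": None, "via": None, "errors": errors}

def primary():
    raise TimeoutError("agent run timed out")

def fallback():
    return "cached summary"

print(run_with_fallback(primary, fallback))
```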
Evaluate and benchmark agent quality systematically.
Leverage LLM-based evaluation to assess agent output quality.
Topics: Judge configuration, rubric application, scoring systems, quality gates
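The shape of an LLM-judge check with a quality gate, as a sketch: `call_judge_model` is a placeholder for whatever client reaches your judge model, and the rubric text is only an example.

```python
# Judge sketch: score an answer against a rubric and gate on a threshold.
RUBRIC = """Score the answer from 1-5 for accuracy and clarity.
Respond with a single integer."""

def call_judge_model(prompt: str) -> str:
    return "4"          # placeholder response; replace with a real API call

def judge(answer: str, rubric: str = RUBRIC, threshold: int = 3) -> bool:
    prompt = f"{rubric}\n\nAnswer to evaluate:\n{answer}"
    score = int(call_judge_model(prompt).strip())
    return score >= threshold            # quality gate

assert judge("The capital of France is Paris.")
```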
Design effective rubrics for consistent and reliable evaluation.
Topics: Rubric structure, criterion design, scoring scales, best practices
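One way to keep criteria structured and weighted, purely as an illustration of rubric design; the `Criterion` type and the criteria themselves are examples.

```python
# Structured rubric sketch: named criteria, weights, and a fixed scale.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    description: str
    weight: float        # weights should sum to 1.0
    scale: tuple = (1, 5)

rubric = [
    Criterion("accuracy", "Facts are correct and verifiable.", 0.5),
    Criterion("clarity", "Answer is concise and well organized.", 0.3),
    Criterion("completeness", "All parts of the question are addressed.", 0.2),
]

def weighted_score(scores: dict) -> float:
    return round(sum(c.weight * scores[c.name] for c in rubric), 2)

print(weighted_score({"accuracy": 5, "clarity": 4, "completeness": 3}))  # 4.3
```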
Compare models, configurations, and prompts with systematic benchmarking.
Topics: Performance metrics, cost analysis, model comparison, regression detection
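A small sketch of aggregating benchmark records per model to compare score, cost, and latency; the record fields are assumptions about what your harness reports.

```python
# Benchmark aggregation sketch: group records by model and average the metrics.
from statistics import mean
from collections import defaultdict

records = [
    {"model": "model-small", "score": 3.8, "cost": 0.01, "latency_s": 2.1},
    {"model": "model-large", "score": 4.6, "cost": 0.08, "latency_s": 5.4},
    {"model": "model-small", "score": 4.0, "cost": 0.01, "latency_s": 1.9},
]

by_model = defaultdict(list)
for r in records:
    by_model[r["model"]].append(r)

for model, rs in by_model.items():
    print(model, {
        "avg_score": round(mean(r["score"] for r in rs), 2),
        "avg_cost": round(mean(r["cost"] for r in rs), 3),
        "avg_latency_s": round(mean(r["latency_s"] for r in rs), 1),
    })
```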
Master advanced features and optimization techniques.
Integrate Model Context Protocol servers for enhanced agent capabilities.
Topics: MCP configuration, server integration, tool availability, context management
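For orientation, this is the typical shape of an MCP stdio server entry (a command the client launches, plus arguments and environment). Exact keys, and how the configuration is passed to your agent harness, vary by client.

```python
# Typical MCP server entry, expressed as a Python dict; key names may vary.
mcp_config = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp/workspace"],
            "env": {},
        }
    }
}
# Once the server is registered, its tools become available to the agent
# alongside the built-in ones.
```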
Reduce costs while maintaining quality through strategic optimizations.
Topics: Token reduction, model selection, prompt optimization, caching strategies
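As one example of a caching strategy, a sketch of response caching keyed by a prompt hash, so identical calls are only paid for once; `call_model` is a placeholder for your client.

```python
# Prompt-hash response cache sketch: repeated identical calls cost nothing extra.
import hashlib

_cache: dict = {}

def call_model(prompt: str) -> str:
    return f"response to: {prompt}"      # placeholder for a real API call

def cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:                # only pay for the first identical call
        _cache[key] = call_model(prompt)
    return _cache[key]

cached_call("Summarize the release notes.")
cached_call("Summarize the release notes.")   # served from cache, no new cost
```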
Manage artifact storage and implement cleanup policies.
Topics: Retention policies, storage management, cleanup strategies, disk usage
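A sketch of an age-based retention policy over a local artifacts directory; the directory layout and cutoff are assumptions, not the framework's actual storage scheme.

```python
# Age-based cleanup sketch: remove artifact files older than the cutoff.
import time
from pathlib import Path

def cleanup(artifact_dir: str, max_age_days: int = 14) -> int:
    cutoff = time.time() - max_age_days * 86_400
    removed = 0
    for path in Path(artifact_dir).glob("**/*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()                # delete artifacts older than the cutoff
            removed += 1
    return removed

# cleanup("./artifacts", max_age_days=7)
```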
Use text, images, and files in your agent prompts effectively.
Topics: Image prompts, file attachments, mixed content, format handling
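A sketch of a mixed text-and-image prompt using base64-encoded image content blocks; the field names follow a widely used convention but may differ from your SDK's exact schema.

```python
# Multi-modal prompt sketch: text plus a base64-encoded image block.
import base64
from pathlib import Path

def image_block(path: str) -> dict:
    data = base64.standard_b64encode(Path(path).read_bytes()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": data}}

prompt_content = [
    {"type": "text", "text": "Describe the chart and extract the key numbers."},
    image_block("chart.png"),   # assumes chart.png exists in the working directory
]
```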