Learn how to evaluate agent quality, benchmark performance, and compare models using vibe-check’s evaluation features.
Using Judges
Leverage LLM-based evaluation to assess agent output quality with rubrics.
Writing Rubrics
Design effective rubrics for consistent and reliable quality assessment.
Benchmarking
Compare models and configurations with systematic performance analysis.
Systematic agent evaluation requires structured assessment criteria and reproducible benchmarking. These guides cover:
Using Judges
Leverage LLM-based evaluation to assess agent output quality automatically, scoring results against a rubric.
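As a rough illustration of the pattern this guide covers, the sketch below runs an agent and asks an LLM judge to check its output against a small rubric inside an ordinary test. The `vibe-check` import, the `runAgent()` helper, and the exact `toPassRubric` signature are assumptions made for illustration; see the Using Judges guide for the real API.

```ts
// Illustrative only: the 'vibe-check' import, runAgent(), and the toPassRubric
// matcher signature are assumptions; consult the Using Judges guide for the real API.
import { it, expect } from 'vitest';
import { runAgent } from 'vibe-check';

it('produces a faithful, well-structured summary', async () => {
  // Run the agent under test on a fixed prompt.
  const result = await runAgent({ prompt: 'Summarize the attached incident report.' });

  // An LLM judge scores the output against the rubric; the matcher fails the
  // test when a criterion is not met.
  await expect(result.output).toPassRubric({
    name: 'summary-quality',
    criteria: [
      { name: 'faithfulness', description: 'Makes no claims beyond the source material.' },
      { name: 'structure', description: 'Uses a short headline followed by bullet points.' },
    ],
  });
});
```

Keeping judge checks inside test assertions means they run in the same suite, and report failures the same way, as ordinary unit tests.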
Writing Rubrics
Design effective rubrics for consistent and reliable agent evaluation.
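To make the rubric idea concrete, here is one possible shape for a rubric as plain data. The field names below are illustrative assumptions rather than the library's schema; the Writing Rubrics guide covers the actual format and how to phrase criteria.

```ts
// Illustrative rubric shape as plain data; these field names are assumptions,
// not the library's schema.
interface RubricCriterion {
  name: string;        // short identifier, e.g. 'faithfulness'
  description: string; // what the judge should look for, phrased concretely
  weight?: number;     // optional relative importance when aggregating scores
}

interface Rubric {
  name: string;
  criteria: RubricCriterion[];
  passThreshold?: number; // e.g. 0.75 of the weighted score
}

// Concrete, observable criteria tend to give more consistent judge verdicts
// than vague ones like "is the answer good?".
export const codeReviewRubric: Rubric = {
  name: 'code-review-quality',
  criteria: [
    { name: 'identifies-bug', description: 'Points out the seeded off-by-one bug.', weight: 2 },
    { name: 'actionable', description: 'Every comment suggests a concrete fix.', weight: 1 },
    { name: 'tone', description: 'No dismissive or sarcastic language.', weight: 1 },
  ],
  passThreshold: 0.75,
};
```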
Benchmarking
Compare models, configurations, and prompts systematically.
Key concepts: toPassRubric, matcher.
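A benchmarking run typically executes the same task across several configurations and compares how often each one passes the rubric. The sketch below uses Vitest's `describe.each` for that loop; the `runAgent()` helper, its `model` option, and the result shape are assumptions for illustration, not the library's confirmed API.

```ts
// Illustrative only: runAgent(), its model option, and the result shape are
// assumptions used to show the comparison pattern, not the library's API.
import { describe, it, expect } from 'vitest';
import { runAgent } from 'vibe-check';

const models = ['model-a', 'model-b']; // placeholder model identifiers

const rubric = {
  name: 'refactor-quality',
  criteria: [
    { name: 'behavior-preserved', description: 'Existing tests still pass after the change.' },
    { name: 'no-dead-code', description: 'No unused code is left behind.' },
  ],
};

// Run the same task once per model so pass rates are directly comparable.
describe.each(models)('refactor task with %s', (model) => {
  it('meets the quality rubric', async () => {
    const result = await runAgent({
      model,
      prompt: 'Refactor the report module to remove duplicated date-formatting logic.',
    });
    await expect(result.output).toPassRubric(rubric);
  });
});
```

Comparing per-configuration pass rates across runs like this is the core of the benchmarking workflow described in the guide.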