Eval
Evaluation & Validation
Validates outputs against acceptance criteria, runs benchmark tests, scores quality, and provides data-driven feedback on agent performance.
Model: opus
Runtime: claude-code
MCP Toolkit
- acceptance-testing
- benchmarking
- quality-scoring
- performance-metrics
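
As a rough illustration of how acceptance testing and quality scoring can fit together, here is a minimal Python sketch. It does not reflect the actual interface of the tools listed above: `Criterion`, `evaluate`, and the three sample checks are hypothetical names chosen for this example, and the score is assumed to be simply the fraction of criteria that pass.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical type for illustration; not part of any real toolkit API.
@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]  # returns True when the output passes


def evaluate(output: str, criteria: list[Criterion]) -> dict:
    """Run every criterion against one output and summarize the result."""
    results = {c.name: c.check(output) for c in criteria}
    passed = sum(results.values())
    # Assumed scoring rule: fraction of criteria passed (0.0 if none given).
    score = passed / len(criteria) if criteria else 0.0
    return {
        "results": results,
        "score": score,
        "accepted": bool(criteria) and passed == len(criteria),
    }


# Sample acceptance criteria, invented for this sketch.
criteria = [
    Criterion("non_empty", lambda out: bool(out.strip())),
    Criterion("under_200_chars", lambda out: len(out) < 200),
    Criterion("mentions_status", lambda out: "status" in out.lower()),
]

report = evaluate("Status: all checks completed.", criteria)
print(report["score"], report["accepted"])
```

Keeping each criterion as a named predicate means the per-criterion results can be reported back alongside the aggregate score, so feedback points at the specific check that failed rather than only at a number.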