- Deterministic Metrics: When you have custom, rule-based logic to score outputs (e.g., checking for specific keywords, validating JSON structure).
- Integrating External Models: When you use your own models for evaluation and want to store their scores in Galtea.
CustomScoreEvaluationMetric.
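As a rough illustration of the rule-based checks listed above, the following sketch shows two deterministic scoring functions in plain Python: one for keyword coverage and one for JSON structure. The function names, parameters, and the 0.0 to 1.0 score range are assumptions made for this example and are not part of the Galtea SDK. How such logic is wired into a CustomScoreEvaluationMetric is not shown here, since that interface is not described in this section.

```python
import json


def keyword_score(actual_output: str, required_keywords: list[str]) -> float:
    """Return the fraction of required keywords found in the output.

    Hypothetical helper: matching is case-insensitive substring search.
    """
    if not required_keywords:
        # Nothing to check, so treat the output as fully compliant.
        return 1.0
    text = actual_output.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)


def json_structure_score(actual_output: str, required_fields: list[str]) -> float:
    """Return 1.0 if the output is a JSON object containing every required field.

    Hypothetical helper: any parse failure or non-object payload scores 0.0.
    """
    try:
        parsed = json.loads(actual_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if all(field in parsed for field in required_fields) else 0.0
```

Because both functions are pure and depend only on their inputs, the same output always receives the same score, which is the property that makes a metric deterministic.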
Single-Turn Evaluation with Custom Metrics
For individual test cases or production logs, you can define your metric and pass an instance of it directly to the create_single_turn method.