- An entire conversation stored in a Session by creating evaluations for each of its Inference Results
- A single Inference Result by providing its ID
Returns
Returns a list of Evaluation objects, one for each metric provided.
Usage
This method evaluates inference results using the specified metrics. It supports both Galtea-hosted metrics and self-hosted custom metrics.
You must provide either session_id or inference_result_id, but not both. For single-turn evaluations, you can also use galtea.inference_results.create_and_evaluate(), which combines creating an inference result and evaluating it in one call.
Development Testing
For non-self-hosted metrics
Evaluating a Single Inference Result
You can evaluate a specific inference result by providing its ID instead of a session ID.
Production Monitoring
To monitor your product in a production environment, you can create evaluations that are not linked to a specific test case. To do so, set the is_production flag of the Session to True.
Advanced Usage
You can also create evaluations using self-hosted metrics with dynamically calculated scores by using the CustomScoreEvaluationMetric class, which allows for more complex evaluation scenarios.
Both options are equally valid for self-hosted metrics. Choose based on your preference: pre-compute for simplicity, or use CustomScoreEvaluationMetric for encapsulation and reusability.
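The difference between the two options can be sketched without the SDK. The example below is only an illustration: KeywordCoverageMetric is a hypothetical stand-in for a CustomScoreEvaluationMetric subclass, and its measure method, the keyword_coverage helper, and the metric name "keyword-coverage" are all assumptions made for this sketch. Check the actual base class for the interface it requires.

```python
from typing import List


def keyword_coverage(output: str, keywords: List[str]) -> float:
    """Fraction of keywords found in the output (case-insensitive), from 0.0 to 1.0."""
    if not keywords:
        return 0.0
    text = output.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


class KeywordCoverageMetric:
    """Hypothetical stand-in for a CustomScoreEvaluationMetric subclass.

    The real base class and its required method names are not shown in this
    section, so this class only mimics the idea: the metric carries its own
    name and knows how to compute its own score.
    """

    def __init__(self, name: str, keywords: List[str]) -> None:
        self.name = name
        self.keywords = keywords

    def measure(self, actual_output: str) -> float:
        # Assumed method name; delegates to the same scoring function.
        return keyword_coverage(actual_output, self.keywords)


output = "Your refund will arrive within five business days."
keywords = ["refund", "business days", "tracking"]

# Option 1: pre-compute the score and pass it as a float.
# The dict needs id or name so the score can be matched to a metric.
precomputed_metric = {
    "name": "keyword-coverage",
    "score": keyword_coverage(output, keywords),
}

# Option 2: pass an object that computes the score itself.
# The name lives on the object, so the dict carries no id or name.
dynamic_metric = {
    "score": KeywordCoverageMetric(name="keyword-coverage", keywords=keywords),
}
```

Both dictionaries describe the same metric. The first keeps the scoring logic in the calling code, while the second bundles the logic with the metric so it can be reused across evaluations.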
Parameters
The ID of the session containing the inference results to be evaluated.
Either session_id or inference_result_id must be provided, but not both.
The ID of a specific inference result to evaluate.
Either session_id or inference_result_id must be provided, but not both.
A list of metrics to use for the evaluation.
The MetricInput dictionary supports the following keys:
- id (string, optional): The ID of an existing metric
- name (string, optional): The name of the metric
- score (float | CustomScoreEvaluationMetric, optional): For self-hosted metrics only
  - If float: Pre-computed score (0.0 to 1.0). Requires id or name in the dict.
  - If CustomScoreEvaluationMetric: Score will be calculated dynamically. The CustomScoreEvaluationMetric instance must be initialized with name or id. Do NOT provide id or name in the dict when using this option.
For self-hosted metrics, both score options are equally valid: pre-compute as a float, or use CustomScoreEvaluationMetric for dynamic calculation. Galtea-hosted metrics automatically compute scores and should not include a score field.
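The parameter rules above can be checked locally before calling the SDK. The following is a minimal sketch, not part of the Galtea SDK: check_target and check_metric_input are hypothetical helpers, the returned labels are invented for illustration, and any non-numeric score is assumed to be a CustomScoreEvaluationMetric instance.

```python
from typing import Any, Dict, Optional


def check_target(session_id: Optional[str] = None,
                 inference_result_id: Optional[str] = None) -> str:
    """Return which target was given; exactly one of the two IDs is allowed."""
    if (session_id is None) == (inference_result_id is None):
        raise ValueError(
            "Provide either session_id or inference_result_id, but not both."
        )
    return "session" if session_id is not None else "inference_result"


def check_metric_input(metric: Dict[str, Any]) -> str:
    """Classify a MetricInput-style dict, raising ValueError if a rule is broken."""
    has_reference = "id" in metric or "name" in metric

    if "score" not in metric:
        # Assumption: without a score, the metric can only be found by id or name.
        if not has_reference:
            raise ValueError("A metric without a score needs an id or a name.")
        return "no_score"

    score = metric["score"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, (int, float)) and not isinstance(score, bool):
        if not has_reference:
            raise ValueError("A pre-computed score requires id or name in the dict.")
        if not 0.0 <= score <= 1.0:
            raise ValueError("A pre-computed score must be between 0.0 and 1.0.")
        return "precomputed"

    # Assumption: any other score value is a custom metric object that
    # already carries its own name or id.
    if has_reference:
        raise ValueError(
            "Do not provide id or name in the dict alongside a custom metric object."
        )
    return "dynamic"
```

A dict such as {"name": "accuracy"} passes as "no_score", {"name": "my-metric", "score": 0.8} passes as "precomputed", and a dict that mixes a custom metric object with a name is rejected.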