Create evaluation tasks for all inference results within a session using specified metrics.
This method evaluates an entire conversation stored in a Session by creating evaluation tasks for each of its Inference Results.
Returns a list of EvaluationTask objects, one for each metric and each inference result in the session.
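As a rough illustration of the "one task per metric and per inference result" rule, the sketch below enumerates the pairs that would each get a task. It does not call the Galtea SDK. The identifiers and metric names are hypothetical placeholders, and the order in which the real method returns tasks is not stated here, so the pairing order shown is only an assumption.

```python
from itertools import product

# Hypothetical identifiers, for illustration only.
inference_result_ids = ["ir_1", "ir_2", "ir_3"]
metrics = ["metric_a", "metric_b"]

# One (inference result, metric) pair per evaluation task.
expected_tasks = list(product(inference_result_ids, metrics))

# The number of tasks is the product of the two list lengths.
print(len(expected_tasks))
```

With three inference results and two metrics, six pairs are produced, so a session of that size would yield six evaluation tasks.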
The ID of the session containing the inference results to be evaluated.
A list of metric type names to use for the evaluation. Tasks will be created for each metric against each inference result.
A list of pre-computed scores corresponding to the metrics list. Use None for metrics that Galtea should evaluate. This is useful for providing scores from custom, deterministic metrics.
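A minimal sketch of how such a scores list might be assembled, assuming scores are matched to metrics by position. The helper `align_scores` and the metric names are hypothetical and not part of the Galtea SDK. The sketch only builds the list and does not call the method itself.

```python
from typing import Dict, List, Optional


def align_scores(
    metrics: List[str], precomputed: Dict[str, float]
) -> List[Optional[float]]:
    """Return one entry per metric: its pre-computed score, or None if absent."""
    return [precomputed.get(name) for name in metrics]


# Hypothetical metric names, for illustration only.
metrics = ["exact_match", "factual_accuracy"]

# Only the custom, deterministic metric has a score computed locally.
precomputed = {"exact_match": 1.0}

scores = align_scores(metrics, precomputed)
print(scores)
```

Building the list from the metrics list keeps both the same length, with None marking each metric left for Galtea to evaluate.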