Create evaluation tasks for all inference results within a session using specified metrics.
This method does not support CustomScoreEvaluationMetric objects. Attempting to use a custom metric will result in a ValueError. For custom scoring, use the create_single_turn method for each turn of the conversation.
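
The fallback described above might be sketched as follows. This is a minimal illustration, not the library's actual API: StubClient, BuiltInMetric, create_for_session, create_tasks, and every method signature here are hypothetical stand-ins, and the CustomScoreEvaluationMetric class is redefined locally so the example runs on its own. Only the names CustomScoreEvaluationMetric, create_single_turn, and the ValueError behavior come from the text.

```python
from dataclasses import dataclass


@dataclass
class BuiltInMetric:
    """Hypothetical stand-in for a built-in (non-custom) metric."""
    name: str


@dataclass
class CustomScoreEvaluationMetric:
    """Local stand-in; only the class name is taken from the documentation."""
    name: str


class StubClient:
    """Hypothetical client that mimics the documented behavior."""

    def create_for_session(self, turns, metrics):
        # Assumed name for the session-level method documented above.
        # It rejects custom metrics with a ValueError.
        if any(isinstance(m, CustomScoreEvaluationMetric) for m in metrics):
            raise ValueError("custom metrics are not supported for sessions")
        return [("session", turn, m.name) for turn in turns for m in metrics]

    def create_single_turn(self, turn, metrics):
        # Assumed signature: one call per turn, custom metrics allowed.
        return [("single_turn", turn, m.name) for m in metrics]


def create_tasks(client, turns, metrics):
    """Use the session-level call when possible, else go turn by turn."""
    has_custom = any(isinstance(m, CustomScoreEvaluationMetric) for m in metrics)
    if not has_custom:
        return client.create_for_session(turns, metrics)

    # Custom scoring: create tasks for each turn of the conversation.
    tasks = []
    for turn in turns:
        tasks.extend(client.create_single_turn(turn, metrics))
    return tasks
```

Checking the metric types up front avoids relying on the ValueError for control flow; catching the exception and retrying per turn would be an equally reasonable design.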