This method evaluates an entire conversation stored in a Session by creating evaluation tasks for each of its Inference Results.

Returns

Returns a list of EvaluationTask objects, one for each combination of metric and inference result in the session. For example, evaluating a session containing 2 inference results against 2 metrics produces 4 evaluation tasks.

Example

# First, create a session and log inference results
session = galtea.sessions.create(version_id="YOUR_VERSION_ID")
galtea.inference_results.create_batch(
    session_id=session.id,
    conversation_turns=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "How are you?"},
        {"role": "assistant", "content": "I am fine, thank you."}
    ]
)

# Now, evaluate the entire session
evaluation_tasks = galtea.evaluation_tasks.create(
    session_id=session.id,
    metrics=["Role Adherence", "Conversation Relevancy"]
)
This method does not support CustomScoreEvaluationMetric objects; attempting to use one raises a ValueError. For custom scoring, use the create_single_turn method on each turn of the conversation instead.
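When falling back to per-turn evaluation, each user message must be matched with the assistant reply that follows it. A minimal sketch of that pairing step (the pair_turns helper and the loop below are illustrative, not part of the SDK; exact create_single_turn parameter names depend on your SDK version):

```python
def pair_turns(conversation_turns):
    """Yield (user_input, assistant_output) pairs from an alternating turn list.

    Expects the same structure as the conversation_turns argument used in
    the create_batch example above.
    """
    pairs = []
    pending_user = None
    for turn in conversation_turns:
        if turn["role"] == "user":
            pending_user = turn["content"]
        elif turn["role"] == "assistant" and pending_user is not None:
            pairs.append((pending_user, turn["content"]))
            pending_user = None
    return pairs

turns = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I am fine, thank you."},
]

# Each (input, output) pair could then be passed to a single-turn
# evaluation call such as galtea.evaluation_tasks.create_single_turn.
print(pair_turns(turns))
# → [('Hi', 'Hello!'), ('How are you?', 'I am fine, thank you.')]
```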

Parameters

session_id
string
required
The ID of the session containing the inference results to be evaluated.
metrics
List[string]
required
A list of standard metric names to evaluate against. Metrics with custom scores (CustomScoreEvaluationMetric) are not supported by this method.