To accurately evaluate interactions within a dialogue, you can use Galtea’s session-based workflow. This approach allows you to log an entire conversation and then run evaluations on all of its turns at once.

Certain metrics are specifically designed for conversational analysis and require the full context:

  • Role Adherence: Measures how well the AI stays within its defined role.
  • Knowledge Retention: Assesses the model’s ability to remember and use information from previous turns.
  • Conversation Completeness: Evaluates whether the conversation has reached a natural and informative conclusion.
  • Conversation Relevancy: Assesses whether each turn in the conversation is relevant to the ongoing topic.

The Session-Based Workflow

1. Create a Session

A Session acts as a container for all the turns in a single conversation. You create one at the beginning of an interaction.

2. Log Inference Results

Each user input and model output pair is an Inference Result. You can log these turns individually or in a single batch call after the conversation ends. Using a batch call is more efficient.

3. Evaluate the Session

Once the session is logged, you can create evaluation tasks for the entire conversation using the evaluation_tasks.create() method. This generates tasks for each turn against the specified metrics.

Example

This example demonstrates logging and evaluating a multi-turn conversation from a test case.

from galtea import Galtea
import os

galtea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

YOUR_VERSION_ID = "your_version_id"
YOUR_TEST_CASE_ID = "your_test_case_id"
CONVERSATIONAL_METRICS = ["role-adherence", "knowledge-retention"]

# 1. Create a Session linked to a test case
session = galtea.sessions.create(
    version_id=YOUR_VERSION_ID,
    test_case_id=YOUR_TEST_CASE_ID,
)
print(f"Created Session: {session.id}")

# 2. Log the conversation turns.
# In a real scenario, you would collect these dynamically from your product's interactions.
conversation_turns = [
    {"input": "What's your return policy?", "output": "Our return policy allows returns within 30 days."},
    {"input": "What if I lost the receipt?", "output": "A proof of purchase is required for all returns."},
]

# Use create_batch for efficiency
galtea.inference_results.create_batch(
    session_id=session.id,
    conversation_turns=conversation_turns
)

# 3. Evaluate the entire session at once
evaluation_tasks = galtea.evaluation_tasks.create(
    session_id=session.id,
    metrics=CONVERSATIONAL_METRICS
)
print(f"Submitted {len(evaluation_tasks)} evaluation tasks for session {session.id}")
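The example above hard-codes conversation_turns; in a live product you would build the list as the chat progresses and log it once at the end. A minimal sketch of that collection pattern (generate_reply is a hypothetical stand-in for your model call, not part of the Galtea SDK):

```python
def collect_turns(user_messages, generate_reply):
    """Pair each user input with the model's output, in the
    {"input": ..., "output": ...} shape expected by create_batch."""
    turns = []
    for message in user_messages:
        turns.append({"input": message, "output": generate_reply(message)})
    return turns

# After the conversation ends, log everything in one batch call:
# turns = collect_turns(user_messages, my_model)
# galtea.inference_results.create_batch(session_id=session.id, conversation_turns=turns)
```

Deferring the single create_batch call until the conversation ends keeps your chat loop free of per-turn network overhead.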

This workflow can also be used for production monitoring by creating a session with is_production=True and omitting the test_case_id. See the Monitor Production Responses guide for an example.
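The production variant differs only in how the session is created. A sketch, assuming the same client setup as the example above:

```python
import os
from galtea import Galtea

galtea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

# Production sessions set is_production=True and omit test_case_id:
session = galtea.sessions.create(
    version_id="your_version_id",
    is_production=True,
)
# Turns are then logged and evaluated exactly as in the example above.
```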

Learn More