To accurately evaluate interactions within a dialogue, use Galtea’s session-based workflow: log an entire conversation, then run evaluations on all of its turns at once. Certain metrics are designed specifically for conversational analysis and require the full conversation as context:
  • Role Adherence: Measures how well the AI stays within its defined role.
  • Knowledge Retention: Assesses the model’s ability to remember and use information from previous turns.
  • Conversation Completeness: Evaluates whether the conversation has reached a natural and informative conclusion.
  • Conversation Relevancy: Assesses whether each turn in the conversation is relevant to the ongoing topic.
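
For instance, once a session has been logged, all four of these metrics can be requested by name in a single call. The sketch below assumes a session already exists (the session ID is a placeholder) and mirrors the evaluation_tasks.create() call shown later in this guide:

import os
from galtea import Galtea

galtea_client = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

# Minimal sketch: request the conversational metrics for an already-logged session.
# "YOUR_SESSION_ID" is a placeholder; see the workflow below for how the session
# is created and populated with turns.
galtea_client.evaluation_tasks.create(
    session_id="YOUR_SESSION_ID",
    metrics=[
        "Role Adherence",
        "Knowledge Retention",
        "Conversation Completeness",
        "Conversation Relevancy",
    ],
)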

The Session-Based Workflow

1. Create a Session

A Session acts as a container for all the turns in a single conversation. You create one at the beginning of an interaction.
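
A minimal sketch of this step, using the same sessions.create() call as the full example below. The version and test case IDs are placeholders, and whether you link a test case depends on your scenario:

import os
from galtea import Galtea

galtea_client = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

# Create a session to hold every turn of one conversation.
# Both IDs below are placeholders; the test_case_id links the session
# to a predefined test case (see the scenario further down).
session = galtea_client.sessions.create(
    version_id="YOUR_VERSION_ID",
    test_case_id="YOUR_TEST_CASE_ID",
)
print(session.id)  # use this ID when logging turns and creating evaluation tasks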
2. Log Inference Results

Each user input and model output pair is an Inference Result. You can log these turns individually or in a single batch call after the conversation ends. Using a batch call is more efficient.
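
As a sketch of this step: the snippet below logs a short conversation in one batch. The method name and parameters shown (inference_results.create_batch with a list of user/assistant turns) are assumptions based on this description, not a confirmed signature, so check the SDK reference for the exact call:

import os
from galtea import Galtea

galtea_client = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

# Sketch only: turns from a finished conversation, logged in a single batch.
conversation_turns = [
    {"role": "user", "content": "What is your refund policy?"},
    {"role": "assistant", "content": "You can request a refund within 30 days."},
    {"role": "user", "content": "Does that apply to digital products too?"},
    {"role": "assistant", "content": "Yes, as long as they have not been downloaded."},
]

galtea_client.inference_results.create_batch(  # assumed method name
    session_id="YOUR_SESSION_ID",              # placeholder session ID
    conversation_turns=conversation_turns,     # assumed parameter name
)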
3. Evaluate the Session

Once the session is logged, you can create evaluation tasks for the entire conversation using the evaluation_tasks.create() method.

Choose your scenario

Use this scenario when you have test cases. It requires a test_case_id and is often combined with the Conversation Simulator to generate the conversation turns:
import os
import galtea
from galtea import Galtea

galtea_client = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

# 1) Fetch your test cases (created from a CSV of scenarios)
test_cases = galtea_client.test_cases.list(test_id="YOUR_TEST_ID")

# 2) Implement your Agent (connect your product/model)
class MyAgent(galtea.Agent):
    def call(self, input_data: galtea.AgentInput) -> galtea.AgentResponse:
        return galtea.AgentResponse(content="...")

for test_case in test_cases:
    # 3) Create a session linked to the test case
    session = galtea_client.sessions.create(
        version_id="YOUR_VERSION_ID",
        test_case_id=test_case.id
    )

    # 4) Run the simulator (synthetic user) with your Agent
    galtea_client.simulator.simulate(
        session_id=session.id,
        agent=MyAgent(),
        max_turns=test_case.max_iterations or 20,
    )

    # 5) Evaluate the full conversation
    galtea_client.evaluation_tasks.create(
        session_id=session.id,
        metrics=["Conversation Relevancy", "Role Adherence", "Knowledge Retention"],
    )
See the full workflow in Simulating Conversations.
