Migrating to Galtea SDK v2.0

Welcome to Galtea SDK v2.0! This version introduces major improvements, including a new session-based evaluation workflow and simplifications to the existing test-based workflow.

This guide will walk you through the necessary changes to update your SDK v1.x integrations and introduce you to the new features available in v2.0.

Part 1: Migrating Your Existing Workflow (Test-Case-Based Evaluation)

The core workflow for running predefined tests against a version has been streamlined.

Key Changes for Test-Based Evaluations

  1. Implicit Evaluation Creation: You no longer need to explicitly call galtea.evaluations.create() before creating evaluation tasks.

  2. evaluation_tasks.create() Repurposed: The old evaluation_tasks.create() method has been replaced by the new create_single_turn() method for test-based evaluations. The name create() is now used exclusively for the new session-based workflow to evaluate all turns in a conversation.

  3. New Method create_single_turn(): Use this method for single-turn, test-case-based evaluations. It now requires version_id instead of evaluation_id.

  4. Simplified Version Creation: The galtea.versions.create() method now accepts all properties as direct keyword arguments, removing the need for the optional_props dictionary.

  5. Added Sessions: The new galtea.sessions.create() method allows you to create sessions to group multiple inference results (conversation turns) under a single session, making it easier to track multi-turn interactions.

Migration Diff: Test-Case Workflow

Here’s a side-by-side comparison of a typical v1 script and its direct v2 equivalent.

❌ SDK v1.x (Old Way)

from galtea import Galtea
galtea = Galtea(api_key="...")

# 1. Create version with optional_props dictionary
version = galtea.versions.create(
    name="v1.0-summarizer",
    product_id="YOUR_PRODUCT_ID",
    optional_props={
      "description": "Old version using gpt-3.5",
      "model_id": "gpt-3.5-turbo"
    }
)

# 2. Explicitly create an Evaluation
evaluation = galtea.evaluations.create(
    test_id="YOUR_TEST_ID",
    version_id=version.id
)

# 3. Create tasks using evaluation_id
galtea.evaluation_tasks.create(
    evaluation_id=evaluation.id,
    test_case_id="YOUR_TEST_CASE_ID",
    metrics=["your_metric"],
    actual_output="This is the model's output."
)

✅ SDK v2.0 (New Way)

from galtea import Galtea
galtea = Galtea(api_key="...")

# 1. Create version with direct keyword arguments
version = galtea.versions.create(
    name="v2.0-summarizer",
    product_id="YOUR_PRODUCT_ID",
    description="New version using gpt-4",
    model_id="gpt-4"
)

# 2. Skip explicit evaluation creation (it's automatic!)

# 3. Create tasks using version_id
galtea.evaluation_tasks.create_single_turn(
    version_id=version.id,
    test_case_id="YOUR_TEST_CASE_ID",
    metrics=["your_metric"],
    actual_output="This is the model's output."
)

This example demonstrates how to create single-turn evaluation tasks for test cases. For production data logging, include the input parameter and set is_production=True. If you want to evaluate multi-turn conversations, refer to the new session-based workflow in Part 2.

Production Data Logging with Single-Turn Tasks

For production monitoring, you can also use create_single_turn() without a test case:

# Production data logging (no test case required)
galtea.evaluation_tasks.create_single_turn(
    version_id=version.id,
    input="User's actual input",
    actual_output="Model's actual output",
    metrics=["your_metric"],
    is_production=True
)

Summary of Actions for Migration

  1. In galtea.versions.create() calls, remove the optional_props dictionary and pass its contents as direct keyword arguments.
  2. Remove all calls to galtea.evaluations.create().
  3. Rename galtea.evaluation_tasks.create() calls to galtea.evaluation_tasks.create_single_turn(), remove the evaluation_id parameter, and add the version_id parameter.

Ensure all your code, including any examples you may be referencing from the SDK repository, is updated. The optional_props dictionary is fully deprecated in v2.0, and some examples in the codebase may not yet reflect the v2.0 changes.


Part 2: What’s New in v2.0 - The Session-Based Workflow

SDK v2.0 introduces a powerful new way to log and evaluate multi-turn conversations through Sessions and Inference Results. This approach is ideal for monitoring production traffic or evaluating complex, interactive scenarios.

New Concepts

  • Session: A container for a sequence of interactions (a conversation) between a user and your AI product. You can create a session using galtea.sessions.create().
  • Inference Result: A single turn within a session, containing the input and output. You log these using galtea.inference_results.create().
  • evaluation_tasks.create(): A new way to run evaluations on all inference results within a given session. This allows for easy batch evaluation of entire conversations.

Example of the New Session-Based Workflow

This workflow is entirely new and does not directly replace the test-based workflow.

# SDK v2.0 New Workflow Example
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

YOUR_PRODUCT_ID = "your_product_id"
YOUR_VERSION_ID = "your_version_id"
YOUR_TEST_CASE_ID = "your_test_case_id"
YOUR_METRIC_NAME = "your_metric_name"

# 1. Create a Session to group a conversation
session = galtea.sessions.create(
    version_id=YOUR_VERSION_ID,
    test_case_id=YOUR_TEST_CASE_ID,
)
print(f"Created Session: {session.id}")

# 2. Log all inference results in a single efficient batch call
conversation_turns = [
    {
        "input": "What's your return policy?",
        "output": "Our return policy allows returns within 30 days of purchase."
    },
    {
        "input": "What if I lost the receipt?",
        "output": "A proof of purchase is required for all returns."
    },
    {
        "input": "Can I return items bought online to a physical store?",
        "output": "Yes, you can return online purchases to any of our physical store locations."
    },
    {
        "input": "What about exchanges?",
        "output": "Exchanges follow the same 30-day policy and can be done in-store or online."
    }
]

# Use create_batch for efficient bulk processing
galtea.inference_results.create_batch(
    session_id=session.id,
    conversation_turns=conversation_turns
)

print(f"Stored {len(conversation_turns)} inference results in session {session.id} with a single call")

# Or loop through conversation turns
# for turn in conversation_turns:
#     galtea.inference_results.create(
#         session_id=session.id,
#         input=turn["input"],
#         output=turn["output"]
#     )

# 3. When ready, create evaluation tasks for the entire session
evaluation_tasks = galtea.evaluation_tasks.create(
    session_id=session.id,
    metrics=[YOUR_METRIC_NAME]
)
print(f"Submitted {len(evaluation_tasks)} evaluation tasks for session {session.id}")

For better performance with multiple conversation turns, always use create_batch() instead of calling create() in a loop. This reduces network overhead and improves response times.

Benefits of the New Workflow

  • Track Full Conversations: Accurately log and analyze multi-turn user interactions.
  • Production Monitoring: Easily send production data to Galtea for continuous evaluation by removing the test_case_id parameter when creating sessions and adding is_production=True.
  • Batch Evaluation: Evaluate an entire conversation with a single command.

If you have any questions or encounter any issues during migration, please don’t hesitate to reach out to our support team at support@galtea.ai