Migrating to Galtea SDK v3.0
Welcome to Galtea SDK v3.0! This version introduces major improvements, including a new session-based evaluation workflow and simplifications to the existing test-based workflow. This guide will walk you through the necessary changes to update your SDK v2.x integrations and introduce you to the new features available in v3.0.Part 1: Migrating Your Existing Workflow (Test-Case-Based Evaluation)
The core workflow for running predefined tests against a version has been streamlined.Key Changes for Test-Based Evaluations
-
New Terminology: The old
EvaluationTask
entity is now simply calledEvaluation
. The concept of a parentEvaluation
container has been removed to simplify the workflow. -
Simplified Workflow: The previous two-step process of creating an
Evaluation
container and then addingEvaluationTask
s to it is now a single method call. -
New Method
create_single_turn()
: The old methods for creating tasks (e.g.,galtea.evaluation_tasks.create()
) are replaced bygaltea.evaluations.create_single_turn()
. This new method directly creates evaluations without needing a pre-existing container. -
Updated Parameters: The
create_single_turn()
method now requires aversion_id
instead of anevaluation_id
, as the direct link is now to the version being tested. -
Simplified Version Creation: The
galtea.versions.create()
method now accepts all properties as direct keyword arguments, removing the need for theoptional_props
dictionary. -
Added Sessions: The new
galtea.sessions.create()
method allows you to create sessions to group multiple inference results (conversation turns) under a single session, making it easier to track multi-turn interactions.
Migration Diff: Test-Case Workflow
Here’s a side-by-side comparison of a typical v2 script and its direct v3 equivalent.❌ SDK v2.x (Old Way)
✅ SDK v3.0 (New Way)
This example demonstrates how to create single-turn evaluations for test cases. For production data logging, include the
input
parameter and set is_production=True
. If you want to evaluate multi-turn conversations, refer to the new session-based workflow in Part 2.Production Data Logging with Single-Turn Evaluations
For production monitoring, you can also usecreate_single_turn()
without a test case:
Summary of Actions for Migration
- In
galtea.versions.create()
calls, remove theoptional_props
dictionary and pass its contents as direct keyword arguments. - Remove all calls to
galtea.evaluations.create()
that were used to create evaluation containers. - Rename
galtea.evaluation_tasks.create()
calls togaltea.evaluations.create_single_turn()
, remove theevaluation_id
parameter, and add theversion_id
parameter.
Ensure all your code, including any examples you may be referencing from the SDK repository, is updated. The
optional_props
dictionary is fully deprecated in v3.0, and some examples in the codebase may not yet reflect the v3.0 changes.Part 2: What’s New in v3.0 - The Session-Based Workflow
SDK v3.0 introduces a powerful new way to log and evaluate multi-turn conversations through Sessions and Inference Results. This approach is ideal for monitoring production traffic or evaluating complex, interactive scenarios.New Concepts
- Session: A container for a sequence of interactions (a conversation) between a user and your AI product. You can create a session using
galtea.sessions.create()
. - Inference Result: A single turn within a session, containing the
input
andoutput
. You log these usinggaltea.inference_results.create()
. evaluations.create()
: A new way to run evaluations on all inference results within a given session. This allows for easy batch evaluation of entire conversations.
Example of the New Session-Based Workflow
This workflow is entirely new and does not directly replace the test-based workflow.For better performance with multiple conversation turns, always use
create_batch()
instead of calling create()
in a loop. This reduces network overhead and improves response times.Benefits of the New Workflow
- Track Full Conversations: Accurately log and analyze multi-turn user interactions.
- Production Monitoring: Easily send production data to Galtea for continuous evaluation by removing the
test_case_id
parameter when creating sessions and addingis_production=True
. - Batch Evaluation: Evaluate an entire conversation with a single command.