Migrating to Galtea SDK v2.0
Welcome to Galtea SDK v2.0! This version introduces major improvements, including a new session-based evaluation workflow and simplifications to the existing test-based workflow. This guide will walk you through the necessary changes to update your SDK v1.x integrations and introduce you to the new features available in v2.0.
Part 1: Migrating Your Existing Workflow (Test-Case-Based Evaluation)
The core workflow for running predefined tests against a version has been streamlined.
Key Changes for Test-Based Evaluations
- Implicit Evaluation Creation: You no longer need to explicitly call galtea.evaluations.create() before creating evaluations.
- evaluations.create() Repurposed: The old evaluations.create() method has been replaced by the new create_single_turn() method for test-based evaluations. The name create() is now used exclusively for the new session-based workflow to evaluate all turns in a conversation.
- New Method create_single_turn(): Use this method for single-turn, test-case-based evaluations. It now requires version_id instead of evaluation_id.
- Simplified Version Creation: The galtea.versions.create() method now accepts all properties as direct keyword arguments, removing the need for the optional_props dictionary.
- Added Sessions: The new galtea.sessions.create() method allows you to create sessions to group multiple inference results (conversation turns) under a single session, making it easier to track multi-turn interactions.
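The "Simplified Version Creation" change can be sketched as a small dictionary transformation. This is a minimal illustration, not part of the SDK: `flatten_optional_props` is a hypothetical helper, and the property names in the sample arguments (`description`, `model_id`) are placeholders rather than a confirmed list of accepted properties.

```python
from typing import Any, Dict


def flatten_optional_props(v1_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Turn v1-style versions.create() arguments into v2-style keyword arguments.

    Hypothetical helper: it only rearranges a dict and never calls the SDK.
    """
    v2_kwargs = dict(v1_kwargs)  # shallow copy so the caller's dict is left untouched
    optional_props = v2_kwargs.pop("optional_props", None) or {}
    for key, value in optional_props.items():
        if key in v2_kwargs:
            raise ValueError(f"'{key}' is set both directly and in optional_props")
        v2_kwargs[key] = value
    return v2_kwargs


# Illustrative v1-style arguments; the property names are placeholders.
v1_args = {
    "name": "v1.3",
    "product_id": "YOUR_PRODUCT_ID",
    "optional_props": {"description": "Prompt tweaks", "model_id": "YOUR_MODEL_ID"},
}
v2_args = flatten_optional_props(v1_args)
print(v2_args)
```

In a v2.0 script the flattened dictionary would then be unpacked into the call, for example `galtea.versions.create(**v2_args)`.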
Migration Diff: Test-Case Workflow
Here’s a side-by-side comparison of a typical v1 script and its direct v2 equivalent.
❌ SDK v1.x (Old Way)
✅ SDK v2.0 (New Way)
This example demonstrates how to create single-turn evaluations for test cases. For production data logging, include the input parameter and set is_production=True. If you want to evaluate multi-turn conversations, refer to the new session-based workflow in Part 2.
Production Data Logging with Single-Turn Evaluations
For production monitoring, you can also use create_single_turn() without a test case:
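A minimal sketch of such a call, run against a stand-in object so it executes without credentials. `RecordingEvaluations` and `log_production_turn` are hypothetical; `version_id`, `input`, and `is_production` come from this guide, while `actual_output` is an assumed parameter name, and the real call may need further arguments (for example, which metrics to run) that this guide does not name.

```python
from typing import Any, Dict, List


class RecordingEvaluations:
    """Stand-in for galtea.evaluations: records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def create_single_turn(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return kwargs


def log_production_turn(evaluations: Any, version_id: str,
                        user_input: str, model_output: str) -> Dict[str, Any]:
    """Log one production turn: no test case, explicit input, is_production=True."""
    return evaluations.create_single_turn(
        version_id=version_id,
        input=user_input,            # passed explicitly because no test case is referenced
        actual_output=model_output,  # assumed parameter name for the model's answer
        is_production=True,
    )


evaluations = RecordingEvaluations()
result = log_production_turn(
    evaluations,
    version_id="YOUR_VERSION_ID",
    user_input="What are your opening hours?",
    model_output="We are open from 9 to 5 on weekdays.",
)
print(result)
```

With the real client, `galtea.evaluations` would be passed in place of the recording stand-in.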
Summary of Actions for Migration
- In galtea.versions.create() calls, remove the optional_props dictionary and pass its contents as direct keyword arguments.
- Remove all calls to galtea.evaluations.create().
- Rename galtea.evaluations.create() calls to galtea.evaluations.create_single_turn(), remove the evaluation_id parameter, and add the version_id parameter.
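The parameter change in the last action can be sketched as follows. `migrate_evaluation_kwargs` is a hypothetical helper that only rewrites a dictionary of keyword arguments; the sample argument names other than `evaluation_id` and `version_id` are illustrative.

```python
from typing import Any, Dict


def migrate_evaluation_kwargs(v1_kwargs: Dict[str, Any], version_id: str) -> Dict[str, Any]:
    """Rewrite v1 keyword arguments for use with create_single_turn().

    Drops evaluation_id and adds version_id; every other argument is kept as-is.
    """
    v2_kwargs = {key: value for key, value in v1_kwargs.items() if key != "evaluation_id"}
    v2_kwargs["version_id"] = version_id
    return v2_kwargs


# Illustrative v1-style arguments with placeholder IDs.
v1_call = {"evaluation_id": "OLD_EVALUATION_ID", "test_case_id": "YOUR_TEST_CASE_ID"}
v2_call = migrate_evaluation_kwargs(v1_call, version_id="YOUR_VERSION_ID")
print(v2_call)
```

The result would be unpacked into the renamed method, for example `galtea.evaluations.create_single_turn(**v2_call)`.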
Ensure all your code, including any examples you may be referencing from the SDK repository, is updated. The optional_props dictionary is fully deprecated in v2.0, and some examples in the codebase may not yet reflect the v2.0 changes.
Part 2: What’s New in v2.0 - The Session-Based Workflow
SDK v2.0 introduces a powerful new way to log and evaluate multi-turn conversations through Sessions and Inference Results. This approach is ideal for monitoring production traffic or evaluating complex, interactive scenarios.
New Concepts
- Session: A container for a sequence of interactions (a conversation) between a user and your AI product. You can create a session using galtea.sessions.create().
- Inference Result: A single turn within a session, containing the input and output. You log these using galtea.inference_results.create().
- evaluations.create(): A new way to run evaluations on all inference results within a given session. This allows for easy batch evaluation of entire conversations.
Example of the New Session-Based Workflow
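The order of calls in this workflow can be sketched with an in-memory stand-in, so the snippet runs without the SDK or an API key. `make_fake_galtea` is hypothetical and mimics only the call order; the parameter names `session_id`, `conversation_turns`, and `metrics`, the placement of `create_batch()` under `inference_results`, and the shape of the returned records are assumptions rather than the SDK's documented signatures.

```python
from itertools import count
from types import SimpleNamespace
from typing import Any, Dict, List


def make_fake_galtea() -> SimpleNamespace:
    """Build an in-memory stand-in with the three namespaces named in this guide."""
    store: Dict[str, List[Dict[str, str]]] = {}  # session id -> logged turns
    ids = count(1)

    def create_session(version_id: str) -> SimpleNamespace:
        session = SimpleNamespace(id=f"session_{next(ids)}", version_id=version_id)
        store[session.id] = []
        return session

    def create_batch(session_id: str, conversation_turns: List[Dict[str, str]]) -> int:
        store[session_id].extend(conversation_turns)
        return len(store[session_id])  # number of turns now logged for the session

    def create_evaluation(session_id: str, metrics: List[str]) -> List[Dict[str, Any]]:
        # One placeholder record per (turn, metric) pair; no scoring happens here.
        return [
            {"session_id": session_id, "turn_index": index, "metric": metric}
            for index in range(len(store[session_id]))
            for metric in metrics
        ]

    return SimpleNamespace(
        sessions=SimpleNamespace(create=create_session),
        inference_results=SimpleNamespace(create_batch=create_batch),
        evaluations=SimpleNamespace(create=create_evaluation),
    )


galtea = make_fake_galtea()

# 1. Open a session for the version under evaluation.
session = galtea.sessions.create(version_id="YOUR_VERSION_ID")

# 2. Log the conversation turns as inference results in one batch.
galtea.inference_results.create_batch(
    session_id=session.id,
    conversation_turns=[
        {"input": "Hi", "output": "Hello! How can I help?"},
        {"input": "I forgot my password", "output": "I have sent you a reset link."},
    ],
)

# 3. Evaluate every turn in the session with a single call.
evaluations = galtea.evaluations.create(session_id=session.id, metrics=["YOUR_METRIC_NAME"])
print(len(evaluations))  # one record per turn and metric
```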
This workflow is entirely new and does not directly replace the test-based workflow.
For better performance with multiple conversation turns, always use create_batch() instead of calling create() in a loop. This reduces network overhead and improves response times.
Benefits of the New Workflow
- Track Full Conversations: Accurately log and analyze multi-turn user interactions.
- Production Monitoring: Easily send production data to Galtea for continuous evaluation by removing the test_case_id parameter when creating sessions and adding is_production=True.
- Batch Evaluation: Evaluate an entire conversation with a single command.
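The earlier recommendation to prefer create_batch() over create() in a loop can be made concrete by counting simulated requests. `CountingInferenceResults` is a hypothetical stand-in that assumes each call costs one round trip; the `conversation_turns` parameter name and the turn format are assumptions, not the SDK's confirmed signature.

```python
from typing import Dict, List


class CountingInferenceResults:
    """Stand-in for galtea.inference_results that counts simulated requests."""

    def __init__(self) -> None:
        self.requests = 0
        self.logged: List[Dict[str, str]] = []

    def create(self, session_id: str, input: str, output: str) -> None:
        self.requests += 1  # one simulated round trip per turn
        self.logged.append({"input": input, "output": output})

    def create_batch(self, session_id: str, conversation_turns: List[Dict[str, str]]) -> None:
        self.requests += 1  # one simulated round trip for the whole list
        self.logged.extend(conversation_turns)


turns = [
    {"input": "Hi", "output": "Hello! How can I help?"},
    {"input": "Where is my order?", "output": "It ships tomorrow."},
    {"input": "Thanks", "output": "You're welcome."},
]

# Logging turn by turn: one request per turn.
looped = CountingInferenceResults()
for turn in turns:
    looped.create(session_id="YOUR_SESSION_ID", **turn)

# Logging in one batch: a single request for the same data.
batched = CountingInferenceResults()
batched.create_batch(session_id="YOUR_SESSION_ID", conversation_turns=turns)

print(looped.requests, batched.requests)
```

Both stand-ins end up with the same logged turns; only the number of simulated requests differs.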