Migration to v2
Migrating to Galtea SDK v2.0
Welcome to Galtea SDK v2.0! This version introduces major improvements, including a new session-based evaluation workflow and simplifications to the existing test-based workflow.
This guide will walk you through the necessary changes to update your SDK v1.x integrations and introduce you to the new features available in v2.0.
Part 1: Migrating Your Existing Workflow (Test-Case-Based Evaluation)
The core workflow for running predefined tests against a version has been streamlined.
Key Changes for Test-Based Evaluations
- Implicit Evaluation Creation: You no longer need to explicitly call galtea.evaluations.create() before creating evaluation tasks.
- evaluation_tasks.create() Repurposed: The old evaluation_tasks.create() method has been replaced by the new create_single_turn() method for test-based evaluations. The name create() is now used exclusively for the new session-based workflow, where it evaluates all turns in a conversation.
- New Method create_single_turn(): Use this method for single-turn, test-case-based evaluations. It now requires version_id instead of evaluation_id.
- Simplified Version Creation: The galtea.versions.create() method now accepts all properties as direct keyword arguments, removing the need for the optional_props dictionary.
- Added Sessions: The new galtea.sessions.create() method lets you group multiple inference results (conversation turns) under a single session, making it easier to track multi-turn interactions.
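The version-creation change is mechanical, so a script can apply it. The sketch below converts a v1-style keyword dictionary for galtea.versions.create() into flat v2-style keyword arguments. The helper name flatten_optional_props is hypothetical. The example keys (name, product_id, description) are assumptions for illustration, not a verified list of versions.create() parameters.

```python
from typing import Any, Dict


def flatten_optional_props(v1_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Convert v1-style versions.create() kwargs into v2-style flat kwargs.

    Hypothetical migration helper: every entry of the v1 optional_props
    dictionary becomes a top-level keyword argument.
    """
    v2_kwargs = {k: v for k, v in v1_kwargs.items() if k != "optional_props"}
    for key, value in (v1_kwargs.get("optional_props") or {}).items():
        # Refuse to silently overwrite a value that was also passed directly.
        if key in v2_kwargs:
            raise ValueError(f"{key!r} is set both directly and in optional_props")
        v2_kwargs[key] = value
    return v2_kwargs


# Example keys are assumptions for illustration only.
v1_call = {
    "name": "v1.0",
    "product_id": "YOUR_PRODUCT_ID",
    "optional_props": {"description": "Baseline model"},
}
v2_call = flatten_optional_props(v1_call)
# In v2 you would then call: galtea.versions.create(**v2_call)
```

The helper raises an error when a key is duplicated, so conflicting values surface during migration rather than being silently resolved.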
Migration Diff: Test-Case Workflow
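Here’s a comparison of a typical v1 script and its direct v2 equivalent.
❌ SDK v1.x (Old Way): create the version with an optional_props dictionary, call galtea.evaluations.create(), then pass the returned evaluation_id to galtea.evaluation_tasks.create().
✅ SDK v2.0 (New Way): create the version with direct keyword arguments, skip galtea.evaluations.create(), and call galtea.evaluation_tasks.create_single_turn() with version_id.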
In v2, this workflow creates single-turn evaluation tasks for test cases. For production data logging, include the input parameter and set is_production=True. To evaluate multi-turn conversations, use the new session-based workflow described in Part 2.
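Below is a minimal sketch of the v2 test-case loop. It assumes an already-initialized galtea client object is passed in. The generate_output callable stands in for your product and is hypothetical. The test_case.id and test_case.input attributes and the metrics and actual_output parameter names are also assumptions, so check them against the SDK reference. The sketch contains no galtea.evaluations.create() call and no evaluation_id.

```python
from typing import Any, Callable, Iterable, List


def run_single_turn_evaluations(
    galtea: Any,
    version_id: str,
    test_cases: Iterable[Any],
    metrics: List[str],
    generate_output: Callable[[str], str],
) -> List[Any]:
    """v2 test-case workflow: one single-turn evaluation task per test case."""
    tasks = []
    for test_case in test_cases:
        # Run your product on the test case input (hypothetical callable).
        actual_output = generate_output(test_case.input)
        # v2: no evaluations.create() step; tasks reference the version directly.
        task = galtea.evaluation_tasks.create_single_turn(
            version_id=version_id,
            test_case_id=test_case.id,
            metrics=metrics,              # assumed parameter name
            actual_output=actual_output,  # assumed parameter name
        )
        tasks.append(task)
    return tasks
```

The function receives the client as an argument instead of creating it. This keeps the sketch testable with a stand-in object.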
Production Data Logging with Single-Turn Tasks
For production monitoring, you can also use create_single_turn() without a test case: pass the user's input through the input parameter and set is_production=True.
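Here is a hedged sketch of logging one production interaction. The input and is_production parameters come from this guide. The actual_output and metrics parameter names, and the helper name log_production_turn, are assumptions.

```python
from typing import Any, List


def log_production_turn(
    galtea: Any,
    version_id: str,
    user_input: str,
    actual_output: str,
    metrics: List[str],
) -> Any:
    """Log one production interaction as a single-turn evaluation task."""
    return galtea.evaluation_tasks.create_single_turn(
        version_id=version_id,
        # No test_case_id: the input comes from real traffic, so pass it directly.
        input=user_input,
        actual_output=actual_output,  # assumed parameter name
        metrics=metrics,              # assumed parameter name
        is_production=True,
    )
```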
Summary of Actions for Migration
- In galtea.versions.create() calls, remove the optional_props dictionary and pass its contents as direct keyword arguments.
- Remove all calls to galtea.evaluations.create().
- Rename galtea.evaluation_tasks.create() calls to galtea.evaluation_tasks.create_single_turn(), remove the evaluation_id parameter, and add the version_id parameter.
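The third step can be scripted as a keyword-argument rewrite. This sketch assumes that every other parameter kept its name between v1 and v2, which this guide does not confirm. The helper name to_create_single_turn_kwargs is hypothetical.

```python
from typing import Any, Dict


def to_create_single_turn_kwargs(v1_kwargs: Dict[str, Any], version_id: str) -> Dict[str, Any]:
    """Map kwargs of a v1 evaluation_tasks.create() call onto v2 create_single_turn().

    Hypothetical helper: drops evaluation_id and adds version_id. Other keys
    are passed through unchanged, assuming they were not renamed in v2.
    """
    v2_kwargs = {k: v for k, v in v1_kwargs.items() if k != "evaluation_id"}
    v2_kwargs["version_id"] = version_id
    return v2_kwargs


old_kwargs = {"evaluation_id": "YOUR_EVALUATION_ID", "test_case_id": "YOUR_TEST_CASE_ID"}
new_kwargs = to_create_single_turn_kwargs(old_kwargs, version_id="YOUR_VERSION_ID")
# v2: galtea.evaluation_tasks.create_single_turn(**new_kwargs)
```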
Ensure all your code, including any examples you may be referencing from the SDK repository, is updated. The optional_props dictionary is fully deprecated in v2.0, and some examples in the codebase may not yet reflect the v2.0 changes.
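One way to find leftover v1 usages is to scan your Python sources for the patterns named in this guide. The sketch below uses only the standard library. In v2, evaluation_tasks.create() remains valid for the session-based workflow, so hits for that pattern need manual review. The function and pattern names are illustrative.

```python
import re
from pathlib import Path
from typing import List, Tuple

# v1-only patterns named in this guide. evaluation_tasks.create() still exists
# in v2 for session-based evaluation, so review those hits by hand.
V1_PATTERNS = {
    "optional_props": re.compile(r"\boptional_props\b"),
    "evaluations.create()": re.compile(r"\.evaluations\.create\("),
    "evaluation_tasks.create()": re.compile(r"\.evaluation_tasks\.create\("),
}


def find_v1_usages(root: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, pattern label) for every suspected v1 usage."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in V1_PATTERNS.items():
                if pattern.search(line):
                    hits.append((str(path), lineno, label))
    return hits
```

Run it against your project root, for example with `for hit in find_v1_usages("path/to/project"): print(hit)`. The create\( pattern requires an opening parenthesis right after create, so calls to create_single_turn() are not flagged.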
Part 2: What’s New in v2.0 - The Session-Based Workflow
SDK v2.0 introduces a powerful new way to log and evaluate multi-turn conversations through Sessions and Inference Results. This approach is ideal for monitoring production traffic or evaluating complex, interactive scenarios.
New Concepts
- Session: A container for a sequence of interactions (a conversation) between a user and your AI product. You can create a session using galtea.sessions.create().
- Inference Result: A single turn within a session, containing the input and output. You log these using galtea.inference_results.create().
- evaluation_tasks.create(): A new way to run evaluations on all inference results within a given session. This allows for easy batch evaluation of entire conversations.
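The three concepts are used in a fixed order: create a session, log its turns, then evaluate the session. Here is a minimal sketch for a one-turn conversation. The method names come from this guide. The version_id, session_id, and metrics parameter names and the session.id attribute are assumptions.

```python
from typing import Any, List


def evaluate_one_turn_session(
    galtea: Any,
    version_id: str,
    test_case_id: str,
    user_input: str,
    output: str,
    metrics: List[str],
) -> Any:
    """Session -> inference result -> session-level evaluation, in that order."""
    # Parameter names version_id, session_id, and metrics are assumptions.
    # 1. A session groups the turns of one conversation.
    session = galtea.sessions.create(version_id=version_id, test_case_id=test_case_id)
    # 2. Each turn is logged as an inference result with its input and output.
    galtea.inference_results.create(session_id=session.id, input=user_input, output=output)
    # 3. evaluation_tasks.create() evaluates every inference result in the session.
    return galtea.evaluation_tasks.create(session_id=session.id, metrics=metrics)
```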
Example of the New Session-Based Workflow
This workflow is entirely new and does not directly replace the test-based workflow.
For better performance when logging multiple conversation turns, always use inference_results.create_batch() instead of calling inference_results.create() in a loop. Sending all turns in one request reduces network overhead and improves response times.
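Here is a sketch of the batched variant for a session that already exists. All turns go out in a single create_batch() call, and then the whole session is evaluated. The conversation_turns parameter name and its list-of-dicts shape are assumptions, so check the SDK reference for the exact payload. The helper name log_and_evaluate is hypothetical.

```python
from typing import Any, List, Tuple


def log_and_evaluate(
    galtea: Any,
    session_id: str,
    turns: List[Tuple[str, str]],
    metrics: List[str],
) -> Any:
    """Log all turns of a conversation in one batched call, then evaluate the session."""
    # Build one payload for the whole conversation instead of calling
    # inference_results.create() once per turn.
    conversation_turns = [
        {"input": user_input, "output": output} for user_input, output in turns
    ]
    galtea.inference_results.create_batch(
        session_id=session_id,
        conversation_turns=conversation_turns,  # assumed parameter name and shape
    )
    # Evaluate every inference result logged in this session.
    return galtea.evaluation_tasks.create(session_id=session_id, metrics=metrics)
```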
Benefits of the New Workflow
- Track Full Conversations: Accurately log and analyze multi-turn user interactions.
- Production Monitoring: Easily send production data to Galtea for continuous evaluation by removing the test_case_id parameter when creating sessions and adding is_production=True.
- Batch Evaluation: Evaluate an entire conversation with a single command.
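The production-monitoring difference amounts to which keyword arguments go to galtea.sessions.create(). The small builder below sketches this. test_case_id and is_production come from this guide. The version_id parameter and the helper name session_kwargs are assumptions.

```python
from typing import Any, Dict, Optional


def session_kwargs(version_id: str, test_case_id: Optional[str] = None) -> Dict[str, Any]:
    """Build galtea.sessions.create() kwargs for test or production sessions."""
    # test_case_id and is_production come from the guide; version_id is assumed.
    if test_case_id is not None:
        # Evaluation against a predefined test case.
        return {"version_id": version_id, "test_case_id": test_case_id}
    # Production traffic: no test case, flagged with is_production=True.
    return {"version_id": version_id, "is_production": True}
```

Usage would look like `session = galtea.sessions.create(**session_kwargs(version_id))` for live traffic. Pass a test case ID to get an evaluation session instead.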
If you have any questions or encounter any issues during migration, please don’t hesitate to reach out to our support team at support@galtea.ai