galtea.evaluations.create() before creating evaluation tasks.
evaluation_tasks.create() Repurposed: The old evaluation_tasks.create() method has been replaced by the new create_single_turn() method for test-based evaluations. The name create() is now used exclusively for the new session-based workflow, which evaluates all turns in a conversation.
create_single_turn(): Use this method for single-turn, test-case-based evaluations. It now requires version_id instead of evaluation_id.
The galtea.versions.create() method now accepts all properties as direct keyword arguments, removing the need for the optional_props dictionary.
The galtea.sessions.create() method creates a session that groups multiple inference results (conversation turns), making it easier to track multi-turn interactions.
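The two signature changes above are mechanical enough to sketch as a pair of helpers that rewrite old-style keyword arguments. This is a minimal sketch and not part of the SDK: the helper names flatten_optional_props and to_single_turn_kwargs are hypothetical, the example property names and IDs are placeholders, and only optional_props, evaluation_id, and version_id come from the notes above.

```python
from typing import Any


def flatten_optional_props(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Merge a v1-style optional_props dict into direct keyword arguments."""
    flat = {k: v for k, v in kwargs.items() if k != "optional_props"}
    extras = kwargs.get("optional_props") or {}
    # Refuse to silently overwrite an argument that is given twice.
    clash = flat.keys() & extras.keys()
    if clash:
        raise ValueError(f"duplicate arguments: {sorted(clash)}")
    flat.update(extras)
    return flat


def to_single_turn_kwargs(kwargs: dict[str, Any], version_id: str) -> dict[str, Any]:
    """Drop evaluation_id and add version_id for create_single_turn()."""
    new = {k: v for k, v in kwargs.items() if k != "evaluation_id"}
    new["version_id"] = version_id
    return new


# Illustrative values only; "name" and "description" are placeholder properties.
old_version_kwargs = {
    "name": "v2",
    "optional_props": {"description": "demo"},
}
new_version_kwargs = flatten_optional_props(old_version_kwargs)
# With a configured client: galtea.versions.create(**new_version_kwargs)

old_task_kwargs = {"evaluation_id": "eval_123", "test_case_id": "tc_1"}
new_task_kwargs = to_single_turn_kwargs(old_task_kwargs, version_id="ver_1")
# With a configured client:
# galtea.evaluation_tasks.create_single_turn(**new_task_kwargs)
```

Both helpers return new dictionaries and leave their inputs untouched, so the old arguments stay available for comparison while migrating.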
input parameter and set is_production=True. If you want to evaluate multi-turn conversations, refer to the new session-based workflow in Part 2.
create_single_turn() without a test case:
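A minimal sketch of such a call follows. It assumes the argument names actual_output and metrics, which do not appear in these notes and should be checked against the SDK reference; version_id, input, and is_production are the parameters named above. The helper production_single_turn_kwargs is hypothetical and only assembles the arguments, so the block runs without a configured client.

```python
from typing import Any


def production_single_turn_kwargs(
    version_id: str,
    user_input: str,
    actual_output: str,
    metrics: list[str],
) -> dict[str, Any]:
    """Build arguments for a single-turn evaluation that has no test case."""
    return {
        "version_id": version_id,
        "input": user_input,             # passed directly; there is no test case
        "is_production": True,
        "actual_output": actual_output,  # assumed parameter name
        "metrics": metrics,              # assumed parameter name
    }


# Placeholder values for illustration.
kwargs = production_single_turn_kwargs(
    version_id="ver_1",
    user_input="What is the refund policy?",
    actual_output="Refunds are available within 30 days.",
    metrics=["example-metric"],
)
# With a configured client:
# galtea.evaluation_tasks.create_single_turn(**kwargs)
```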
In galtea.versions.create() calls, remove the optional_props dictionary and pass its contents as direct keyword arguments.
galtea.evaluations.create().
Change galtea.evaluation_tasks.create() calls to galtea.evaluation_tasks.create_single_turn(), remove the evaluation_id parameter, and add the version_id parameter.
The optional_props dictionary is fully deprecated in v2.0, and some examples in the codebase may not yet reflect the v2.0 changes.
galtea.sessions.create().
input and output. You log these using galtea.inference_results.create().
evaluation_tasks.create(): A new way to run evaluations on all inference results within a given session. This allows for easy batch evaluation of entire conversations.
create_batch() instead of calling create() in a loop. This reduces network overhead and improves response times.
test_case_id parameter when creating sessions and adding is_production=True.
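The difference between calling create() in a loop and issuing a single create_batch() call can be illustrated with a stand-in client that counts simulated requests. This is a sketch under stated assumptions, not the SDK: FakeResultsClient is hypothetical, create_batch() is assumed to live on the same client that exposes create(), and both method signatures are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResultsClient:
    """Stand-in for an SDK client; counts simulated network requests."""

    requests: int = 0
    stored: list[dict] = field(default_factory=list)

    def create(self, session_id: str, input: str, output: str) -> None:
        self.requests += 1  # one request per conversation turn
        self.stored.append(
            {"session_id": session_id, "input": input, "output": output}
        )

    def create_batch(self, session_id: str, turns: list[dict]) -> None:
        self.requests += 1  # one request for the whole conversation
        self.stored.extend({"session_id": session_id, **t} for t in turns)


turns = [
    {"input": "Hi", "output": "Hello! How can I help?"},
    {"input": "Track my order", "output": "Sure, what is the order number?"},
    {"input": "12345", "output": "It ships tomorrow."},
]

# Loop: one simulated request per turn.
looped = FakeResultsClient()
for turn in turns:
    looped.create("session_1", turn["input"], turn["output"])

# Batch: a single simulated request carrying every turn.
batched = FakeResultsClient()
batched.create_batch("session_1", turns)

print(looped.requests, batched.requests)  # prints: 3 1
```

Both clients end up holding the same three records; only the number of round trips differs, which is where the reduced network overhead comes from.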