Learn how to run evaluation tasks for a single-turn, test-based workflow.
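An evaluation is created implicitly for each `version_id` and `test_id` combination. You do not need to create it manually.

1. Select a Test and Version

   Identify the `test_id` and `version_id` you want to evaluate.

2. Iterate Through Test Cases

   Retrieve the Test Cases that belong to the Test using `galtea.test_cases.list()`.

3. Generate and Evaluate Output

   For each Test Case, run your product on the Test Case's input to generate its output, then call `galtea.evaluation_tasks.create_single_turn()` to create the evaluation tasks. The tasks are attached to the evaluation for that `version_id` and `test_id` and evaluate the provided inference result (the `actual_output` together with the Test Case's input).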
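The steps above can be sketched end to end as follows. This is a minimal, self-contained sketch, assuming a hypothetical in-memory stand-in so it runs without credentials: `StubGaltea`, `StubTestCase`, `StubEvaluationTask`, and `run_single_turn_evaluation` are illustrative names, not part of the Galtea SDK. It mirrors the documented flow: list the Test Cases for a `test_id`, generate an output for each input, and create one single-turn evaluation task per Test Case, with the evaluation for each `version_id`/`test_id` pair created implicitly on first use. In real code, the stub methods would be replaced by `galtea.test_cases.list()` and `galtea.evaluation_tasks.create_single_turn()`; check the SDK reference for their exact parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StubTestCase:
    """Hypothetical stand-in for a Galtea Test Case (only the fields used here)."""
    id: str
    test_id: str
    input: str


@dataclass
class StubEvaluationTask:
    """Hypothetical record pairing a Test Case's input with the product's output."""
    evaluation_id: str
    test_case_id: str
    input: str
    actual_output: str


class StubGaltea:
    """In-memory stand-in for the Galtea client; NOT the real SDK."""

    def __init__(self, test_cases: List[StubTestCase]) -> None:
        self._test_cases = test_cases
        # One evaluation per (version_id, test_id) pair, created lazily.
        self.evaluations: Dict[Tuple[str, str], str] = {}
        self.tasks: List[StubEvaluationTask] = []

    def list_test_cases(self, test_id: str) -> List[StubTestCase]:
        # Plays the role of galtea.test_cases.list() in this sketch.
        return [tc for tc in self._test_cases if tc.test_id == test_id]

    def create_single_turn(
        self, version_id: str, test_case: StubTestCase, actual_output: str
    ) -> StubEvaluationTask:
        # Plays the role of galtea.evaluation_tasks.create_single_turn().
        key = (version_id, test_case.test_id)
        if key not in self.evaluations:
            # Implicit creation: the first task for this pair creates its evaluation.
            self.evaluations[key] = f"eval-{len(self.evaluations) + 1}"
        task = StubEvaluationTask(
            evaluation_id=self.evaluations[key],
            test_case_id=test_case.id,
            input=test_case.input,
            actual_output=actual_output,
        )
        self.tasks.append(task)
        return task


def run_single_turn_evaluation(
    client: StubGaltea,
    version_id: str,
    test_id: str,
    product: Callable[[str], str],
) -> List[StubEvaluationTask]:
    """Iterate the Test Cases, generate each output, and create one task per case."""
    tasks = []
    for test_case in client.list_test_cases(test_id):
        actual_output = product(test_case.input)  # your product's inference
        tasks.append(client.create_single_turn(version_id, test_case, actual_output))
    return tasks


if __name__ == "__main__":
    cases = [
        StubTestCase("tc-1", "test-A", "hello"),
        StubTestCase("tc-2", "test-A", "world"),
        StubTestCase("tc-3", "test-B", "not part of test-A"),
    ]
    client = StubGaltea(cases)
    # Select the version and test to evaluate (illustrative IDs), then run the loop.
    tasks = run_single_turn_evaluation(client, "v1", "test-A", str.upper)
    print(len(tasks), client.evaluations)  # 2 {('v1', 'test-A'): 'eval-1'}
```

In this sketch, keying evaluations by the `(version_id, test_id)` pair is what makes creation implicit: rerunning the loop for a new version yields a new evaluation, while further Test Cases for the same pair join the existing one.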