An Evaluation is created implicitly the first time you run an evaluation task for a specific `version_id` and `test_id` combination. You do not need to create it manually.

## Workflow
1. **Select a Test and Version**
   Identify the `test_id` and `version_id` you want to evaluate.
2. **Iterate Through Test Cases**
   Fetch all test cases associated with the test using `galtea.test_cases.list()`.
3. **Generate and Evaluate Output**
   For each test case, call your product to get its output, then use `galtea.evaluation_tasks.create_single_turn()` to create the evaluation tasks.

## Example
This example demonstrates how to run an evaluation on all test cases from a specific test. A session and evaluation are automatically created behind the scenes to link this `version_id` and `test_id` with the provided inference result (the `actual_output` and the Test Case's input).
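The sketch below follows the three workflow steps. It is a minimal illustration, not a verbatim reference: the `Galtea` constructor argument, the keyword parameters passed to `list()` and `create_single_turn()`, and the `my_product_inference` helper are assumptions made for this example, so check the SDK reference for the exact signatures.

```python
# Minimal sketch of the workflow above. Parameter names and the
# constructor argument are assumptions; verify against the SDK docs.
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")  # hypothetical constructor argument

TEST_ID = "your-test-id"        # Step 1: the Test to evaluate against
VERSION_ID = "your-version-id"  # Step 1: the product Version being evaluated


def my_product_inference(user_input: str) -> str:
    """Placeholder for your own product: replace with a real inference call."""
    return f"Echo: {user_input}"


# Step 2: fetch all test cases associated with the test.
test_cases = galtea.test_cases.list(test_id=TEST_ID)

for test_case in test_cases:
    # Step 3: call your product to generate an output for this input.
    actual_output = my_product_inference(test_case.input)

    # Create the evaluation task. The Evaluation (and a session) linking
    # this version_id and test_id is created implicitly on the first call.
    galtea.evaluation_tasks.create_single_turn(
        version_id=VERSION_ID,
        test_case_id=test_case.id,
        actual_output=actual_output,
    )
```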