Run a Test-Based Evaluation
Learn how to run evaluation tasks for a single-turn, test-based workflow.
To evaluate a product version against a predefined Test, you can loop through its Test Cases and run an evaluation task for each one. This workflow is ideal for regression testing and standardized quality checks.
An Evaluation is created implicitly the first time you run an evaluation task for a specific version_id and test_id combination. You do not need to create it manually.
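For instance, a minimal sketch of that first call might look like the following (assuming a configured galtea client; the parameter names are illustrative assumptions, not confirmed signatures):

```python
# Illustrative sketch: the first create_single_turn call for this
# version_id / test_id pair implicitly creates the Evaluation;
# later calls for the same pair attach additional tasks to it.
galtea.evaluation_tasks.create_single_turn(
    version_id="your_version_id",     # assumed parameter name
    test_case_id="a_test_case_id",    # a test case belonging to your test
    actual_output="your product's answer for this input",
)
```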
Workflow
Select a Test and Version
Identify the test_id and version_id you want to evaluate.
Iterate Through Test Cases
Fetch all test cases associated with the test using galtea.test_cases.list().
Generate and Evaluate Output
For each test case, call your product to generate its output, then use galtea.evaluation_tasks.create_single_turn() to create an evaluation task for that output.
Example
This example demonstrates how to run an evaluation on all test cases from a specific test.
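The sketch below puts the workflow together. Only galtea.test_cases.list() and galtea.evaluation_tasks.create_single_turn() come from this guide; the client constructor, parameter names, attribute names (test_case.id, test_case.input), and the generate_answer() helper are assumptions to adapt to your own setup.

```python
from galtea import Galtea  # assumed import path for the SDK client

# Assumed client construction; configure with your own credentials.
galtea = Galtea(api_key="YOUR_GALTEA_API_KEY")

TEST_ID = "your_test_id"
VERSION_ID = "your_version_id"


def generate_answer(prompt: str) -> str:
    """Hypothetical stand-in for a call to your own product or model."""
    return "your product's answer to: " + prompt


# Fetch all test cases associated with the test (assumed parameter name).
test_cases = galtea.test_cases.list(test_id=TEST_ID)

for test_case in test_cases:
    # Run your product on the test case input to obtain the actual output.
    actual_output = generate_answer(test_case.input)

    # Create a single-turn evaluation task. The Evaluation linking this
    # version_id and test_id is created implicitly on the first call.
    galtea.evaluation_tasks.create_single_turn(
        version_id=VERSION_ID,
        test_case_id=test_case.id,
        actual_output=actual_output,
    )
```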
A session and evaluation are automatically created behind the scenes to link this version_id and test_id with the provided inference result (the actual_output and the Test Case's input).