To evaluate a product version against a predefined Test, you can loop through its Test Cases and run an evaluation for each one. This workflow is ideal for regression testing and standardized quality checks.
A session is created implicitly the first time you run an evaluation for a specific version_id and test_id combination. You do not need to create it manually.
Workflow
Select a Test and Version
Identify the test_id and version_id you want to evaluate.
Iterate Through Test Cases
Fetch all test cases associated with the test using galtea.test_cases.list().
Generate and Evaluate Output
For each test case, call your product to get its output, then use galtea.evaluations.create_single_turn() to create the evaluations.
Example
This example demonstrates how to run an evaluation on all test cases from a specific test.
# 1. Fetch the test cases for the specified test
test_cases = galtea.test_cases.list(test_id=test_id)
print(f"Found {len(test_cases)} test cases for Test ID: {test_id}")
# 2. Loop through each test case and run an evaluation
for test_case in test_cases:
# Get your product's actual response to the input
actual_output = your_product_function(test_case.input)
# Create the evaluation
# An evaluation is created implicitly on the first call
galtea.evaluations.create_single_turn(
version_id=version_id,
test_case_id=test_case.id,
metrics=METRICS_TO_EVALUATE,
actual_output=actual_output,
)
print(f"\nAll evaluations submitted for Version ID: {version_id}")
A session is automatically created behind the scenes to link this version_id and test_id with the provided inference result (the actual_output and the Test Case’s input).
Specification-Based Evaluation
Instead of passing metrics explicitly, you can evaluate against Specifications. Each specification has linked metrics that the API resolves automatically.
# 1. List the specifications for your product
specifications = galtea.specifications.list(product_id=product_id)
spec_ids = [spec.id for spec in specifications]
print(f"Found {len(spec_ids)} specifications for product {product_id}")
# 2. Create a session and record an inference result
session = galtea.sessions.create(version_id=version_id, test_case_id=test_cases[0].id)
inference_result = galtea.inference_results.create(
session_id=session.id,
input=test_cases[0].input,
output=your_product_function(test_cases[0].input),
)
# 3. Evaluate using specifications — the API resolves linked metrics automatically
evaluations = galtea.evaluations.create(
inference_result_id=inference_result.id,
specification_ids=spec_ids,
)
print(f"Created {len(evaluations)} evaluations from specifications")
If you omit both metrics and specification_ids, the API falls back to all metrics from every specification linked to the product.