Skip to main content
To evaluate a product version against a predefined Test, you can loop through its Test Cases and run an evaluation for each one. This workflow is ideal for regression testing and standardized quality checks.
A session is created implicitly the first time you run an evaluation for a specific version_id and test_id combination. You do not need to create it manually.

Workflow

1

Select a Test and Version

Identify the test_id and version_id you want to evaluate.
2

Iterate Through Test Cases

Fetch all test cases associated with the test using galtea.test_cases.list().
3

Generate and Evaluate Output

For each test case, call your product to get its output, then use galtea.evaluations.create_single_turn() to create the evaluations.

Example

This example demonstrates how to run an evaluation on all test cases from a specific test.
# 1. Fetch the test cases for the specified test
test_cases = galtea.test_cases.list(test_id=test_id)
print(f"Found {len(test_cases)} test cases for Test ID: {test_id}")

# 2. Loop through each test case and run an evaluation
for test_case in test_cases:
    # Get your product's actual response to the input
    actual_output = your_product_function(test_case.input)

    # Create the evaluation
    # An evaluation is created implicitly on the first call
    galtea.evaluations.create_single_turn(
        version_id=version_id,
        test_case_id=test_case.id,
        metrics=METRICS_TO_EVALUATE,
        actual_output=actual_output,
    )

print(f"\nAll evaluations submitted for Version ID: {version_id}")
A session is automatically created behind the scenes to link this version_id and test_id with the provided inference result (the actual_output and the Test Case’s input).

Specification-Based Evaluation

Instead of passing metrics explicitly, you can evaluate against Specifications. Each specification has linked metrics that the API resolves automatically.
# 1. List the specifications for your product
specifications = galtea.specifications.list(product_id=product_id)
spec_ids = [spec.id for spec in specifications]
print(f"Found {len(spec_ids)} specifications for product {product_id}")

# 2. Create a session and record an inference result
session = galtea.sessions.create(version_id=version_id, test_case_id=test_cases[0].id)
inference_result = galtea.inference_results.create(
    session_id=session.id,
    input=test_cases[0].input,
    output=your_product_function(test_cases[0].input),
)

# 3. Evaluate using specifications — the API resolves linked metrics automatically
evaluations = galtea.evaluations.create(
    inference_result_id=inference_result.id,
    specification_ids=spec_ids,
)
print(f"Created {len(evaluations)} evaluations from specifications")
If you omit both metrics and specification_ids, the API falls back to all metrics from every specification linked to the product.