To evaluate a product version against a predefined Test, you can loop through its Test Cases and run an evaluation for each one. This workflow is ideal for regression testing and standardized quality checks.
A session is created implicitly the first time you run an evaluation for a specific version_id and test_id combination. You do not need to create it manually.

Workflow

1. Select a Test and Version
   Identify the test_id and version_id you want to evaluate.

2. Iterate Through Test Cases
   Fetch all test cases associated with the test using galtea.test_cases.list().

3. Generate and Evaluate Output
   For each test case, call your product to get its output, then use galtea.evaluations.create_single_turn() to create the evaluations.

Example

This example demonstrates how to run an evaluation on all test cases from a specific test.
from galtea import Galtea
import os

# Initialize Galtea SDK
galtea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))

# --- Configuration ---
# Replace with your actual IDs
YOUR_VERSION_ID = "your_version_id"
YOUR_TEST_ID = "your_test_id"
METRICS_TO_EVALUATE = [
    "Factual Accuracy",
    "Role Adherence",
]

# --- Your Product's Logic (Placeholder) ---
# In a real scenario, this would call your actual AI model or API
def your_product_function(input_prompt: str, context: str | None = None) -> str:
    # Your model's inference logic goes here
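    # A hypothetical sketch of what a real call might look like, e.g. via an
    # LLM client such as the OpenAI SDK (assumed dependency, not part of Galtea):
    #   from openai import OpenAI
    #   client = OpenAI()
    #   response = client.chat.completions.create(
    #       model="gpt-4o-mini",
    #       messages=[{"role": "user", "content": input_prompt}],
    #   )
    #   return response.choices[0].message.content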
    # For this example, we'll return a simple placeholder response
    return f"Model response to: {input_prompt}"

# 1. Fetch the test cases for the specified test
test_cases = galtea.test_cases.list(test_id=YOUR_TEST_ID)
print(f"Found {len(test_cases)} test cases for Test ID: {YOUR_TEST_ID}")

# 2. Loop through each test case and run an evaluation
for test_case in test_cases:
    # Get your product's actual response to the input
    actual_output = your_product_function(test_case.input)
    
    # Create the evaluation
    # A session is created implicitly on the first call for this version_id and test_id
    galtea.evaluations.create_single_turn(
        version_id=YOUR_VERSION_ID,
        test_case_id=test_case.id,
        metrics=METRICS_TO_EVALUATE,
        actual_output=actual_output,
    )

print(f"\nAll evaluations submitted for Version ID: {YOUR_VERSION_ID}")

A session is automatically created behind the scenes to link this version_id and test_id with the provided inference result (the actual_output and the Test Case’s input).
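If a single failing call should not abort the whole run, you can wrap the evaluation step in basic error handling. The sketch below is a variation of the loop above, not a separate API; it assumes the same galtea client, placeholder IDs, metrics, and your_product_function defined in the example.

failed = []
for test_case in test_cases:
    try:
        # Same calls as in the example, wrapped so one failure doesn't stop the run
        actual_output = your_product_function(test_case.input)
        galtea.evaluations.create_single_turn(
            version_id=YOUR_VERSION_ID,
            test_case_id=test_case.id,
            metrics=METRICS_TO_EVALUATE,
            actual_output=actual_output,
        )
    except Exception as error:
        failed.append((test_case.id, error))

if failed:
    print(f"{len(failed)} test case(s) could not be evaluated, e.g. {failed[0]}")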