> ## Documentation Index
> Fetch the complete documentation index at: https://docs.galtea.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Run Test-Based Evaluations

> Learn how to run evaluations for a single-turn, test-based workflow.

<Note>
  **Using specifications?** If your tests and metrics are linked to [Specifications](/concepts/product/specification), you can use [Specification-Based Evaluation](#specification-based-evaluation) below — it resolves metrics automatically. This tutorial also covers the manual loop approach for full control over each test case.
</Note>

To evaluate a product version against a predefined [Test](/concepts/product/test), you can loop through its [Test Cases](/concepts/product/test/case) and run an evaluation for each one. This workflow is ideal for regression testing and standardized quality checks.

<Info>
  A [session](/concepts/product/version/session) is created **implicitly** the first time you run an evaluation for a specific `version_id` and `test_id` combination. You do not need to create it manually.
</Info>

### Workflow

<Steps>
  <Step title="Select a Test and Version">
    Identify the `test_id` and `version_id` you want to evaluate.
  </Step>

  <Step title="Iterate Through Test Cases">
    Fetch all test cases associated with the test using `galtea.test_cases.list()`.
  </Step>

  <Step title="Generate and Evaluate Output">
    For each test case, call your product to get its output, create a session with `galtea.sessions.create()`, then use `galtea.inference_results.create_and_evaluate()` to log and evaluate the result.
  </Step>
</Steps>

### Example

This example demonstrates how to run an evaluation on all test cases from a specific test.

```python theme={"system"}
# 1. Fetch the test cases for the specified test
test_cases = galtea.test_cases.list(test_id=test_id)
print(f"Found {len(test_cases)} test cases for Test ID: {test_id}")

# 2. Loop through each test case and run an evaluation
for test_case in test_cases:
    # Get your product's actual response to the input
    actual_output = your_product_function(test_case.input)

    # Create a session and evaluate
    session = galtea.sessions.create(version_id=version_id, test_case_id=test_case.id)
    galtea.inference_results.create_and_evaluate(
        session_id=session.id,
        output=actual_output,
        metrics=METRICS_TO_EVALUATE,
    )

print(f"\nAll evaluations submitted for Version ID: {version_id}")
```

<Info>
  A [session](/concepts/product/version/session) is automatically created behind the scenes to link this `version_id` and `test_id` with the provided [inference result](/concepts/product/version/session/inference-result) (the `actual_output` and the [Test Case](/concepts/product/test/case)'s input).
</Info>

### Specification-Based Evaluation

Instead of passing metrics explicitly, you can evaluate against [Specifications](/concepts/product/specification). Each specification has linked metrics that the API resolves automatically.

```python theme={"system"}
# 1. List the specifications for your product
specifications = galtea.specifications.list(product_id=product_id)
spec_ids = [spec.id for spec in specifications]
print(f"Found {len(spec_ids)} specifications for product {product_id}")

# 2. Create a session and record an inference result
session = galtea.sessions.create(version_id=version_id, test_case_id=test_cases[0].id)
inference_result = galtea.inference_results.create(
    session_id=session.id,
    input=test_cases[0].input,
    output=your_product_function(test_cases[0].input),
)

# 3. Evaluate using specifications — the API resolves linked metrics automatically
evaluations = galtea.evaluations.create(
    inference_result_id=inference_result.id,
    specification_ids=spec_ids,
)
print(f"Created {len(evaluations)} evaluations from specifications")
```

<Info>
  If you omit both `metrics` and `specification_ids`, the API falls back to all metrics from every specification linked to the product.
</Info>

## Next Steps

<CardGroup cols={2}>
  <Card title="Specification-Driven Evaluations" icon="bullseye" href="/sdk/tutorials/specification-driven-evaluations">
    Automate test resolution, agent execution, and evaluation with specifications.
  </Card>

  <Card title="Evaluating Conversations" icon="comments" href="/sdk/tutorials/evaluating-conversations">
    Evaluate multi-turn conversations and production sessions.
  </Card>
</CardGroup>
