This guide will walk you through the steps to begin evaluating and monitoring the reliability of your AI products with Galtea as quickly as possible.

Evaluation Workflow Overview

1. Create a Product: Define what functionality or service you want to evaluate on the Galtea platform.
2. Install SDK & Connect: Set up the Galtea Python SDK to interact with the platform.
3. Register a Version: Document a specific implementation of your product using the SDK.
4. Select a Test: Use a default Galtea test (or create your own) to challenge your product.
5. Select a Metric: Use a default Galtea metric (or define your own) to evaluate your product’s version.
6. Run Evaluations: Test your product version with the selected test and metrics, then analyze results.

1. Create a Product

Create a product in the Galtea dashboard. Navigate to Products > Create New Product and complete the form.
The product description is important as it may be used to generate synthetic test data.

2. Install the SDK and Connect

1. Get your API key: In the Galtea dashboard, navigate to Settings > Generate API Key and copy your key.

2. Install the SDK:

pip install galtea

3. Connect to the platform:

from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

products = galtea.products.list()
print(f"Found {len(products)} products.")
# Choose a product ID for the next steps
YOUR_PRODUCT_ID = products[0].id if products else "your_product_id"
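
If you already have more than one product, taking the first entry may not be what you want. Here is a minimal sketch that selects a product by name instead, assuming each returned product exposes a name attribute; "My Chatbot" is a hypothetical example:

# Optional: select a specific product by name rather than taking the first one.
# Assumes product objects expose a `name` attribute; "My Chatbot" is hypothetical.
my_product = next((p for p in products if p.name == "My Chatbot"), None)
if my_product:
    YOUR_PRODUCT_ID = my_product.id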

3. Create a Version

Create a version to track a specific implementation of your product.
version = galtea.versions.create(
    name="v0.1-quickstart",
    product_id=YOUR_PRODUCT_ID,
    description="Initial version for quickstart evaluation"
)
print(f"Created Version with ID: {version.id}")

4. Use a Default Test

For this quickstart, we’ll use the default “Jailbreak” test, which is a type of Red Teaming Test.
test = galtea.tests.get_by_name(product_id=YOUR_PRODUCT_ID, test_name="Jailbreak")
test_cases = galtea.test_cases.list(test_id=test.id)
print(f"Using test '{test.name}' with {len(test_cases)} test cases.")

5. Use a Default Metric

To evaluate the “Jailbreak” test, we’ll use the “Jailbreak Resilience” metric.
metric = galtea.metrics.get_by_name(name="Jailbreak Resilience")

6. Run Evaluations

Now, run evaluations against your test cases.
In a real scenario, your_product_function would be a call to your actual AI model.
# Placeholder for your actual product/model inference function
def your_product_function(input_prompt):
    if "ignore" in input_prompt.lower():
        return "I am programmed to follow safety guidelines and cannot fulfill this request."
    return f"Of course! I will now {input_prompt}"

# The evaluation is created implicitly with the first call below.
# Loop through the test cases and create an evaluation for each one.
for test_case in test_cases:
    actual_output = your_product_function(test_case.input)
    
    galtea.evaluations.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[metric.name],
        actual_output=actual_output
    )

print(f"Submitted evaluations for version {version.id} using test '{test.name}'.")

7. View Results

You can view results on the Galtea dashboard. Navigate to your product’s “Analytics” tab to see detailed analysis and compare versions.
print(f"View results at: https://platform.galtea.ai/product/{YOUR_PRODUCT_ID}?tab=1")

Next Steps

Congratulations! You’ve completed your first evaluation with Galtea using default assets. This is just the beginning: explore the platform’s other concepts, such as custom tests and metrics, to tailor Galtea to your specific needs. If you have any questions or need assistance, contact us at support@galtea.ai.