This guide walks you through evaluating and monitoring the reliability of your AI products with Galtea as quickly as possible.

Evaluation Workflow Overview

1. Create a Product: Define what functionality or service you want to evaluate on the Galtea platform.
2. Install SDK & Connect: Set up the Galtea Python SDK to interact with the platform.
3. Register a Version: Document a specific implementation of your product using the SDK.
4. Select a Test: Use a default Galtea test (or create your own) to challenge your product.
5. Select a Metric: Use a default Galtea metric (or define your own) to evaluate your product’s version.
6. Run Evaluations: Test your product version with the selected test and metrics, then analyze results.

1. Create a Product

Create a product in the Galtea dashboard. Navigate to Products > Create New Product and complete the form.

The product description is important, as it may be used to generate synthetic test data.

2. Install the SDK and Connect

1. Get your API key: In the Galtea dashboard, navigate to Settings > Generate API Key and copy your key.

2. Install the SDK:

pip install galtea

3. Connect to the platform:

from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

# Verify the connection by listing your products
products = galtea.products.list()
print(f"Found {len(products)} products.")
# Choose a product ID for the next steps
YOUR_PRODUCT_ID = products[0].id if products else "your_product_id"
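Hard-coding the API key works for a quick test, but it is easy to leak into version control. A minimal alternative sketch, assuming you have exported the key as GALTEA_API_KEY (the variable name is our choice, not an SDK convention):

import os

from galtea import Galtea

# Read the key from the environment instead of hard-coding it
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])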

3. Create a Version

Create a version to track a specific implementation of your product.

version = galtea.versions.create(
    name="v0.1-quickstart",
    product_id=YOUR_PRODUCT_ID,
    description="Initial version for quickstart evaluation"
)
print(f"Created Version with ID: {version.id}")

4. Use a Default Test

For this quickstart, we’ll use the default “Jailbreak” test, which is a type of Red Teaming Test.

test = galtea.tests.get_by_name(product_id=YOUR_PRODUCT_ID, test_name="Jailbreak")
test_cases = galtea.test_cases.list(test_id=test.id)
print(f"Using test '{test.name}' with {len(test_cases)} test cases.")

5. Use a Default Metric

To evaluate the “Jailbreak” test, we’ll use the “Jailbreak Resilience” metric.

metric = galtea.metrics.get_by_name(name="Jailbreak Resilience")
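Step 6 passes metrics to evaluation tasks by name, as a list, so a single task can be scored on several dimensions at once. A short sanity-check sketch (any metric name beyond "Jailbreak Resilience" would have to exist on the platform first):

# Confirm the lookup succeeded before creating tasks
print(f"Using metric '{metric.name}'")

# create_single_turn accepts a list of names, so you could extend
# this with other metrics fetched via galtea.metrics.get_by_name
metric_names = [metric.name]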

6. Run Evaluations

Now, run an evaluation by creating evaluation tasks.

In a real scenario, your_product_function would be a call to your actual AI model.

# Placeholder for your actual product/model inference function
def your_product_function(input_prompt):
    if "ignore" in input_prompt.lower():
        return "I am programmed to follow safety guidelines and cannot fulfill this request."
    return f"Of course! I will now {input_prompt}"

# An evaluation is created implicitly with the first task.
# Loop through test cases and create evaluation tasks.
for test_case in test_cases:
    actual_output = your_product_function(test_case.input)
    
    galtea.evaluation_tasks.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[metric.name],
        actual_output=actual_output
    )

print(f"Submitted evaluation tasks for version {version.id} using test '{test.name}'.")

7. View Results

You can view results on the Galtea dashboard. Navigate to your product’s “Analytics” tab to see detailed analysis and compare versions.

print(f"View results at: https://platform.galtea.ai/product/{YOUR_PRODUCT_ID}?tab=1")

Next Steps

Congratulations! You’ve completed your first evaluation with Galtea using default assets. This is just the beginning: explore creating your own products, versions, tests, and metrics to tailor Galtea to your specific needs.

If you have any questions or need assistance, contact us at support@galtea.ai.
