Evaluation Workflow Overview
1. Tell Us About Your Product: Provide details about your product so our models can create tailored testing content.
2. Install SDK & Connect: Set up the Galtea Python SDK to interact with the platform.
3. Version Your Product: Track different implementations to compare improvements over time.
4. Create Your Tests: Build test datasets with input and ground truth pairs to challenge your product.
5. Choose Your Metrics: Select from our metrics library or bring your own custom metrics.
6. Run Evaluations: Test your product version with the selected test and metrics.
7. See Your Results: Explore detailed insights and compare versions on the dashboard.
1. Tell Us About Your Product
To test your product effectively, our state-of-the-art LLMs and other models need to understand it inside and out. The more information you provide, the better equipped our models will be to create unique and highly specific testing content tailored just for your product.

Create a product in the Galtea dashboard: navigate to Products > Create New Product and complete the form. The product description is crucial: it powers our ability to generate synthetic test data that’s specific to your use case.
2. Get Started with the SDK
Just like with many other AI tools, you have options! You can do everything through our intuitive dashboard GUI, or you can get programmatic with our Python SDK. For this quickstart, we’ll focus on using the Python SDK.

1. Get your API key: In the Galtea dashboard, navigate to Settings > Generate API Key and copy your key.
2. Install the SDK (see the sketch after this list).
3. Connect to the platform (also shown in the sketch below).
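A minimal sketch of steps 2 and 3 is shown below. It assumes the SDK is published on PyPI as galtea and exposes a Galtea client class that takes your API key; check the SDK reference for the exact package name, import path, and constructor arguments.

```python
# Step 2 (assumed package name): install the SDK from PyPI
#   pip install galtea

# Step 3: connect to the platform. The import path and constructor
# arguments below are assumptions, not the confirmed SDK surface.
import os

from galtea import Galtea

# Read the API key from an environment variable instead of hard-coding it.
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])
```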
3. Version Your Product
Imagine this: Galtea shows you that your product doesn’t perform perfectly on certain edge cases you hadn’t anticipated. Naturally, you go back, make some tweaks, and now you have a new, improved version! You test your new version again on Galtea. But how do you compare whether the new model is better than the previous one? Galtea provides a robust way to version your product: attach a version name, ID, and metadata to each of your product/model versions. This means you can easily compare the results of different versions, filter your findings, and clearly see whether your new version is, in fact, an improvement over the old one. Create a version to track a specific implementation of your product, as in the sketch below.
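As a rough sketch, versioning through the SDK might look like the snippet below. The products.get_by_name and versions.create calls, their parameters, and the example values are assumptions; adapt them to the actual SDK reference. It reuses the galtea client from the connection sketch above.

```python
# Hypothetical sketch: register a new version of an existing product.
# Method names (products.get_by_name, versions.create) and parameters are assumptions.
product = galtea.products.get_by_name("My RAG Assistant")

version = galtea.versions.create(
    product_id=product.id,
    name="v1.1.0",  # human-readable version name
    description="Switched retriever to hybrid search",
    # Attach any metadata you need to filter and compare runs later,
    # e.g. base model, prompt revision, or commit hash.
)
```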
4. Create Your Tests

This is where we build the core of your evaluation: the test dataset. This dataset consists of input and ground truth pairs. We’ll feed the input to your product/model, then compare its prediction against the ground truth to see how well it performed. We offer two main ways to generate this crucial test data (a sketch of both follows the list):

- Using Your Product Description: If, for example, your product is a RAG system designed for financial data, we’ll automatically generate a test dataset that a RAG system for financial data should excel at.
- Using Your Specific Data: Do you have very particular PDFs or text files that your product absolutely must handle correctly? No problem! You can upload these files, and we’ll generate a test dataset directly from them, ensuring your product is tested on your critical content.
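A hedged sketch of both options is shown below. The tests.create method, the type values, and the ground_truth_file_path parameter are assumptions rather than the confirmed SDK signature; the file path is a hypothetical example.

```python
# Option A (assumed API): generate a test from the product description alone,
# e.g. an adversarial test such as jailbreak attempts.
jailbreak_test = galtea.tests.create(
    product_id=product.id,
    name="jailbreak-test",
    type="RED_TEAMING",  # assumed test-type value
)

# Option B (assumed API): generate a test grounded in your own documents.
financial_qa_test = galtea.tests.create(
    product_id=product.id,
    name="financial-qa-test",
    type="QUALITY",  # assumed test-type value
    ground_truth_file_path="data/financial_reports.pdf",  # hypothetical local file
)
```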
5. Choose Your Metrics
Anyone familiar with machine learning knows that after comparing predictions to ground truth you get a “loss,” but you also need a meaningful metric to interpret that loss. This step is all about selecting the right metrics for your evaluation. You can bring your own custom metrics to the table, or you can choose from our wide array of available metrics (such as QA, text-specific metrics, IOU, and many more). To evaluate the “Jailbreak” test, we’ll use the “Jailbreak Resilience” metric, as shown in the sketch below.
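The sketch below shows both paths: reusing a library metric by name and registering a custom one. The “Jailbreak Resilience” name comes from this step; the metrics.create call and its criteria parameter are assumptions, as is the custom “Helpfulness” metric used purely for illustration.

```python
# Reuse a library metric by name when running evaluations (see the next step).
selected_metrics = ["Jailbreak Resilience"]

# Or register a custom metric (method name and parameters are assumptions).
helpfulness = galtea.metrics.create(
    name="Helpfulness",
    criteria="The answer directly addresses the user's question and is actionable.",
)
```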
6. Run the Evaluation

This step is exactly what it sounds like! We take each test case from your created tests, pass its input through your product/model, compare the output to the ground truth, and calculate the metric values. You’ll get detailed metric values for every single test case within your tests. Now, run evaluations against your test cases, as in the sketch below. In a real scenario, your_product_function would be a call to your actual AI model.
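A minimal sketch of that loop follows. your_product_function is the placeholder mentioned above; the evaluations.create, test_cases.list, and evaluations.create_single_turn calls and their parameters are assumptions about the SDK surface, and version, jailbreak_test, and selected_metrics come from the earlier sketches.

```python
def your_product_function(user_input: str) -> str:
    """Placeholder: call your actual AI model or RAG pipeline here."""
    return "model answer for: " + user_input


# Assumed API: create an evaluation that ties a product version to a test.
evaluation = galtea.evaluations.create(version_id=version.id, test_id=jailbreak_test.id)

# Assumed API: iterate over generated test cases, run your product, and score each output.
for test_case in galtea.test_cases.list(test_id=jailbreak_test.id):
    actual_output = your_product_function(test_case.input)
    galtea.evaluations.create_single_turn(
        evaluation_id=evaluation.id,
        test_case_id=test_case.id,
        actual_output=actual_output,
        metrics=selected_metrics,  # e.g. ["Jailbreak Resilience"]
    )
```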
7. See Your Results

Once you’ve run evaluations across all your tests and thoroughly tested your product/model, it’s time to dive into the insights! You can view all your results on the Galtea platform in a highly intuitive way. Our dashboard allows you to filter and explore your data however you like: by versions, metrics, specific tests, and more. You can see aggregate scores for an overall picture or drill down into the results of individual test cases. Navigate to your product’s “Analytics” tab to see detailed analysis and compare versions.

Next Steps
Congratulations! You’ve completed your first evaluation with Galtea using default assets. This is just the beginning. Explore these concepts to tailor Galtea to your specific needs:

Product: A functionality or service being evaluated.
Version: A specific iteration of a product.
Test: A set of test cases for evaluating product performance.
Session: A full conversation between a user and an AI system.
Inference Result: A single turn in a conversation between a user and the AI.
Evaluation: The assessment of an inference result using a specific metric’s criteria.
Metric: A way to evaluate and score product performance.
Model: A way to keep track of your models’ costs.