Quickstart
All you need to get started with Galtea evaluations
This guide will walk you through the steps to begin evaluating and monitoring the reliability of your AI products with Galtea as quickly as possible.
Evaluation Workflow Overview
Evaluating AI products with Galtea follows this pattern:
Create a Product
Define what functionality or service you want to evaluate on the Galtea platform.
Install SDK & Connect
Set up the Galtea Python SDK to interact with the platform.
Register a Version
Document a specific implementation of your product using the SDK.
Select a Test
Use a default Galtea test (or create your own) to challenge your product.
Select a Metric
Use a default Galtea metric (or define your own) to evaluate your product’s version.
Run Evaluations
Test your product version with the selected test and metrics, then analyze results.
Let’s go through each step in more detail.
1. Create a Product
The first step in tracking the quality and reliability of your AI product is to create a product in the Galtea dashboard.
Navigate to Products > Create New Product and complete the product onboarding form.
The product description is particularly important as it may be used during the generation of synthetic test data if you choose to create custom tests later.
Products can only be created through the web platform, not the SDK. For detailed information, see the Product documentation.
2. Install the SDK and Connect
After creating your product, use the Galtea SDK to programmatically interact with the platform.
Get your API key
In the Galtea dashboard, navigate to Settings > Generate API Key and copy/save your API key.
Install the SDK
Install the Galtea SDK using pip:
For detailed installation instructions and requirements see the installation guide.
Connect to the platform
Using the Galtea SDK object, you can easily connect to the platform:
3. Create a Version
One of the key advantages of the Galtea dashboard is the ability to track and compare different versions of your AI product. A version captures the specific implementation details such as prompts, model parameters, or RAG configurations.
You need the product ID of the product you created in Step 1. You can find it in the product’s page in the dashboard or by listing your products using the SDK as shown above.
All optional_props
can be found in the Create Version API documentation.
You can create versions using either the SDK (as shown above) or directly through the Galtea dashboard by navigating to your product page and clicking on the Versions tab.
Versions Concepts
Learn about versions in Galtea
4. Use a Default Test (or Create Your Own)
To evaluate your version, you need a test. Galtea provides default tests to help you get started quickly. For this quickstart, we’ll use the default “Jailbreak” test, which is a type of Red Teaming Test.
The “Jailbreak” test contains various prompts designed to attempt to bypass an AI’s safety guardrails. This helps you assess your product’s robustness against such attempts.
Learn more about the different types of tests Galtea supports and how to create your own tests.
Tests Concepts
Learn about tests (and how to create them via the dashboard).
Create a Custom Test
See how to create and upload custom tests using the SDK.
5. Use a Default Metric (or Define Your Own)
Metrics in Galtea define the criteria for evaluation. Galtea provides default metrics. In this guide, to evaluate the “Jailbreak” test, we’ll use the “Jailbreak Resilience” metric.
The “Jailbreak Resilience” metric analyzes the AI’s response to a jailbreak prompt and scores its ability to maintain safety and refuse inappropriate requests.
If you need to evaluate other aspects or define custom criteria:
Metrics Concepts
Learn about metrics in Galtea
6. Run Evaluations
Finally, you’re ready to launch an evaluation to assess how well your product version performs against the test cases using the selected metric.
For real evaluations, your AI product (your_product_function
below) should be called with each input from the test cases. Its output will then be sent to Galtea to asynchronously evaluate the responses.
Run Evaluations Example
See more detailed examples of running and analyzing evaluations.
7. View Results
You can view evaluation results, scores, and detailed reasons through the SDK or, more comprehensively, on the Galtea dashboard.
For richer analysis, comparisons between versions, and visualizations, navigate to your product’s “Analytics” tab on the Galtea dashboard.
Next Steps
Congratulations! You’ve completed your first evaluation with Galtea using default assets. This is just the beginning. Explore these concepts to tailor Galtea to your specific needs:
Product
A functionality or service being evaluated
Version
A specific iteration of a product
Test
A set of test cases for evaluating product performance
Test Case
Each challenge in a test for evaluating product performance
Evaluation
A link between a product version and a test that groups evaluation tasks
Evaluation Task
The assessment of a test case from a test using a specific metric type’s evaluation criteria
Metric Type
Ways to evaluate and score product performance
If you have any questions or need assistance, contact us at support@galtea.ai.