Quickstart
All you need to get started
This guide will walk you through the essential steps to begin evaluating and monitoring the reliability of your AI products with Galtea.
Evaluation Workflow Overview
Evaluating AI products with Galtea follows this pattern:
1. Create a Product: define what functionality or service you want to evaluate.
2. Register a Version: document the specific implementation of your product.
3. Create a Test: define a set of test cases to evaluate your product’s performance.
4. Define Metrics: select or create criteria to assess outputs.
5. Run Evaluations: test your product and analyze the results.
Let’s go through each step in more detail.
1. Creating a Product
The first step in tracking the quality and reliability of your AI product is to create a product in the Galtea dashboard.
Navigate to Products > Create New Product and complete the product onboarding form. The product description is particularly important as it may be used during the generation of synthetic test data.
Products can only be created through the web platform, not the SDK. For detailed information about product properties, see the Product documentation.
2. Install the SDK and Connect
After creating your product, we recommend using the Galtea SDK to interact with the platform programmatically.
Get your API key
In the Galtea platform, navigate to Settings > Generate API Key
Connect to the platform
Using the Galtea SDK object, you can easily connect to the platform:
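Below is a minimal sketch, assuming the SDK is installed as the `galtea` Python package and exposes a `Galtea` client class (check the SDK reference for the exact names):

```python
# Minimal connection sketch. The "galtea" package name and the
# Galtea client class are assumptions; see the SDK reference.
from galtea import Galtea

# Use the API key generated under Settings > Generate API Key.
galtea = Galtea(api_key="YOUR_API_KEY")
```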
3. Register a Version
One of the key advantages of the Galtea platform is the ability to track and compare different versions of your AI product. A version captures the specific implementation details such as prompts, model parameters, or RAG configurations.
The product’s ID can be found on the product’s page in the Galtea platform or by listing products with the SDK.
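A sketch of registering a version, assuming a `versions.create` service on the client (the parameter names are illustrative; see the Version Service API card below for the exact interface):

```python
# Hypothetical sketch: the service and parameter names are assumptions.
version = galtea.versions.create(
    product_id="YOUR_PRODUCT_ID",  # from the product's page, or list products
    name="v1.0-baseline",
    description="Baseline system prompt, gpt-4o, temperature 0.2",
)
```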
You can create versions using either the SDK (as shown above) or directly through the Galtea platform dashboard.
Version Service API
Learn about all version properties and management capabilities
4. Create a Test
To compare the reliability of different versions, you need to subject each version to the same tests. Galtea supports two test types:
- Quality Tests: tests that evaluate accuracy and correctness.
- Red Teaming Tests: tests that evaluate security and safety aspects.
You can either upload your own test file or have Galtea generate tests from a knowledge base (ground truth) document:
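A sketch of both options, assuming a `tests.create` service (the method and parameter names, especially the file-path fields, are assumptions; see Create a Custom Test below for complete examples):

```python
# Hypothetical sketch: method and parameter names are assumptions.

# Option A: upload your own file of test cases.
test = galtea.tests.create(
    product_id="YOUR_PRODUCT_ID",
    name="quality-test-v1",
    type="QUALITY",  # or "RED_TEAMING"
    test_file_path="tests/my_test_cases.csv",
)

# Option B: have Galtea generate test cases from a knowledge base document.
generated_test = galtea.tests.create(
    product_id="YOUR_PRODUCT_ID",
    name="generated-quality-test",
    type="QUALITY",
    ground_truth_file_path="docs/knowledge_base.pdf",
)
```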
Tests can be created using either the SDK (as shown above) or directly through the Galtea platform dashboard.
Create a Custom Test
See complete examples of creating and uploading tests
5. Define Metrics
Metric types in Galtea define the criteria by which your product’s outputs are evaluated. Once created, they can be reused across different evaluations.
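A sketch of creating a metric type, assuming a `metrics.create` service with a free-text criteria field (both are assumptions; see the Metrics Service API card below):

```python
# Hypothetical sketch: the service name and fields are assumptions.
metric = galtea.metrics.create(
    name="factual-accuracy",
    criteria=(
        "Determine whether the actual output is factually consistent "
        "with the expected output from the test case."
    ),
)
```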
Metric types can be created using either the SDK (as shown above) or directly through the Galtea platform dashboard.
Metrics Service API
Learn about creating and managing evaluation metrics
6. Run Evaluations
Finally, you’re ready to launch an evaluation to assess how well your product version performs against the test cases:
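A sketch, assuming an `evaluations.create` service that links a version to a test:

```python
# Hypothetical sketch: the service and parameter names are assumptions.
evaluation = galtea.evaluations.create(
    version_id=version.id,
    test_id=test.id,
)
```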
For real evaluations, you’ll typically run your AI product on the input (and any context) of each test case, then launch an Evaluation Task for each resulting output, as sketched after this note.
The platform will asynchronously evaluate responses and make results available through the dashboard and the SDK. For more information, see evaluation tasks.
Evaluations can be created using either the SDK or the Galtea platform dashboard, but evaluation tasks can only be created through the SDK.
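A sketch of that loop (the `test_cases.list` and `evaluation_tasks.create` calls are assumptions, and `my_product` is a placeholder for your own inference code):

```python
# Hypothetical sketch: SDK method names are assumptions.
def my_product(input_text: str) -> str:
    """Placeholder for your own AI product's inference call."""
    return "model output for: " + input_text

for test_case in galtea.test_cases.list(test_id=test.id):
    actual_output = my_product(test_case.input)  # run your product
    galtea.evaluation_tasks.create(
        evaluation_id=evaluation.id,
        test_case_id=test_case.id,
        actual_output=actual_output,
        metrics=["factual-accuracy"],  # metric types to apply
    )
```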
Run Evaluations
See complete examples of running and analyzing evaluations
7. View Results
You can view evaluation results through the SDK or on the Galtea platform:
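A sketch of retrieving results through the SDK (the listing call and result fields are assumptions):

```python
# Hypothetical sketch: method and field names are assumptions.
for task in galtea.evaluation_tasks.list(evaluation_id=evaluation.id):
    print(task.metric_type, task.score)
```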
For richer analysis and comparisons between versions, visit the Analytics section of your product in the Galtea platform.
Next Steps
Congratulations! You’ve completed your first evaluation with Galtea. For more detailed information:
- Product: a functionality or service being evaluated.
- Version: a specific iteration of a product.
- Test: a set of test cases for evaluating product performance.
- Test Case: an individual challenge within a test for evaluating product performance.
- Evaluation: a link between a product version and a test that groups evaluation tasks.
- Evaluation Task: the assessment of a test case from a test using a specific metric type’s evaluation criteria.
- Metric Type: a way to evaluate and score product performance.
If you have any questions, contact us at support@galtea.ai.