What is a Test?
A test in Galtea is a group of test cases designed to evaluate the performance of a product. A test file provides simulations of interactions with the product (and, in quality tests, expected outcomes for each interaction). You can create, view, and manage your tests on the Galtea dashboard or programmatically using the Galtea SDK.
Test Origin
When creating a test in the Galtea dashboard, you’ll be asked to specify the test origin:
Generated
Galtea will take the knowledge base file and generate a set of test cases that will define the test.
Uploaded
The test is uploaded by you as a complete set of test cases.
The SDK parameter variants is used to specify “Threats” for Red Teaming tests. Similarly, the strategies parameter is used for Red Teaming tests to apply different attack modifications.
More information on how to create tests can be found in the Create Quality Tests and Create Red Teaming Tests documentation.
The information you provide during product onboarding, such as the product’s description and intended use, also feeds into test generation. Galtea can use this metadata to generate more targeted, context-aware test cases for both quality and red teaming tests.
Test Types
Galtea supports three main types of tests:
Quality Tests
Tests that evaluate the quality and correctness of outputs.
Red Teaming Tests
Tests that evaluate security, safety, and bias aspects, often by generating adversarial inputs based on defined threats and applying various strategies to make them more challenging.
Scenario Based Tests
Tests that evaluate the multi-turn dialogue capabilities of an agent through scenarios based on user personas and specific goals.
Using Tests in Evaluations
The test cases of a test are used in evaluations to assess the performance of specific versions of your product. These evaluations are grouped within sessions.
Using Tests in Evaluations
Learn how to use tests with evaluations
SDK Integration
The Galtea SDK allows you to create, view, and manage tests programmatically.
Test Service SDK
Manage tests using the Python SDK
Create a Custom Test
See how to create and upload custom tests using the SDK.
When creating Quality Tests, you can use Evolutions to generate variations of your test cases. Learn more in the Quality Test Evolutions documentation.
Test Properties
The name of the test. Example: “Legal Document Quality Test” or “Customer Support Safety Evaluation”
The type of the test.
Possible values:
- Quality: Tests that evaluate the quality and correctness of outputs
- Red Teaming: Tests that evaluate security, safety, and bias aspects
- Scenarios: Tests that use conversation simulation to evaluate multi-turn dialogue interactions
The origin of the test.
Possible values:
- Generated
- Uploaded
Optional few-shot examples that show Galtea how the test cases should be generated. They help Galtea match the format and style you expect from the generated test cases.
Example:
This field only applies if tests are generated by Galtea and are of type QUALITY.
The language for generating synthetic test cases if Knowledge Base File is provided (e.g., ‘english’, ‘spanish’). This should be the English name of the language. If not provided, Galtea attempts to infer the language from the knowledge base file. Supported languages include English, Spanish, Catalan, French, German, Portuguese, Italian, Dutch, Polish, Chinese, Korean, and Japanese.
This field only applies if tests are generated by Galtea (using Knowledge Base File).
The maximum number of test cases generated by Galtea. This helps control the size of the test dataset and associated costs.
The path to a local file (e.g., PDF, TXT, JSON, HTML, Markdown) containing the knowledge base. This file is uploaded to Galtea, which then generates test cases based on its content. Required if the test cases are to be generated by Galtea. Example: “path/to/your/knowledge_base.pdf”
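The language rule described above (the English name of a supported language, or nothing at all so that Galtea infers it) can be checked locally before a test is created. The following is a minimal sketch, not part of the Galtea SDK: the helper name normalize_language is hypothetical, lowercasing follows the ‘english’, ‘spanish’ examples above, and the set simply copies the languages listed above, which may not be exhaustive.

```python
# Languages copied from the list above; the real list may be longer.
SUPPORTED_LANGUAGES = {
    "english", "spanish", "catalan", "french", "german", "portuguese",
    "italian", "dutch", "polish", "chinese", "korean", "japanese",
}


def normalize_language(language):
    """Return a lowercase English language name, or None if not provided.

    Hypothetical helper for illustration only; it does not call Galtea.
    """
    if language is None:
        # Nothing provided: Galtea attempts to infer the language itself.
        return None
    name = language.strip().lower()
    if name not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"'{language}' is not a listed language; use the English name, "
            "for example 'spanish' rather than 'español'"
        )
    return name


print(normalize_language(" Spanish "))
```

Passing a native-language name such as "español" raises a ValueError, which catches the most likely mistake before any test cases are generated.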
Test File Formats
The format of your test file depends on the test type you’re creating.
For Quality tests, use the standard CSV format with these columns:
| Column | Required | Description |
|---|---|---|
| input | Yes | The question or prompt for the test case |
| expected_output | Yes | The expected response from your AI model |
| expected_tools | No | List of tools expected to be used by the synthetic user, separated by ;. Example: “book_flight;cancel_flight” |
| context | No | The ground truth context for the test case |
| tag | No | A categorization label (e.g., “original”, “paraphrased”) |
| source | No | The origin of the test case information |
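As an illustration of the column layout above, the sketch below builds a Quality test CSV with Python's standard csv module. It assumes only the column names from the table; the function name build_quality_csv and the sample test case are made up for this example, and nothing here calls the Galtea SDK.

```python
import csv
import io

# Column names come from the table above; everything else is illustrative.
COLUMNS = ["input", "expected_output", "expected_tools", "context", "tag", "source"]
REQUIRED = ("input", "expected_output")


def build_quality_csv(test_cases):
    """Return CSV text for a list of dicts, one dict per test case."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for index, case in enumerate(test_cases, start=1):
        # Reject rows that lack either of the two required columns.
        missing = [name for name in REQUIRED if not case.get(name)]
        if missing:
            raise ValueError(f"test case {index} is missing {missing}")
        row = {name: case.get(name, "") for name in COLUMNS}
        # expected_tools is stored as one cell, tool names separated by ';'.
        row["expected_tools"] = ";".join(case.get("expected_tools", []))
        writer.writerow(row)
    return buffer.getvalue()


csv_text = build_quality_csv([
    {
        "input": "Can I cancel my flight?",
        "expected_output": "Yes, you can cancel it from the bookings page.",
        "expected_tools": ["book_flight", "cancel_flight"],
        "tag": "original",
    },
])
print(csv_text)
```

Writing csv_text to a file with a .csv extension gives a file in the layout described above. Optional columns left out of a test case are written as empty cells.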