Test
A set of test cases for evaluating product performance
What is a Test?
A test in Galtea is a group of test cases designed to evaluate the performance of a product. A test file provides simulations of interactions with the product (and, in quality tests, expected outcomes for each interaction).
You can create, view and manage your tests on the Galtea dashboard or programmatically using the Galtea SDK.
Test Origin
When creating a test in the Galtea dashboard, you’ll be asked to specify the test origin:
Generated
Galtea will take the knowledge base file and generate a set of test cases that will define the test.
Uploaded
The test is uploaded by you as a complete set of test cases.
Your selection will determine whether you need to provide a Knowledge Base File or a Test File.
The SDK parameter `variants` is used to specify “Evolutions” for Quality tests and “Threats” for Red Teaming tests. Similarly, the `strategies` parameter is used for Red Teaming tests to apply different attack modifications.
More information on how to create tests can be found in the Create Quality Tests and Create Red Teaming Tests documentation.
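As a rough illustration of how these parameters fit together, here is a minimal sketch of creating tests with the Python SDK. The client setup, the `knowledge_base_file_path` argument, and the specific variant, threat, and strategy names are assumptions for illustration, not a verbatim API reference; see the linked guides for exact usage.

```python
from galtea import Galtea  # assumed import path

galtea = Galtea(api_key="YOUR_API_KEY")

# Quality test generated from a knowledge base; `variants` requests
# "Evolutions" of the generated test cases (names here are assumed).
quality_test = galtea.tests.create(
    product_id="YOUR_PRODUCT_ID",
    name="Legal Document Quality Test",
    type="QUALITY",
    knowledge_base_file_path="path/to/your/knowledge_base.pdf",  # assumed parameter name
    variants=["paraphrased", "typos"],
)

# Red Teaming test; here `variants` selects threat categories and
# `strategies` applies attack modifications to each generated prompt.
red_teaming_test = galtea.tests.create(
    product_id="YOUR_PRODUCT_ID",
    name="Customer Support Safety Evaluation",
    type="RED_TEAMING",
    variants=["toxicity"],
    strategies=["prompt_injection"],
)
```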
It is important to note that the information you provide during product onboarding, such as the product’s description and intended use, plays a valuable role when generating test cases. Galtea can leverage this metadata to generate more targeted and context-aware test cases when creating both quality and red teaming tests, leading to more effective and insightful evaluations.
Test Types
Galtea supports three main types of tests:
Quality Tests
Tests that evaluate the quality and correctness of outputs.
Red Teaming Tests
Tests that evaluate security, safety, and bias aspects, often by generating adversarial inputs based on defined threats and applying various strategies to make them more challenging.
Scenario Based Tests
Tests that evaluate the multi-turn dialogue capabilities of an agent through the use of scenarios based on user personas and specific goals.
Using Tests in Evaluations
The Test Cases of a Test are used in evaluation tasks to assess the performance of specific versions of your product against a set of metrics.
To ensure consistent comparisons between product versions, the same Test Cases should be reused across evaluations of each version.
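For instance, the sketch below runs the same test against two versions. The `evaluation_tasks.create` method and its parameters are assumed names for illustration rather than confirmed SDK signatures.

```python
from galtea import Galtea  # assumed import path

galtea = Galtea(api_key="YOUR_API_KEY")

# Reusing one test across two versions keeps the comparison consistent:
# every version is assessed on exactly the same test cases.
for version_id in ["version-1", "version-2"]:
    galtea.evaluation_tasks.create(  # assumed method name
        test_id="YOUR_TEST_ID",
        version_id=version_id,
        metrics=["factual-accuracy"],  # assumed metric name
    )
```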
Using Tests in Evaluations
Learn how to use tests with evaluations
SDK Integration
The Galtea SDK allows you to create, view, and manage tests programmatically.
Test Service SDK
Manage tests using the Python SDK
Create a Custom Test
See how to create and upload custom tests using the SDK.
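As a quick sketch of programmatic access (method names such as `tests.list` and `test_cases.list` are assumptions; see the SDK reference above for the exact API):

```python
from galtea import Galtea  # assumed import path

galtea = Galtea(api_key="YOUR_API_KEY")

# List the tests registered for a product (assumed method name).
tests = galtea.tests.list(product_id="YOUR_PRODUCT_ID")
for test in tests:
    print(test.id, test.name, test.type)

# Inspect the test cases that make up a single test (assumed method name).
test_cases = galtea.test_cases.list(test_id=tests[0].id)
print(f"First test contains {len(test_cases)} test cases")
```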
Test Properties
The name of the test. Example: “Legal Document Quality Test” or “Customer Support Safety Evaluation”
The type of the test. Possible values:
- Quality: Tests that evaluate the quality and correctness of outputs
- Red Teaming: Tests that evaluate security, safety, and bias aspects
- Scenarios: Tests that use conversation simulation to evaluate multi-turn dialogue interactions
Optional few-shot examples to provide more context to our system about how the test cases should be generated. This can help our system better understand the expected format and style wanted for the test cases.
The language for generating synthetic test cases when a Knowledge Base File is provided (e.g., ‘english’, ‘spanish’). This should be the English name of the language. If not provided, Galtea attempts to infer the language from the knowledge base file. Supported languages include English, Spanish, Catalan, French, German, Portuguese, Italian, Dutch, Polish, Chinese, Korean, and Japanese.
The maximum number of test cases generated by Galtea (only applies when test cases are generated from a Knowledge Base File). This helps control the size of the test dataset and associated costs.
The path to a local file (e.g., PDF, TXT, JSON, HTML, Markdown) containing the knowledge base. This file is uploaded to Galtea, which then generates test cases based on its content. Required if the test cases are to be generated by Galtea. Example: “path/to/your/knowledge_base.pdf”
Allows for the generation of variations of test cases (e.g., paraphrased questions, questions with typos). For more details on available evolutions, see Quality Test Evolutions.
Specifies which threat categories to generate test cases for. This corresponds to the `variants` parameter in the SDK.
A list of red teaming strategies to modify prompts for each threat. This corresponds to the `strategies` parameter in the SDK.
Optional file containing context or domain knowledge to help generate more realistic conversation scenarios. Example: “path/to/your/domain_context.pdf”
The maximum number of conversation scenarios generated by Galtea. This helps control the size of the test dataset.
The path to a local CSV file containing predefined test cases. This file is uploaded to Galtea. Required if you are providing your own set of test cases instead of having Galtea generate them. Example: “path/to/your/test_file.csv”
File Format Requirements:
- Quality/Red Teaming Tests: Must include `input`, `expected_output`, `tag`, and `source` columns
- Scenarios Tests: Must include conversation simulation columns (see Scenarios Test File Format below)
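A minimal sketch of registering an uploaded test from such a CSV file (the `test_file_path` parameter name is an assumption mirroring the property described above):

```python
from galtea import Galtea  # assumed import path

galtea = Galtea(api_key="YOUR_API_KEY")

# Upload a predefined set of test cases instead of generating them.
uploaded_test = galtea.tests.create(
    product_id="YOUR_PRODUCT_ID",
    name="Customer Support Safety Evaluation",
    type="RED_TEAMING",
    test_file_path="path/to/your/test_file.csv",  # assumed parameter name
)
```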
Test File Formats
The format of your test file depends on the test type you’re creating.
Quality and Red Teaming Test File Format
For Quality and Red Teaming tests, use the standard format:
- `input`: The question or prompt for the test case
- `expected_output`: The expected response (optional for some test types)
- `tag`: A categorization label for the test case
- `source`: The origin of the test case information
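For example, a small quality test file with these columns might look like this (the rows are illustrative placeholders):

```csv
input,expected_output,tag,source
"What is the notice period for termination?","The notice period is 30 days.","contracts","employee_handbook.pdf"
"How do I reset my password?","Use the 'Forgot password' link on the login page.","account","support_faq.md"
```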
Scenarios Test File Format
For Scenarios tests that use the Conversation Simulator, use this specialized format:
The objective the synthetic user is trying to achieve. Example: “Book a flight to New York”
The personality of the synthetic user. Example: “A busy professional who values efficiency”
The first message from the synthetic user. Example: “I need to book a flight”
Conversation end conditions, separated by `;` or `|`. Example: “Booking confirmed|Unable to fulfill request”
Maximum conversation turns. Example: 10
Scenario description. Example: “Flight booking scenario”
Only `goal` and `user_persona` are mandatory for conversation simulation. See the Conversation Simulator Tutorial for complete implementation examples.
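Since only `goal` and `user_persona` are mandatory, a minimal scenarios test file could look like the following (the second row is an illustrative placeholder):

```csv
goal,user_persona
"Book a flight to New York","A busy professional who values efficiency"
"Get a refund for a cancelled trip","A frustrated customer who wants quick, clear answers"
```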