What is a Test Case?
A test case in Galtea is a challenge designed to evaluate the performance of a particular version of a product. It represents a specific set of inputs that is sent to the product’s AI model to generate an output, and it can also define an expected output used to assess the product’s capabilities.

Test cases are part of a test, so you’ll need to create a test first. You can do this in the Galtea dashboard or using the SDK.
Using Test Cases in Evaluations
Test cases are used with evaluations. Sessions are created automatically when running evaluations; the sketch after the card below illustrates the flow.

Create an Evaluation
Learn how to use tests in evaluations
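Concretely, the flow might look like the following sketch, written with the Python SDK introduced in the next section. This is a minimal sketch, not the authoritative API: the method names and parameters (`test_cases.list`, `evaluations.create`, `version_id`, `actual_output`, `metrics`) are assumptions here, so refer to the evaluation guide for the real signatures.

```python
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")


def my_model(prompt: str) -> str:
    """Stand-in for your product's AI model call."""
    return "model answer for: " + prompt


# Evaluate a version of your product against every test case in a test.
# Per the docs above, a session is created automatically when the
# evaluation runs; the method names below are assumptions.
for tc in galtea.test_cases.list(test_id="YOUR_TEST_ID"):
    output = my_model(tc.input)
    galtea.evaluations.create(
        version_id="YOUR_VERSION_ID",
        test_case_id=tc.id,
        actual_output=output,
        metrics=["YOUR_METRIC"],
    )
```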
SDK Integration
The Galtea SDK allows you to create, view, and manage test cases programmatically.

Test Case Service SDK
Manage test cases using the Python SDK
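As a quick taste, creating and listing test cases might look like the sketch below. It assumes the client is constructed with an API key and that the test case service exposes `create` and `list` methods; treat those names as assumptions and see the Test Case Service reference above for the exact signatures.

```python
from galtea import Galtea

# Initialize the SDK client (constructor shape assumed; see the SDK docs).
galtea = Galtea(api_key="YOUR_API_KEY")

# Create a test case inside an existing test. Test cases belong to a test,
# so a test must exist first (created in the dashboard or via the SDK).
test_case = galtea.test_cases.create(
    test_id="YOUR_TEST_ID",
    input="What is the operating system of the Samsung Galaxy A8?",
    expected_output="The operating system of the Samsung Galaxy A8 is Android 8.1.",
)

# List the test cases that belong to the same test.
for tc in galtea.test_cases.list(test_id="YOUR_TEST_ID"):
    print(tc.id, tc.input)
```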
Test Case Properties
Input: The input data used for inference on the LLM product’s version. This is the question or prompt that will be sent to your AI model. Example: “What is the operating system of the Samsung Galaxy A8?”

Expected output: The expected output for the evaluation. This represents the ideal response you want your AI model to provide. Example: “The operating system of the Samsung Galaxy A8 is Android 8.1.”
Context: Test case-specific context that provides additional information to the model, such as the past conversation or other relevant material. Do not mistake this for the system prompt or few-shot examples; those should be defined in the product’s version unless they change on a per-interaction basis.
Tag: A label that helps categorize the test case (e.g., “original”, “paraphrased”, “incorrect”). This corresponds to the `tag` column in an uploaded test CSV file and is used for organizing and filtering test cases.

Source: The original source text used to create the test case. This can be helpful for tracing where test cases originated, especially when generating test cases from documentation or other reference materials.
Reviewed by: Indicates whether a test case has been manually reviewed and approved by a user. This is useful for tracking which test cases are considered reliable and validated. The system stores the ID of the user who reviewed the test case; upon creation or update, the user who created or updated the test case is set as the reviewer.
User score: A user vote on the test case’s quality. Possible values are `1` (upvote), `-1` (downvote), or `0` (unreviewed).

Score justification: A justification for the user score, providing context on why a test case was downvoted (or upvoted).
Confidence: The confidence level of the test case.

Confidence justification: A justification for the confidence level of the test case.
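Putting the properties above together, a fully specified test case might be created as in the sketch below. The keyword arguments are assumed snake_case forms of the property names on this page, not confirmed parameter names; check the Test Case Service SDK reference.

```python
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

# A test case combining the properties described above. Every keyword
# argument name below is an assumption (snake_case of the property labels);
# verify against the Test Case Service SDK reference.
test_case = galtea.test_cases.create(
    test_id="YOUR_TEST_ID",
    input="What is the operating system of the Samsung Galaxy A8?",
    expected_output="The operating system of the Samsung Galaxy A8 is Android 8.1.",
    context="Earlier in the conversation the user asked about Samsung phones.",
    tag="original",
    source="Samsung Galaxy A8 specification sheet.",
)
```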
Scenario-Based Test Case Properties
The following properties apply specifically to test cases of type `SCENARIOS`, which are used with the Conversation Simulator:
Goal: The objective the synthetic user is trying to achieve in a conversation scenario. This defines what the synthetic user wants to accomplish during the interaction. Example: “Book a flight to New York”

User persona: The personality of the synthetic user for a conversation scenario. This shapes how the synthetic user will behave and communicate during the conversation. Example: “A busy professional who values efficiency”

Scenario: A description of the specific scenario or situation in which the conversation takes place. This provides additional context for the interaction. Example: “Flight booking scenario”

Initial message: The first message from the synthetic user that starts the conversation. If not provided, the simulator will generate an appropriate opening message based on the goal and persona. Example: “I need to book a flight”
Stopping criteria: Conditions that determine when the conversation should end, separated by `;` or `|`. The conversation will stop when any of these criteria is met. Example: “Booking confirmed|Unable to fulfill request”

Max turns: The maximum number of conversation turns allowed in the simulation. This prevents conversations from running indefinitely. Example: 10
For `SCENARIOS` test cases, only the goal and user persona are mandatory. All other scenario properties are optional but can help create more realistic and controlled conversation simulations.
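For illustration, a scenario test case might be created as in the sketch below. Only the goal and user persona are required; the other arguments are optional, and all of the parameter names are assumed snake_case forms of the properties above rather than confirmed SDK names.

```python
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

# A SCENARIOS-type test case for the Conversation Simulator. Parameter
# names are assumptions based on the properties documented above.
scenario_case = galtea.test_cases.create(
    test_id="YOUR_SCENARIOS_TEST_ID",
    goal="Book a flight to New York",
    user_persona="A busy professional who values efficiency",
    scenario="Flight booking scenario",
    initial_message="I need to book a flight",
    stopping_criteria="Booking confirmed|Unable to fulfill request",
    max_turns=10,
)
```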