Test Case
A single challenge for evaluating product performance
What is a Test Case?
A test case in Galtea is a challenge designed to evaluate the performance of a particular version of a product.
It represents a specific set of inputs that is sent to the product’s AI model to generate an output. It can also define an expected output, which is used to assess the product’s capabilities.
Test cases are part of a test, so you’ll need to create a test first. You can do this in the Galtea dashboard or using the SDK.
You can create, view and manage your test cases on the Galtea dashboard or programmatically using the Galtea SDK.
Using Test Cases in Evaluations
Test cases are used with evaluation tasks, but first you will have to create an evaluation using the Galtea dashboard or the SDK.
Create an Evaluation
Learn how to use tests in evaluations
SDK Integration
The Galtea SDK allows you to create, view, and manage test cases programmatically.
Test Case Service SDK
Manage test cases using the Python SDK
Test Case Properties
The input data used for inference on the LLM product’s version. This is the question or prompt that will be sent to your AI model. Example: “What is the operating system of the Samsung Galaxy A8?”
The expected output for the evaluation task. This represents the ideal response you want your AI model to provide. Example: “The operating system of the Samsung Galaxy A8 is Android 8.1.”
Test case-specific information that gives the model additional context. It can be the past conversation or any other relevant information. Do not confuse this with the system prompt or few-shot examples; those should be defined in the product’s version unless they change from one interaction to the next.
A label that helps categorize the test case. This helps in organizing and filtering test cases within your test suite (e.g., “specification”, “factual_knowledge”, “edge_case”).
The original source text used to create the test case. This helps trace where a test case came from, especially when test cases are generated from documentation or other reference materials.
For conversational AI, this field can store a list of previous turns in the conversation. Each turn is a dictionary with “input” and “actual_output” keys. This data is primarily used when evaluating conversational metrics.
Example: [{"input": "What's the weather like?", "actual_output": "It's sunny today!"}, {"input": "Great, any recommendations for outdoor activities?", "actual_output": None}]
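The properties described above can be pictured as a simple record. The following is a minimal sketch in plain Python, not the Galtea SDK: the class `SketchTestCase`, its field names, and the helper `validate_turns` are hypothetical names chosen for illustration. Only the "input" and "actual_output" keys of a conversation turn come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchTestCase:
    """Illustrative stand-in for a test case. Field names are assumptions."""
    input: str                             # question or prompt sent to the model
    expected_output: Optional[str] = None  # ideal response, if one is defined
    context: Optional[str] = None          # test case-specific context
    tag: Optional[str] = None              # label used to categorize the case
    source: Optional[str] = None           # text the case was derived from
    conversation_turns: list = field(default_factory=list)


def validate_turns(turns):
    """Return True if every turn is a dict with 'input' and 'actual_output' keys.

    'input' must be a string; 'actual_output' may be a string or None,
    matching the example above where the last turn has no output yet.
    """
    for turn in turns:
        if not isinstance(turn, dict):
            return False
        if not {"input", "actual_output"} <= set(turn):
            return False
        if not isinstance(turn["input"], str):
            return False
        output = turn["actual_output"]
        if output is not None and not isinstance(output, str):
            return False
    return True


# Usage: build a case from the examples given in this section.
example = SketchTestCase(
    input="What is the operating system of the Samsung Galaxy A8?",
    expected_output="The operating system of the Samsung Galaxy A8 is Android 8.1.",
    tag="specification",
    conversation_turns=[
        {"input": "What's the weather like?", "actual_output": "It's sunny today!"},
        {"input": "Great, any recommendations for outdoor activities?", "actual_output": None},
    ],
)
turns_ok = validate_turns(example.conversation_turns)
```

Checking the turn structure locally before creating test cases can catch malformed conversation data early. The actual field names and validation rules are defined by the Galtea SDK and dashboard, so treat this only as a picture of the data shape.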