Evaluation
A link between a product version and a test
What is an Evaluation?
An evaluation in Galtea is a link between a specific version of a product and a test. It serves as the container for all the evaluation tasks that assess how well the product version performs against the test cases.
Evaluation Workflow
Execute Evaluation Tasks
For each Test Case in the test, run one or more evaluation tasks to assess performance using different metric types.
For each Test Case, run your product’s model with the test case’s input and context, then provide its output to the evaluation tasks.
Review Results
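The workflow above can be sketched in plain Python. This is a minimal illustration only and does not use the Galtea SDK: `SampleTestCase`, `SampleEvaluationTask`, `run_evaluation_tasks`, `summarize_by_metric`, the toy product, and the toy metrics are all hypothetical names introduced here. It assumes a product is a function of a test case's input and context, and a metric is a function that scores one output.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class SampleTestCase:
    """Hypothetical stand-in for a test case: an input plus its context."""
    input: str
    context: str


@dataclass
class SampleEvaluationTask:
    """Hypothetical record of one test case assessed with one or more metrics."""
    test_case: SampleTestCase
    actual_output: str
    scores: Dict[str, float] = field(default_factory=dict)


# Assumed shapes: a product maps (input, context) to an output string,
# and a metric maps (test case, output) to a numeric score.
Product = Callable[[str, str], str]
Metric = Callable[[SampleTestCase, str], float]


def run_evaluation_tasks(
    test_cases: List[SampleTestCase],
    product: Product,
    metrics: Dict[str, Metric],
) -> List[SampleEvaluationTask]:
    """Run the product on every test case and score its output with each metric."""
    tasks = []
    for case in test_cases:
        output = product(case.input, case.context)
        task = SampleEvaluationTask(test_case=case, actual_output=output)
        for name, metric in metrics.items():
            task.scores[name] = metric(case, output)
        tasks.append(task)
    return tasks


def summarize_by_metric(tasks: List[SampleEvaluationTask]) -> Dict[str, float]:
    """Average each metric's score across all tasks, as a simple review step."""
    names = sorted({name for task in tasks for name in task.scores})
    return {
        name: mean(task.scores[name] for task in tasks if name in task.scores)
        for name in names
    }


# Toy product and metrics, for illustration only.
def echo_product(user_input: str, context: str) -> str:
    return f"{context}: {user_input}"


def non_empty(case: SampleTestCase, output: str) -> float:
    return 1.0 if output.strip() else 0.0


def mentions_context(case: SampleTestCase, output: str) -> float:
    return 1.0 if case.context in output else 0.0
```

Calling `run_evaluation_tasks` with a list of test cases, `echo_product`, and a dictionary of the two metrics yields one task per test case; `summarize_by_metric` then reduces those tasks to one average per metric, which is roughly the kind of per-metric overview described for the platform.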
Evaluation Tasks
The core components of an evaluation are its evaluation tasks. Each task represents the assessment of a single test case using specific metric types.
Evaluation Task
Learn more about evaluation tasks
Creating an evaluation does not automatically run evaluation tasks. You need to execute evaluation tasks separately to generate scores and insights.
Results Visualization
Once you’ve created an evaluation, you can access detailed information and results on the platform.
To view evaluation results, go to the Analytics section of the product’s page. For detailed information about a particular evaluation, open the Evaluations tab and select that evaluation.
The platform provides:
- Overview of a product’s evaluation results per metric
- Analytics comparing different versions of the product
- Detailed view of individual evaluation tasks
Creating an Evaluation
To create an evaluation in Galtea, you need to specify:
- The unique identifier of the version to be used in the evaluation.
- The unique identifier of the test to be used in the evaluation.
Once an evaluation is created, you can execute evaluation tasks to assess the performance of the selected version against the test cases.
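As a rough mental model, an evaluation can be pictured as a small record that pairs the two identifiers and starts out with no evaluation tasks. The sketch below is not the Galtea SDK: the `SampleEvaluation` class, the `create_evaluation` helper, and the field names `version_id`, `test_id`, and `tasks` are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class SampleEvaluation:
    """Hypothetical model of an evaluation: a link between a version and a test."""
    version_id: str
    test_id: str
    # Locally generated identifier, used only so the sketch is self-contained.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Creating an evaluation runs nothing, so the task list starts empty.
    tasks: List[str] = field(default_factory=list)


def create_evaluation(version_id: str, test_id: str) -> SampleEvaluation:
    """Validate the two required identifiers and return the linking record."""
    if not version_id or not test_id:
        raise ValueError("both version_id and test_id are required")
    return SampleEvaluation(version_id=version_id, test_id=test_id)
```

For example, `create_evaluation("version-123", "test-456")` returns a record whose `tasks` list is empty, mirroring the point that evaluation tasks must be executed separately after the evaluation exists.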
SDK Integration
Evaluation Service SDK
SDK methods for managing evaluations