Evaluation
A group of inference results from a session to be used by evaluation tasks
What is an Evaluation?
An evaluation in Galtea is a group of Inference Results from a particular session. It serves as the container for all the evaluation tasks that assess how well the product version performs.
Evaluation Tasks
The core components of an evaluation are its evaluation tasks. Each task assesses the evaluation's Inference Results against one specific metric type.
An evaluation is created implicitly when you create an evaluation task for a specific session. In this case, the evaluation will be linked to all Inference Results from that session.
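The implicit-creation rule can be illustrated with a small in-memory sketch. This is not the Galtea SDK: `Evaluation`, `InMemoryStore`, `create_evaluation_task`, and the ids used here are hypothetical names invented for illustration. The sketch also assumes that later tasks for the same session reuse the evaluation created by the first one.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Hypothetical stand-in: one session plus all of its inference results."""
    session_id: str
    inference_result_ids: list[str]
    task_metrics: list[str] = field(default_factory=list)


class InMemoryStore:
    """Toy store (not the Galtea SDK) that mimics implicit evaluation creation."""

    def __init__(self, sessions: dict[str, list[str]]):
        # Maps each session id to the ids of its inference results.
        self._sessions = sessions
        self._evaluations: dict[str, Evaluation] = {}

    def create_evaluation_task(self, session_id: str, metric: str) -> Evaluation:
        # No explicit "create evaluation" step: the evaluation appears the
        # first time a task is created for the session (assumed to be reused
        # by later tasks for that same session).
        evaluation = self._evaluations.get(session_id)
        if evaluation is None:
            # Link the new evaluation to every inference result of the session.
            evaluation = Evaluation(session_id, list(self._sessions[session_id]))
            self._evaluations[session_id] = evaluation
        evaluation.task_metrics.append(metric)
        return evaluation


store = InMemoryStore({"session-1": ["ir-1", "ir-2"]})
first = store.create_evaluation_task("session-1", "accuracy")
second = store.create_evaluation_task("session-1", "relevance")
print(first is second)
print(first.inference_result_ids)
```

In this sketch both tasks end up attached to the same evaluation, and that evaluation carries every inference result recorded for the session.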
Evaluation Task
Learn more about evaluation tasks
Results Visualization
Once you’ve created an evaluation, you can access detailed information and results on the dashboard.
To view evaluation results, open the product’s Analytics section. For detailed information about a particular evaluation, navigate to the Evaluation Tasks tab.
The platform provides:
- Overview of a product’s evaluation results per metric
- Analytics comparing different versions of the product
- Detailed view of individual evaluation tasks
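As a rough illustration of what a per-metric, per-version overview involves, the sketch below averages task scores for each (version, metric) pair. It assumes each evaluation task yields a numeric score, which this page does not state. The tuples, metric names, and values are made up, and `summarize` is a hypothetical helper, not part of the platform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task results: (product version, metric, score). Values are made up.
task_results = [
    ("v1", "accuracy", 0.5),
    ("v1", "accuracy", 1.0),
    ("v2", "accuracy", 1.0),
    ("v2", "relevance", 0.5),
]


def summarize(results):
    """Average the scores for each (version, metric) pair."""
    grouped = defaultdict(list)
    for version, metric, score in results:
        grouped[(version, metric)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = summarize(task_results)
for (version, metric), average in sorted(summary.items()):
    print(f"{version} {metric}: {average:.2f}")
```

Grouping by version as well as metric is what makes the version-to-version comparison possible: the same metric can be read side by side for `v1` and `v2`.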
SDK Integration
The Galtea SDK allows you to view and manage evaluations programmatically. This is particularly useful for organizations that want to automate their evaluation process or integrate it into their CI/CD pipeline.
Evaluation Service SDK
Manage evaluations using the Python SDK
Evaluation Properties
- Inference Results: The list of inference results that belong to this evaluation. Each inference result is a single output generated by the product version during the session.
- Session: The session that all the inference results belong to.