Evaluation
A link between a product version and a test
What is an Evaluation?
An evaluation in Galtea is a link between a specific product version and a test. It serves as the container for all the evaluation tasks that assess how well that version performs against the test's test cases.
Evaluations can only be created programmatically with the Galtea SDK, but they can be viewed and managed on the Galtea dashboard.
Evaluation Workflow
Execute Evaluation Tasks
For each test case in the test, run one or more evaluation tasks to assess performance, each using a different metric type.
For each test case, call your product's model with the test case's data and pass its output to the evaluation tasks.
Review Results
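The Execute Evaluation Tasks step can be sketched in plain Python to show its shape: call the product once per test case, then score that single output with one task per metric. This is not the Galtea SDK. `TestCase`, `call_product`, `exact_match`, `length_ratio`, and `run_evaluation_tasks` are hypothetical stand-ins, and the task results are kept in memory rather than sent to Galtea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    # Hypothetical shape of a test case: an input and an expected output.
    input: str
    expected_output: str


def call_product(prompt: str) -> str:
    # Hypothetical stand-in for your product's model; replace with a real call.
    return prompt.upper()


# Hypothetical metrics: each maps (actual, expected) to a score in [0, 1].
def exact_match(actual: str, expected: str) -> float:
    return 1.0 if actual == expected else 0.0


def length_ratio(actual: str, expected: str) -> float:
    if not actual and not expected:
        return 1.0
    return min(len(actual), len(expected)) / max(len(actual), len(expected))


Metric = Callable[[str, str], float]


def run_evaluation_tasks(test_cases: list[TestCase], metrics: dict[str, Metric]) -> list[dict]:
    tasks = []
    for case in test_cases:
        # Call the product once per test case...
        actual = call_product(case.input)
        # ...and feed that single output to one task per metric.
        for name, metric in metrics.items():
            tasks.append({
                "input": case.input,
                "actual_output": actual,
                "metric": name,
                "score": metric(actual, case.expected_output),
            })
    return tasks


cases = [TestCase("hello", "HELLO"), TestCase("bye", "goodbye")]
results = run_evaluation_tasks(cases, {"exact_match": exact_match, "length_ratio": length_ratio})
for task in results:
    print(task["input"], task["metric"], round(task["score"], 2))
```

Calling the model once and reusing its output across metrics keeps every metric scoring the same response, which is what makes per-metric results comparable.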
Evaluation Tasks
The core components of an evaluation are its evaluation tasks. Each task represents the assessment of a single test case using a specific metric.
Creating an evaluation does not automatically run evaluation tasks. You need to execute evaluation tasks separately to generate scores and insights.
Evaluation Task
Learn more about evaluation tasks
Results Visualization
Once you’ve created an evaluation, you can access detailed information and results on the dashboard.
To view evaluation results, visit the Analytics section of a product's page. For detailed information about a particular evaluation, navigate to the Evaluations tab and select that evaluation.
The platform provides:
- Overview of a product's evaluation results per metric
- Analytics comparing different versions of the product
- Detailed view of individual evaluation tasks
SDK Integration
The Galtea SDK allows you to create, view, and manage evaluations programmatically. This is particularly useful for organizations that want to automate their evaluation process or integrate it into their CI/CD pipeline.
Evaluation Service SDK
Manage evaluations using the Python SDK
GitHub Actions
Learn how to set up GitHub Actions to automatically evaluate new versions
Evaluation Properties
Version: the product version to be evaluated.
Test: the test whose test cases the version is evaluated against.
Once an evaluation is created, you can execute evaluation tasks to assess the performance of the selected version against the test cases.
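The relationship described on this page can be modeled with two small records. This is a minimal sketch, not the SDK's actual classes: `Evaluation`, `EvaluationTask`, their field names, `add_task`, `average_score`, and the IDs are all hypothetical. It shows that a newly created evaluation holds only a version and a test, and that tasks (one per test case and metric) are added only when you execute them.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class EvaluationTask:
    # Hypothetical record: one test case assessed with one metric.
    test_case_id: str
    metric: str
    score: float


@dataclass
class Evaluation:
    # The two evaluation properties: which version, against which test.
    version_id: str
    test_id: str
    # Creating an evaluation runs nothing, so it starts with no tasks.
    tasks: list[EvaluationTask] = field(default_factory=list)

    def add_task(self, test_case_id: str, metric: str, score: float) -> EvaluationTask:
        # Record the result of executing one evaluation task.
        task = EvaluationTask(test_case_id, metric, score)
        self.tasks.append(task)
        return task

    def average_score(self, metric: str) -> Optional[float]:
        # Mean over all tasks that used this metric; None if there are none yet.
        scores = [t.score for t in self.tasks if t.metric == metric]
        return mean(scores) if scores else None


evaluation = Evaluation(version_id="version-123", test_id="test-456")
print(len(evaluation.tasks))  # 0: no tasks until you execute them
evaluation.add_task("case-1", "exact_match", 1.0)
evaluation.add_task("case-2", "exact_match", 0.0)
print(evaluation.average_score("exact_match"))  # 0.5
```

A per-metric average like `average_score` mirrors the kind of per-metric overview the dashboard provides, though how Galtea actually aggregates scores is not specified here.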