What is an Evaluation?

An evaluation in Galtea is a link between a specific version and a test. It serves as the container for all the evaluation tasks that assess how well the product version performs against the test cases.

The only way to create evaluations is programmatically by using the Galtea SDK, but they can be viewed and managed on the Galtea dashboard.

Evaluation Workflow

1. Create an Evaluation

Link a specific version with a test to create an evaluation.

2. Execute Evaluation Tasks

For each test case in the test, run one or more evaluation tasks to assess performance using different metric types.

For each test case, call your product’s model with the test case’s data and provide its output to the evaluation tasks.

3. Review Results

Compare results across different versions of the same product using the dashboard.
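The three steps above can be sketched in plain Python. This is a conceptual model only and does not use the Galtea SDK: the classes `Evaluation`, `ProductTestCase`, and `EvaluationTask`, the `exact_match` metric, and the `toy_model` function are all hypothetical names introduced for illustration, and the real SDK calls and field names may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ProductTestCase:
    """Hypothetical stand-in for one test case of a test."""
    input: str
    expected_output: str


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical stand-in: links one version to one test (both required)."""
    version_id: str
    test_id: str


@dataclass(frozen=True)
class EvaluationTask:
    """Hypothetical stand-in: one test case assessed with one metric."""
    evaluation: Evaluation
    test_case: ProductTestCase
    metric: str
    actual_output: str
    score: float


def exact_match(expected: str, actual: str) -> float:
    """Illustrative metric: 1.0 if the outputs match after trimming, else 0.0."""
    return 1.0 if expected.strip() == actual.strip() else 0.0


def run_tasks(
    evaluation: Evaluation,
    test_cases: List[ProductTestCase],
    model: Callable[[str], str],
    metrics: Dict[str, Callable[[str, str], float]],
) -> List[EvaluationTask]:
    """Call the model once per test case, then score its output with every metric."""
    tasks = []
    for case in test_cases:
        actual = model(case.input)
        for name, metric in metrics.items():
            score = metric(case.expected_output, actual)
            tasks.append(EvaluationTask(evaluation, case, name, actual, score))
    return tasks


def toy_model(prompt: str) -> str:
    """Placeholder for the product's model; it only knows one answer."""
    return {"2 + 2": "4"}.get(prompt, "unknown")


# Step 1: create the evaluation (this alone produces no scores).
evaluation = Evaluation(version_id="version-1", test_id="test-1")

# Step 2: execute one evaluation task per test case and metric.
cases = [ProductTestCase("2 + 2", "4"), ProductTestCase("capital of France", "Paris")]
tasks = run_tasks(evaluation, cases, toy_model, {"exact_match": exact_match})

# Step 3: review the results (on Galtea this happens on the dashboard).
for task in tasks:
    print(f"{task.metric} on {task.test_case.input!r}: {task.score}")
```

Keeping the evaluation separate from its tasks mirrors the workflow: the evaluation only records which version is paired with which test, and scores exist only after tasks have been run.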

Evaluation Tasks

The core components of an evaluation are its evaluation tasks. Each task represents the assessment of a single test case using a specific metric.

Creating an evaluation does not automatically run evaluation tasks. You need to execute evaluation tasks separately to generate scores and insights.


Results Visualization

Once you’ve created an evaluation, you can access detailed information and results on the dashboard.

To view evaluation results, go to the Analytics section of the product’s page. For detailed information about a particular evaluation, open the Evaluations tab and select that evaluation.

The platform provides:

  • Overview of a product’s evaluation results per metric
  • Analytics comparing different versions of the product
  • Detailed view of individual evaluation tasks
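To make the per-metric overview and the version comparison concrete, here is a minimal sketch of the kind of aggregation involved. It is an assumption about how such a summary could be computed, not a description of how the dashboard works; the function `mean_scores` and the sample scores are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_scores(
    task_results: Iterable[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], float]:
    """Average task scores for each (version, metric) pair."""
    grouped = defaultdict(list)
    for version, metric, score in task_results:
        grouped[(version, metric)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Invented sample data: (version, metric, score) for individual evaluation tasks.
results = [
    ("v1", "accuracy", 0.5),
    ("v1", "accuracy", 1.0),
    ("v2", "accuracy", 1.0),
    ("v2", "accuracy", 1.0),
]

summary = mean_scores(results)
for (version, metric), value in sorted(summary.items()):
    print(f"{version} {metric}: {value:.2f}")
```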

SDK Integration

The Galtea SDK allows you to create, view, and manage evaluations programmatically. This is particularly useful for organizations that want to automate their evaluation process or integrate it into their CI/CD pipeline.
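As one possible shape for a CI/CD integration, a pipeline step could compare aggregated scores against minimum thresholds and fail the build when any metric falls short. The sketch below assumes the scores have already been retrieved by some means; it does not call the Galtea SDK, and `failing_metrics`, `gate`, the metric names, and the threshold values are hypothetical.

```python
import sys
from typing import Dict, List


def failing_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the metrics whose score is below its minimum (missing scores count as 0.0)."""
    return sorted(
        name for name, minimum in thresholds.items()
        if scores.get(name, 0.0) < minimum
    )


def gate(scores: Dict[str, float], thresholds: Dict[str, float]) -> int:
    """Report failures on stderr and return a process exit code (0 = pass, 1 = fail)."""
    failed = failing_metrics(scores, thresholds)
    for name in failed:
        print(
            f"FAIL {name}: {scores.get(name, 0.0)} < {thresholds[name]}",
            file=sys.stderr,
        )
    return 1 if failed else 0


# Invented values for illustration; a real pipeline would load actual scores here.
exit_code = gate(
    scores={"accuracy": 0.9, "relevance": 0.6},
    thresholds={"accuracy": 0.8, "relevance": 0.7},
)
# A CI script would finish with sys.exit(exit_code) so a failing metric blocks the build.
```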

Evaluation Properties

Version (type: Version, required)

The version to be evaluated in the evaluation.

Test (type: Test, required)

The test to be used in the evaluation.

Once an evaluation is created, you can execute evaluation tasks to assess the performance of the selected version against the test cases.