What is an Evaluation?

An evaluation in Galtea is a link between a specific version of a product and a test. It serves as the container for all the evaluation tasks that assess how well the product version performs against the test cases.

Evaluation Workflow

1. Create an Evaluation

   Link a specific version with a test to create an evaluation.

2. Execute Evaluation Tasks

   For each test case in the test, run one or more evaluation tasks to assess performance using different metric types.

   For each test case, run your product’s model on the test case’s input and context, then pass its output to the evaluation tasks.

3. Review Results

   Compare results across different versions of the same product.
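The workflow above can be sketched in plain Python to make the data flow concrete. This is an in-memory illustration, not the Galtea SDK: `Case`, `Evaluation`, `EvaluationTask`, and `run_evaluation_tasks` are hypothetical names, and the model and metrics are assumed to be ordinary callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Case:
    """Hypothetical stand-in for a test case: an input plus optional context."""
    id: str
    input: str
    context: str = ""


@dataclass
class EvaluationTask:
    """One assessment of a single test case with a single metric."""
    test_case_id: str
    metric: str
    score: float


@dataclass
class Evaluation:
    """Links a version to a test and collects the resulting tasks."""
    version_id: str
    test_id: str
    tasks: List[EvaluationTask] = field(default_factory=list)


def run_evaluation_tasks(
    evaluation: Evaluation,
    cases: List[Case],
    model: Callable[[str, str], str],
    metrics: Dict[str, Callable[[Case, str], float]],
) -> Evaluation:
    """Step 2 of the workflow: run the model on each case, then score its output."""
    for case in cases:
        # The product's model receives the test case's input and context.
        actual_output = model(case.input, case.context)
        # Each metric produces one evaluation task for this test case.
        for name, metric in metrics.items():
            score = metric(case, actual_output)
            evaluation.tasks.append(EvaluationTask(case.id, name, score))
    return evaluation
```

Creating the `Evaluation` object corresponds to step 1, calling `run_evaluation_tasks` to step 2, and inspecting `evaluation.tasks` to step 3. In a real integration, the SDK and the platform would take the place of these local objects.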

Evaluation Tasks

The core components of an evaluation are its evaluation tasks. Each task represents the assessment of a single test case using specific metric types.

Evaluation Task

Learn more about evaluation tasks

Creating an evaluation does not automatically run evaluation tasks. You need to execute evaluation tasks separately to generate scores and insights.

Results Visualization

Once you’ve created an evaluation, you can access detailed information and results on the platform.

To view evaluation results, go to the Analytics section of the product’s page. For detailed information about a particular evaluation, open the Evaluations tab and select that evaluation.

The platform provides:

  • Overview of a product’s evaluation results per metric
  • Analytics comparing different versions of the product
  • Detailed view of individual evaluation tasks
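To illustrate what a per-metric overview and a version comparison involve, the sketch below averages task scores by version and metric. It assumes each task result is a `(version_id, metric, score)` tuple and uses a simple mean. Both are assumptions made for illustration, and the platform may aggregate differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_score_by_version_and_metric(
    task_results: Iterable[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """Group hypothetical task results and average the scores in each group."""
    grouped = defaultdict(list)
    for version_id, metric, score in task_results:
        grouped[(version_id, metric)].append(score)

    summary: Dict[str, Dict[str, float]] = {}
    for (version_id, metric), scores in grouped.items():
        # One averaged score per metric, nested under its version.
        summary.setdefault(version_id, {})[metric] = mean(scores)
    return summary
```

Reading the result side by side for two version IDs gives the kind of comparison the Analytics section is described as providing.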

Creating an Evaluation

To create an evaluation in Galtea, you need to specify:

Version: ID (text), required

The unique identifier of the version to be used in the evaluation.

Test: ID (text), required

The unique identifier of the test to be used in the evaluation.

Once an evaluation is created, you can execute evaluation tasks to assess the performance of the selected version against the test cases.
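As a minimal sketch of the two required inputs, the function below checks that both identifiers are present before building an evaluation record. `create_evaluation` and `EvaluationRecord` are hypothetical local names, not SDK calls. The record starts with no tasks, reflecting that creating an evaluation does not run evaluation tasks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvaluationRecord:
    """Hypothetical local record linking a version to a test."""
    version_id: str
    test_id: str
    task_ids: Tuple[str, ...] = ()  # empty until tasks are executed separately


def create_evaluation(version_id: str, test_id: str) -> EvaluationRecord:
    """Validate the two required identifiers and return a new record."""
    for name, value in (("version_id", version_id), ("test_id", test_id)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} is required and must be a non-empty string")
    return EvaluationRecord(version_id=version_id, test_id=test_id)
```

Calling `create_evaluation("", "some-test-id")` raises a `ValueError` that names the missing field, which is the behavior one would typically want from a client-side check before making a request.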

SDK Integration

Evaluation Service SDK

SDK methods for managing evaluations