What are Metric Types?

Metric types in Galtea define the specific criteria and methods used to evaluate the performance of your product. They determine how outputs are scored during evaluation tasks, ensuring consistent and meaningful assessment.

Metric types are organization-wide and can be reused across multiple products.

You can view and manage your metric types on the Galtea dashboard.

Metric Type Properties

When creating a metric type in Galtea, you’ll need to provide the following information:

Name
Text
required

The name of the metric type. Example: “Factual Accuracy”

Evaluation Parameters
Text List
required

Standard parameters that define the inputs and outputs available during the evaluation process. These parameters should be explicitly mentioned in your evaluation criteria or steps to ensure they’re taken into account during assessment.

| Parameter | Description |
| --- | --- |
| Input | The original query or prompt sent to your product |
| Actual Output | The response generated by your product that needs evaluation |
| Expected Output | The ideal response |
| Retrieval Context | Information retrieved from knowledge bases to support RAG systems |
| Context | Additional background information provided with the input and related to the ground truth |

“Input” must always be included in the evaluation parameters.

You can directly reference these parameters in your criteria or evaluation steps. For example: “Evaluate if the Actual Output contains factually correct information that aligns with verified sources in the Retrieval Context.”

To ensure accurate evaluation results, include in your evaluation_params list only the parameters you explicitly reference in your criteria or evaluation steps. You may refer to a parameter by another descriptive name, such as “response” instead of “actual output”; that is fine, but you must still include Actual Output in the evaluation_params list.
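The consistency rule above can be sketched as a small validation helper. This is a hypothetical illustration, not part of the Galtea SDK: it checks that every standard parameter mentioned in your criteria text is also declared in the evaluation_params list.

```python
# Hypothetical helper (not part of the Galtea SDK): check that every
# standard parameter referenced in the criteria text is also declared
# in the evaluation_params list.

STANDARD_PARAMS = [
    "retrieval context",   # longer names first, so "retrieval context"
    "expected output",     # is not also counted as a match for "context"
    "actual output",
    "context",
    "input",
]

def missing_params(criteria: str, evaluation_params: list[str]) -> list[str]:
    """Return parameters mentioned in `criteria` but absent from
    `evaluation_params` (case-insensitive)."""
    declared = {p.lower() for p in evaluation_params}
    text = criteria.lower()
    missing = []
    for param in STANDARD_PARAMS:
        if param in text:
            text = text.replace(param, " ")  # mask so substrings don't re-match
            if param not in declared:
                missing.append(param)
    return sorted(missing)

criteria = (
    "Evaluate if the Actual Output contains factually correct information "
    "that aligns with verified sources in the Retrieval Context."
)
print(missing_params(criteria, ["input", "actual output"]))
# -> ['retrieval context']
```

Running a check like this before creating a metric type catches the common mistake of referencing Retrieval Context in the criteria while forgetting to declare it.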

Criteria
Text
required

High-level standards that define what aspects of a response matter for evaluation. Example: “Evaluate if the response contains factually correct information that aligns with verified sources. Penalize statements that contradict established knowledge or introduce speculation without citation.”

Evaluation Steps
Text List
required

A structured set of checks that determine how a metric assesses correctness. Example:

  1. Check if the ‘actual output’ contains facts that align with verified sources
  2. Identify any contradictions between the ‘actual output’ and established knowledge
  3. Penalize statements that introduce speculation without citing a credible source

You need to provide either Criteria or Evaluation Steps, but not both. Your choice depends on your preferred evaluation approach.
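The either/or rule can be expressed as a simple sketch. The function and payload shape below are illustrative assumptions, not the Galtea SDK: they show the same metric type defined once with high-level criteria and once with evaluation steps, while enforcing that exactly one of the two is provided.

```python
# Illustrative sketch (not the Galtea SDK): define a metric type with
# either criteria or evaluation_steps, never both.

def make_metric_type(name, evaluation_params, criteria=None, evaluation_steps=None):
    if (criteria is None) == (evaluation_steps is None):
        raise ValueError("Provide either criteria or evaluation_steps, but not both.")
    return {
        "name": name,
        "evaluation_params": evaluation_params,
        "criteria": criteria,
        "evaluation_steps": evaluation_steps,
    }

# Criteria-based: states *what* matters in a response.
criteria_based = make_metric_type(
    "Factual Accuracy",
    ["input", "actual output", "retrieval context"],
    criteria=(
        "Evaluate if the actual output contains factually correct information "
        "that aligns with verified sources in the retrieval context."
    ),
)

# Steps-based: states *how* to measure the response.
steps_based = make_metric_type(
    "Factual Accuracy",
    ["input", "actual output", "retrieval context"],
    evaluation_steps=[
        "Check if the 'actual output' contains facts that align with verified sources",
        "Identify any contradictions between the 'actual output' and established knowledge",
        "Penalize statements that introduce speculation without citing a credible source",
    ],
)
```

Both definitions declare the same evaluation_params, since both the criteria text and the steps reference Input, Actual Output, and Retrieval Context.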

Evaluation Criteria vs. Evaluation Steps

Understanding the difference between these two approaches is essential for creating effective metrics:

Evaluation Criteria

What matters in a response, defining the high-level qualities or standards

Evaluation Steps

How to measure a response’s quality, providing specific assessment actions

Evaluation Criteria

Evaluation criteria are high-level qualities or standards that define what makes a response good or bad. They outline fundamental aspects that should be assessed without specifying exactly how to measure them.

Evaluation criteria define what matters in a response, serving as the foundation for meaningful assessment.

Evaluation Steps

Evaluation steps are the specific actions taken to measure how well a response meets the evaluation criteria. These steps break down the assessment into concrete, structured processes that reference evaluation parameters.

Evaluation steps define how to measure a response’s quality based on the evaluation criteria, making explicit reference to specific evaluation parameters like Input, Actual Output, Expected Output, Retrieval Context, and Context.

Comparing Evaluation Approaches

The following table highlights the key differences between evaluation criteria and evaluation steps:

| Aspect | Evaluation Criteria | Evaluation Steps |
| --- | --- | --- |
| Definition | High-level qualities that define what makes a response good or bad | Step-by-step actions to measure a response’s quality |
| Purpose | Establish broad goals for evaluation | Provide a systematic method to assess responses |
| Focus | What should be measured | How to measure it |
| Examples | Accuracy, conciseness, relevance, fluency | Compare facts, check for contradictions, assess completeness |
| Flexibility | General principles that apply across many use cases | Specific steps that vary depending on the system |

Custom Metrics

You have the flexibility to create custom metrics tailored to your specific use cases and evaluation preferences. This feature enables you to define a metric with a name, conduct evaluations locally, and directly assign scores.

The platform cannot automatically evaluate custom metrics, as it lacks the necessary information. Therefore, you are responsible for uploading the evaluation results to visualize charts and data based on these metrics.
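A custom metric evaluated locally might look like the sketch below. The scoring logic (simple keyword overlap) and the result payload's field names are assumptions for illustration only; the actual upload call and schema depend on the Galtea SDK.

```python
import re

# Hypothetical custom metric scored locally (illustrative only; the
# payload field names are assumptions, not the real Galtea API schema).

def keyword_overlap_score(actual_output: str, expected_output: str) -> float:
    """Fraction of the expected output's words found in the actual output."""
    expected = set(re.findall(r"[a-z0-9]+", expected_output.lower()))
    actual = set(re.findall(r"[a-z0-9]+", actual_output.lower()))
    return len(expected & actual) / len(expected) if expected else 0.0

score = keyword_overlap_score(
    "The Eiffel Tower is in Paris, France.",
    "Eiffel Tower Paris",
)
print(score)  # -> 1.0

# Result you would upload to the platform so charts can be built
# from this custom metric:
result = {"metric_type": "my_custom_metric", "score": score}
```

Because the platform never sees the scoring logic, you own its correctness; only the uploaded scores are visualized.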

SDK Integration

Metrics Service SDK

SDK methods for managing metric types