Metric Type
Ways to evaluate and score product performance
What are Metric Types?
Metric types in Galtea define the specific criteria and methods used to evaluate the performance of your product. They determine how outputs are scored during evaluation tasks, ensuring consistent and meaningful assessment.
Metric types are organization-wide and can be reused across multiple products.
You can view and manage your metric types on the Galtea dashboard.
Metric Type Properties
When creating a metric type in Galtea, you’ll need to provide the following information:
Name
The name of the metric type. Example: “Factual Accuracy”
Evaluation Parameters
Standard parameters that define the inputs and outputs available during the evaluation process. These parameters should be explicitly mentioned in your evaluation criteria or steps to ensure they’re taken into account during assessment.
Parameter | Description |
---|---|
Input | The original query or prompt sent to your product |
Actual Output | The response generated by your product that needs evaluation |
Expected Output | The ideal response, used as a reference for comparison |
Retrieval Context | Information retrieved from knowledge bases to support RAG systems |
Context | Additional background information provided with the input and related to the ground truth |
“Input” must always be included in the evaluation_params list.
You can directly reference these parameters in your criteria or evaluation steps. For example: “Evaluate if the Actual Output contains factually correct information that aligns with verified sources in the Retrieval Context.”
To ensure accurate evaluation results, include only the parameters in your evaluation_params list that you have explicitly referenced in your criteria or evaluation steps.
You may refer to parameters by other descriptive names, such as “response” instead of “actual output”; that is fine, but you must still include actual output in the evaluation_params list.
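As a minimal sketch of how this fits together in the Python SDK: the criteria below references the actual output and the retrieval context, so both appear in evaluation_params alongside the mandatory input. The `Galtea` client and the `metrics.create` method shown here are assumptions; consult the Metrics Service SDK reference for the exact names and signatures.

```python
from galtea import Galtea  # assumed client entry point

galtea = Galtea(api_key="YOUR_API_KEY")

# Every parameter the criteria mentions is declared in evaluation_params,
# plus the mandatory "input". Method and parameter names are illustrative
# assumptions, not the definitive API.
factual_accuracy = galtea.metrics.create(
    name="Factual Accuracy",
    criteria=(
        "Evaluate if the Actual Output contains factually correct information "
        "that aligns with verified sources in the Retrieval Context."
    ),
    evaluation_params=["input", "actual_output", "retrieval_context"],
)
```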
Evaluation Criteria
High-level standards that define what aspects of a response matter for evaluation. Example: “Evaluate if the response contains factually correct information that aligns with verified sources. Penalize statements that contradict established knowledge or introduce speculation without citation.”
Evaluation Steps
A structured set of checks that determines how a metric assesses correctness. Example (sketched in code after this list):
- Check if the ‘actual output’ contains facts that align with verified sources
- Identify any contradictions between the ‘actual output’ and established knowledge
- Penalize statements that introduce speculation without citing a credible source
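A steps-based variant of the same metric could be registered in a similar way. Here `evaluation_steps` is an assumed parameter name, so treat this as a sketch under that assumption rather than the definitive API:

```python
# Hypothetical steps-based definition: each step is an explicit check, and
# every parameter the steps mention is declared in evaluation_params.
factual_accuracy_steps = galtea.metrics.create(
    name="Factual Accuracy (Steps)",
    evaluation_steps=[
        "Check if the 'actual output' contains facts that align with "
        "verified sources in the 'retrieval context'",
        "Identify any contradictions between the 'actual output' and "
        "established knowledge",
        "Penalize statements that introduce speculation without citing "
        "a credible source",
    ],
    evaluation_params=["input", "actual_output", "retrieval_context"],
)
```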
Evaluation Criteria vs. Evaluation Steps
Understanding the difference between these two approaches is essential for creating effective metrics:
Evaluation Criteria
What matters in a response, defining the high-level qualities or standards
Evaluation Steps
How to measure a response’s quality, providing specific assessment actions
Evaluation Criteria
Evaluation criteria are high-level qualities or standards that define what makes a response good or bad. They outline fundamental aspects that should be assessed without specifying exactly how to measure them.
Evaluation Steps
Evaluation steps are the specific actions taken to measure how well a response meets the evaluation criteria. These steps break down the assessment into concrete, structured processes that reference evaluation parameters.
Comparing Evaluation Approaches
The following table highlights the key differences between evaluation criteria and evaluation steps:
Aspect | Evaluation Criteria | Evaluation Steps |
---|---|---|
Definition | High-level qualities that define what makes a response good or bad | Step-by-step actions to measure a response’s quality |
Purpose | Establish broad goals for evaluation | Provide a systematic method to assess responses |
Focus | What should be measured | How to measure it |
Examples | Accuracy, conciseness, relevance, fluency | Compare facts, check for contradictions, assess completeness |
Flexibility | General principles that apply across many use cases | Specific steps that vary depending on the system |
Custom Metrics
You have the flexibility to create custom metrics tailored to your specific use cases and evaluation preferences. This feature enables you to define a metric with a name, conduct evaluations locally, and directly assign scores.
The platform cannot automatically evaluate custom metrics, as it lacks the necessary information. You are therefore responsible for uploading the evaluation results yourself; the dashboard then visualizes charts and data based on these scores.
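As a sketch of that workflow, assuming a hypothetical local scoring function and an assumed upload method (the real call lives in the Metrics Service SDK, so verify the name and parameters there):

```python
# Score locally with your own logic; for a custom metric the platform only
# stores and charts the result you upload.
def keyword_coverage(actual_output: str, required_terms: list[str]) -> float:
    """Hypothetical local scorer: fraction of required terms present."""
    hits = sum(term.lower() in actual_output.lower() for term in required_terms)
    return hits / len(required_terms) if required_terms else 0.0

score = keyword_coverage(
    "Paris is the capital of France.",
    ["Paris", "France"],
)

# Upload the precomputed score. The method and parameter names below are
# assumptions for illustration, not the confirmed SDK signature.
galtea.evaluation_tasks.create(
    metric_type_name="keyword-coverage",  # your custom metric's name
    score=score,
)
```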
Related Concepts
Evaluation
A link between a product version and a test that groups evaluation tasks
Evaluation Task
The assessment of a test case from a test using a specific metric type’s evaluation criteria
SDK Integration
Metrics Service SDK
SDK methods for managing metric types