When creating or configuring a metric, you select which parameters are relevant for your evaluation. These parameters are made available to the evaluator during scoring.
  • For AI Evaluation, the selected parameters are automatically prepended to your judge_prompt.
  • For Human Evaluation, they determine which data fields are displayed to annotators.
  • For Self-Hosted metrics, evaluation parameters do not apply.
Only select parameters that your evaluation criteria actually reference. Including unnecessary parameters can reduce evaluation quality by adding noise to the evaluator’s context.
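For AI Evaluation, prepending parameters to the judge_prompt can be pictured as follows. This is an illustrative sketch only, not the platform's actual implementation; the function name and labeled-block format are assumptions made for the example.

```python
# Hypothetical sketch of how selected parameters might be prepended to a
# judge_prompt for AI Evaluation. build_judge_prompt and the block format
# are invented for illustration; the platform's internals may differ.

def build_judge_prompt(judge_prompt: str, params: dict) -> str:
    """Prepend each selected parameter as a labeled block before the prompt."""
    blocks = [f"{name}:\n{value}" for name, value in params.items()]
    return "\n\n".join(blocks + [judge_prompt])

prompt = build_judge_prompt(
    "Rate the answer's factual accuracy from 1 to 5.",
    {
        "input": "What is the capital of France?",
        "actual_output": "Paris.",
        "expected_output": "Paris",
    },
)
# The judge now sees the three parameter blocks, then the scoring instruction.
```

This also makes the note above concrete: every extra parameter becomes another block in the judge's context, so unused parameters are pure noise.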

Parameter Reference

| Parameter | Description | Availability |
| --- | --- | --- |
| input | The prompt or query sent to the model. | Accuracy, Security & Safety, and Behavior |
| actual_output | The actual output generated by the model. | Accuracy, Security & Safety, and Behavior |
| expected_output | The ideal answer for the given input. | Accuracy and Security & Safety |
| context | Additional background information provided to the model alongside the input. | All metrics |
| retrieval_context | The context retrieved by your RAG system before sending the user query to your LLM. | Accuracy, Security & Safety, and Behavior |
| traces | Execution traces from the agent, including tool calls, LLM invocations, and other internal operations. | All metrics |
| expected_tools | List of tools expected to be used by the agent to accomplish the task. | All metrics |
| tools_used | List of tools actually used by the agent during execution (automatically inferred from traces). | All metrics |
| product_description | The description of the product. | All metrics |
| product_capabilities | The capabilities of the product. | All metrics |
| product_inabilities | The product's known inabilities or restrictions. | All metrics |
| product_security_boundaries | The security boundaries of the product. | All metrics |
| user_persona | Information about the user interacting with the agent. | Behavior tests |
| goal | The user's objective in the conversation. | Behavior tests |
| scenario | The context or situation for the conversation. | Behavior tests |
| stopping_criterias | List of criteria that define when a conversation should end. | Behavior tests |
| conversation_turns | All turns in a conversation, including user and assistant messages. | Behavior tests (Human Evaluation only) |
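The availability column can be checked programmatically before running an evaluation. The sketch below is an assumption for illustration only: the metric-type labels, the AVAILABILITY mapping, and invalid_selections are invented names, not part of the platform's API, and only a subset of the table is encoded.

```python
# Illustrative sketch (not the platform's API): encoding part of the
# availability table so a metric's selected parameters can be validated.
# Metric-type labels ("accuracy", "security_safety", "behavior") are
# shorthand invented for this example.

ALL = {"accuracy", "security_safety", "behavior"}

AVAILABILITY = {
    "input": {"accuracy", "security_safety", "behavior"},
    "actual_output": {"accuracy", "security_safety", "behavior"},
    "expected_output": {"accuracy", "security_safety"},
    "context": ALL,
    "traces": ALL,
    "user_persona": {"behavior"},
    "goal": {"behavior"},
    # (remaining parameters from the table follow the same pattern)
}

def invalid_selections(metric_type: str, selected: list[str]) -> list[str]:
    """Return the selected parameters not available for this metric type."""
    return [p for p in selected if metric_type not in AVAILABILITY.get(p, set())]

# user_persona is limited to Behavior tests, so an Accuracy metric flags it:
invalid_selections("accuracy", ["input", "expected_output", "user_persona"])
# → ["user_persona"]
```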

Evaluation Types

Understand AI Evaluation, Human Evaluation, and Self-Hosted scoring.

Metrics Overview

Browse all available metrics and create custom ones.