
When creating or configuring a metric, you select which parameters are relevant for your evaluation. These parameters are made available to the evaluator during scoring.
  • For AI Evaluation, the selected parameters are automatically prepended to your judge_prompt.
  • For Human Evaluation, they determine which data fields are displayed to annotators.
  • For Self-Hosted metrics, evaluation parameters do not apply.
Only select parameters that your evaluation criteria actually reference. Including unnecessary parameters can reduce evaluation quality by adding noise to the evaluator’s context.
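To make the AI Evaluation case concrete, here is a minimal sketch of the idea in Python. The build_judge_prompt helper and the prompt format are illustrative assumptions, not Galtea's actual implementation; only the parameter names come from the reference table below.

```python
# Illustrative sketch only: how selected parameters might be prepended to a
# judge_prompt for AI Evaluation. Galtea's real formatting may differ.
def build_judge_prompt(judge_prompt: str, parameters: dict[str, str]) -> str:
    # Render each selected parameter as a labeled block of context,
    # then append the judge prompt itself.
    blocks = [f"{name}:\n{value}" for name, value in parameters.items()]
    return "\n\n".join(blocks + [judge_prompt])

prompt = build_judge_prompt(
    "Score from 1 to 5 how well the actual output answers the input.",
    {
        "input": "What is your refund policy?",
        "actual_output": "Refunds are accepted within 30 days of purchase.",
    },
)
print(prompt)
```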

Parameter Reference

| Parameter | Description | Availability |
| --- | --- | --- |
| input | The prompt or query sent to the model. | Accuracy, Security & Safety, and Behavior |
| actual_output | The actual output generated by the model. | Accuracy, Security & Safety, and Behavior |
| expected_output | The ideal answer for the given input. | Accuracy and Security & Safety |
| context | Additional background information provided to the model alongside the input. | All metrics |
| retrieval_context | The context retrieved by your RAG system before sending the user query to your LLM. | Accuracy, Security & Safety, and Behavior |
| traces | Execution traces from the agent, including tool calls, LLM invocations, and other internal operations. | All metrics |
| expected_tools | List of tools expected to be used by the agent to accomplish the task. | All metrics |
| tools_used | List of tools actually used by the agent during execution (automatically inferred from traces). | All metrics |
| product_description | The description of the product. | All metrics |
| product_capabilities | The capabilities of the product. | All metrics |
| product_inabilities | The product's known inabilities or restrictions. | All metrics |
| product_security_boundaries | The security boundaries of the product. | All metrics |
| user_persona | Information about the user interacting with the agent. | Behavior tests |
| goal | The user's objective in the conversation. | Behavior tests |
| scenario | The context or situation for the conversation. | Behavior tests |
| stopping_criterias | List of criteria that define when a conversation should end. | Behavior tests |
| conversation_turns | All turns in a conversation, including user and assistant messages. | Behavior tests (Human Evaluation only) |
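One row worth spelling out is tools_used, since you never provide it directly: it is inferred from trace entries of type TOOL (see the troubleshooting table below). A minimal sketch of that inference, where the "type" and "name" field names are assumptions for illustration rather than Galtea's documented trace schema:

```python
# Illustrative sketch: tools_used is derived from trace entries of type TOOL.
# The "type" and "name" field names here are assumed, not documented schema.
traces = [
    {"type": "LLM", "name": "gpt-4o"},
    {"type": "TOOL", "name": "search_orders"},
    {"type": "TOOL", "name": "issue_refund"},
]

tools_used = [t["name"] for t in traces if t["type"] == "TOOL"]
print(tools_used)  # ['search_orders', 'issue_refund']
```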

Troubleshooting Skipped Evaluations

When an evaluation runs against a session that does not have all the data the metric needs, Galtea marks the evaluation as SKIPPED instead of producing a misleading score. The evaluation’s error field describes which parameters are missing, grouped by where you provide them. The categorized error message looks like this:
Metric "Contextual Relevancy" requires data that is not available.

Missing from product settings:
  • product_description - Fill in the "Description" field in your product settings.

Missing from test case:
  • expected_output - Provide an "Expected Output" value on your test cases.

Missing from endpoint connection's output mapping:
  • retrieval_context - Add a "retrieval_context" key to your endpoint connection's output mapping so this field is extracted from your API response.

Missing trace data:
  • traces - This metric needs tracing data. Configure your product to send traces — see the tracing setup docs.

Learn more: https://docs.galtea.ai/concepts/metric/evaluation-parameters#troubleshooting-skipped-evaluations
Each section points to a different place to look:
| Section | Where to fix it |
| --- | --- |
| Missing from product settings | Open the product and fill in the missing field (Description, Capabilities, Inabilities, or Security Boundaries). |
| Missing from test case | Edit the test case and provide the missing value. Some fields (Goal, User Persona, Scenario, Stopping Criteria) are only populated on SCENARIOS-type test cases. If your test cases use a different type, use a metric that does not require these parameters. |
| Missing from endpoint connection's output mapping | Edit the endpoint connection and add the missing key to the output mapping. actual_output is extracted from the output key; retrieval_context from a retrieval_context key. See Templates & Mapping for the JSONPath syntax used in mapping values, and the sketch after this table. |
| Missing trace data | The metric needs traces or tools_used. Configure your product to send traces; see the Tracing Agent Operations tutorial for setup, and the Trace concept page for what gets captured. tools_used is automatically extracted from your trace data (specifically from trace entries of type TOOL); you do not provide it directly. |
| Missing inference data | The session has no inference results yet. Run the metric against a session that contains at least one conversation turn. |
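For the output-mapping case, the fix is one mapping entry per missing parameter. The sketch below is hypothetical: the output and retrieval_context keys are the ones named above, but the response shape and JSONPath expressions are invented for illustration; see Templates & Mapping for the exact syntax Galtea accepts.

```python
# Hypothetical output mapping for an API that responds with:
#   {"answer": "...", "sources": [{"text": "..."}, {"text": "..."}]}
# The JSONPath values are illustrative only.
output_mapping = {
    "output": "$.answer",                      # extracted as actual_output
    "retrieval_context": "$.sources[*].text",  # extracted as retrieval_context
}
```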
Once you have provided the missing data, re-run the evaluation against the same session.
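If you process skipped evaluations programmatically, the section headers make the error message easy to parse. A small sketch, assuming the error text follows exactly the format shown above:

```python
# Parse a categorized SKIPPED error message (format shown above) into
# {section: [missing parameter, ...]}. Assumes that exact layout.
def parse_missing_parameters(error: str) -> dict[str, list[str]]:
    sections: dict[str, list[str]] = {}
    current = None
    for raw in error.splitlines():
        line = raw.strip()
        if line.startswith("Missing") and line.endswith(":"):
            current = line.rstrip(":")
            sections[current] = []
        elif line.startswith("•") and current is not None:
            # "• expected_output - Provide an ..." -> "expected_output"
            sections[current].append(line.lstrip("• ").split(" - ")[0])
    return sections
```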

Evaluation Types

Understand AI Evaluation, Human Evaluation, and Self-Hosted scoring.

Metrics Overview

Browse all available metrics and create custom ones.