When creating or configuring a metric, you select which parameters are relevant for your evaluation. These parameters are made available to the evaluator during scoring.
  • For AI Evaluation, the selected parameters are automatically prepended to your judge_prompt.
  • For Human Evaluation, they determine which data fields are displayed to annotators.
  • For Self-Hosted metrics, evaluation parameters do not apply.
Only select parameters that your evaluation criteria actually reference. Including unnecessary parameters can reduce evaluation quality by adding noise to the evaluator’s context.
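For AI Evaluation, prepending parameters to the judge_prompt can be pictured as follows. This is an illustrative sketch only, not the platform's actual implementation; the function name and labeled-block format are assumptions made for the example.

```python
# Hypothetical sketch of how selected parameters might be prepended to a
# judge_prompt for AI Evaluation. build_judge_prompt and the block format
# are invented for illustration; the platform's internals may differ.

def build_judge_prompt(judge_prompt: str, params: dict) -> str:
    """Prepend each selected parameter as a labeled block before the prompt."""
    blocks = [f"{name}:\n{value}" for name, value in params.items()]
    return "\n\n".join(blocks + [judge_prompt])

prompt = build_judge_prompt(
    "Rate the answer's factual accuracy from 1 to 5.",
    {
        "input": "What is the capital of France?",
        "actual_output": "Paris.",
        "expected_output": "Paris",
    },
)
# The judge now sees the three parameter blocks, then the scoring instruction.
```

This also makes the note above concrete: every extra parameter becomes another block in the judge's context, so unused parameters are pure noise.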

Parameter Reference

| Parameter | Description | Availability |
| --- | --- | --- |
| input | The prompt or query sent to the model. | Accuracy, Security & Safety, and Behavior |
| actual_output | The actual output generated by the model. | Accuracy, Security & Safety, and Behavior |
| expected_output | The ideal answer for the given input. | Accuracy and Security & Safety |
| context | Additional background information provided to the model alongside the input. | All metrics |
| retrieval_context | The context retrieved by your RAG system before sending the user query to your LLM. | Accuracy, Security & Safety, and Behavior |
| traces | Execution traces from the agent, including tool calls, LLM invocations, and other internal operations. | All metrics |
| expected_tools | List of tools expected to be used by the agent to accomplish the task. | All metrics |
| tools_used | List of tools actually used by the agent during execution (automatically inferred from traces). | All metrics |
| product_description | The description of the product. | All metrics |
| product_capabilities | The capabilities of the product. | All metrics |
| product_inabilities | The product's known inabilities or restrictions. | All metrics |
| product_security_boundaries | The security boundaries of the product. | All metrics |
| user_persona | Information about the user interacting with the agent. | Behavior tests |
| goal | The user's objective in the conversation. | Behavior tests |
| scenario | The context or situation for the conversation. | Behavior tests |
| stopping_criterias | List of criteria that define when a conversation should end. | Behavior tests |
| conversation_turns | All turns in a conversation, including user and assistant messages. | Behavior tests (Human Evaluation only) |
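The availability column can be checked programmatically before running an evaluation. The sketch below is an assumption for illustration only: the metric-type labels, the AVAILABILITY mapping, and invalid_selections are invented names, not part of the platform's API, and only a subset of the table is encoded.

```python
# Illustrative sketch (not the platform's API): encoding part of the
# availability table so a metric's selected parameters can be validated.
# Metric-type labels ("accuracy", "security_safety", "behavior") are
# shorthand invented for this example.

ALL = {"accuracy", "security_safety", "behavior"}

AVAILABILITY = {
    "input": {"accuracy", "security_safety", "behavior"},
    "actual_output": {"accuracy", "security_safety", "behavior"},
    "expected_output": {"accuracy", "security_safety"},
    "context": ALL,
    "traces": ALL,
    "user_persona": {"behavior"},
    "goal": {"behavior"},
    # (remaining parameters from the table follow the same pattern)
}

def invalid_selections(metric_type: str, selected: list[str]) -> list[str]:
    """Return the selected parameters not available for this metric type."""
    return [p for p in selected if metric_type not in AVAILABILITY.get(p, set())]

# user_persona is limited to Behavior tests, so an Accuracy metric flags it:
invalid_selections("accuracy", ["input", "expected_output", "user_persona"])
# → ["user_persona"]
```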

Evaluation Types

Understand AI Evaluation, Human Evaluation, and Self-Hosted scoring.

Metrics Overview

Browse all available metrics and create custom ones.