- For AI Evaluation, the selected parameters are automatically prepended to your judge_prompt.
- For Human Evaluation, they determine which data fields are displayed to annotators.
- For Self-Hosted metrics, evaluation parameters do not apply.
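As a rough sketch of the first point: prepending selected parameters to a judge prompt amounts to rendering each parameter as a labeled block ahead of the prompt text. The exact template the platform uses is internal; the function and formatting below are illustrative only.

```python
# Illustrative sketch: how selected evaluation parameters might be prepended
# to a judge_prompt for AI Evaluation. The real template is platform-internal.

def build_judge_prompt(judge_prompt: str, params: dict) -> str:
    """Render each selected parameter as a labeled block before the judge prompt."""
    blocks = [f"{name}:\n{value}" for name, value in params.items()]
    return "\n\n".join(blocks + [judge_prompt])

prompt = build_judge_prompt(
    "Rate the answer's factual accuracy from 1 to 5.",
    {
        "input": "What is the capital of France?",
        "actual_output": "Paris",
        "expected_output": "Paris",
    },
)
```

With this sketch, the judge model sees the selected parameters as labeled context before its scoring instructions.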
## Parameter Reference
| Parameter | Description | Availability |
|---|---|---|
| input | The prompt or query sent to the model. | Accuracy, Security & Safety, and Behavior |
| actual_output | The actual output generated by the model. | Accuracy, Security & Safety, and Behavior |
| expected_output | The ideal answer for the given input. | Accuracy and Security & Safety |
| context | Additional background information provided to the model alongside the input. | All metrics |
| retrieval_context | The context retrieved by your RAG system before sending the user query to your LLM. | Accuracy, Security & Safety, and Behavior |
| traces | Execution traces from the agent, including tool calls, LLM invocations, and other internal operations. | All metrics |
| expected_tools | List of tools expected to be used by the agent to accomplish the task. | All metrics |
| tools_used | List of tools actually used by the agent during execution (automatically inferred from traces). | All metrics |
| product_description | A description of the product under evaluation. | All metrics |
| product_capabilities | What the product is capable of doing. | All metrics |
| product_inabilities | The product’s known inabilities or restrictions. | All metrics |
| product_security_boundaries | The security boundaries the product must not cross. | All metrics |
| user_persona | Information about the user interacting with the agent. | Behavior tests |
| goal | The user’s objective in the conversation. | Behavior tests |
| scenario | The context or situation for the conversation. | Behavior tests |
| stopping_criterias | List of criteria that define when a conversation should end. | Behavior tests |
| conversation_turns | All turns in a conversation, including user and assistant messages. | Behavior tests (Human Evaluation only) |
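To make the behavior-test parameters above concrete, here is a plain dictionary showing how one test case might populate them. All values are illustrative, and the surrounding API for submitting such a payload is platform-specific and not shown.

```python
# Illustrative only: a behavior test case populated with the parameters
# documented above. Values are made up; the submission API is not shown.

behavior_test_case = {
    "user_persona": "First-time user unfamiliar with billing terminology",
    "goal": "Cancel the current subscription before the next billing date",
    "scenario": "The user opens a support chat after seeing an unexpected charge",
    "stopping_criterias": [
        "The agent confirms the cancellation",
        "The user explicitly abandons the request",
    ],
    # conversation_turns is only surfaced for Human Evaluation.
    "conversation_turns": [
        {"role": "user", "content": "I want to cancel my plan."},
        {"role": "assistant", "content": "I can help you cancel your plan."},
    ],
}
```

For Human Evaluation, fields like these are what annotators would see; for AI Evaluation, the applicable ones would be prepended to the judge prompt.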
## Related

- **Evaluation Types**: Understand AI Evaluation, Human Evaluation, and Self-Hosted scoring.
- **Metrics Overview**: Browse all available metrics and create custom ones.