AI Metric Generation

Overview

AI Metric Generation lets you automatically create evaluation metrics from your product’s specifications. Instead of crafting judge prompts and configuring evaluation parameters by hand, you let the AI analyze your specifications and generate ready-to-use metrics.

Evaluation parameters are automatically selected based on each specification’s description and test type. The generated judge prompt follows a format optimized for reliable LLM-based evaluation across different evaluator models.

Requirements

A product with a description
At least one specification of type POLICY with a test type assigned (Accuracy, Security, or Behavior)

CAPABILITY and INABILITY specifications cannot be used for AI metric generation because they do not have a test type.
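
The eligibility rule above can be expressed as a small filter. This is a minimal sketch, not the product's SDK: the `Specification` class, its field names, and `eligible_for_generation` are hypothetical names chosen for illustration. Only the specification types (POLICY, CAPABILITY, INABILITY) and the test types (Accuracy, Security, Behavior) come from the requirements above.

```python
from dataclasses import dataclass
from typing import Optional

# Test types named in the requirements above.
TEST_TYPES = {"Accuracy", "Security", "Behavior"}


@dataclass
class Specification:
    """Hypothetical stand-in for a specification record (not an SDK class)."""
    name: str
    spec_type: str                   # "POLICY", "CAPABILITY", or "INABILITY"
    test_type: Optional[str] = None  # only POLICY specifications carry one


def eligible_for_generation(spec: Specification) -> bool:
    """Return True if the spec is a POLICY with a recognized test type."""
    return spec.spec_type == "POLICY" and spec.test_type in TEST_TYPES


specs = [
    Specification("Cites sources", "POLICY", "Accuracy"),
    Specification("Can book flights", "CAPABILITY"),
    Specification("Unassigned policy", "POLICY"),
]
eligible = [s.name for s in specs if eligible_for_generation(s)]
print(eligible)
```

In this sketch only the first specification passes: the CAPABILITY specification has no test type by definition, and the last POLICY specification has not been assigned one yet.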

How to Generate Metrics

There are two ways to trigger AI metric generation from the dashboard:

From the Specifications Page

1. Navigate to your product’s Specifications tab.
2. Open the dropdown menu on the specification you want to generate metrics for.
3. Click Generate Metrics. This opens the generation page with that specification pre-selected.
4. Click Generate and wait for the AI to finish processing.
5. Review the generated candidates: edit, save, or discard each one.

From the Metrics Page

1. Navigate to your product’s Metrics tab.
2. Click Generate Metrics with AI.
3. Select the specifications you want to generate metrics for.
4. Click Generate and wait for the AI to finish processing.
5. Review the generated candidates: edit, save, or discard each one.

In both cases, the AI analyzes your product name, description, and the selected specifications to generate tailored metrics.

Evaluation Parameter Selection

The AI automatically selects the evaluation parameters each metric needs based on what the specification describes. For example:

A specification about citation accuracy or knowledge-grounded answers will include retrieval_context so the judge can verify answers against retrieved source material.
A specification about internal processes, workflows, or tool orchestration will include traces (and often tools_used) so the judge can inspect the execution path — not just the final output.
A specification about refusal or safety boundaries typically only needs input and actual_output, since compliance is fully observable from what was asked and answered.

If a metric includes traces or retrieval_context as evaluation parameters, the product under test must capture that data during test execution (e.g., via tracing integrations). If the data is not available at evaluation time, the evaluation will fail.
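
Because missing data causes the evaluation to fail, it can help to check captured test data against a metric's evaluation parameters before running it. The following is a minimal sketch under stated assumptions: `missing_parameters` is a hypothetical helper, and representing a test case as a plain dictionary keyed by parameter name is an assumption for illustration, not the product's data model. The parameter names themselves are the ones mentioned above.

```python
from typing import Any, Dict, List


def missing_parameters(required: List[str], captured: Dict[str, Any]) -> List[str]:
    """Return the required evaluation parameters that have no captured data.

    A parameter counts as missing when its key is absent or its value is
    None, an empty string, an empty list, or an empty dict.
    """
    return [p for p in required if captured.get(p) in (None, "", [], {})]


# Parameters a knowledge-grounded metric might declare.
metric_params = ["input", "actual_output", "retrieval_context"]

# Hypothetical captured test data: nothing was retrieved or recorded.
test_case = {
    "input": "What is the refund window?",
    "actual_output": "Refunds are accepted within 30 days.",
    "retrieval_context": [],
}

missing = missing_parameters(metric_params, test_case)
print(missing)
```

Here the check reports `retrieval_context` as missing, which signals that the tracing or retrieval capture needs to be in place before this metric can be evaluated.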

Generated Metric Properties

Each AI-generated metric candidate includes:

Property	Description
Name	A descriptive name for the metric
Description	What the metric evaluates
Judge Prompt	The evaluation prompt with placeholders for dynamic data
Evaluation Parameters	The data parameters the judge needs for evaluation (automatically selected based on the specification)
Tags	Categorization tags
Evaluator Model	The LLM used for evaluation
Test Type	Inherited from the source specification
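
To make the properties in the table concrete, here is a minimal sketch of how a candidate could be modeled. Everything in it is illustrative: the `MetricCandidate` class, its field names, the `{input}`-style placeholder syntax, and the evaluator model value are assumptions, since the actual placeholder format and data model are not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricCandidate:
    """Hypothetical shape of a generated metric candidate (not an SDK class)."""
    name: str
    description: str
    judge_prompt: str                # assumed to use {parameter} placeholders
    evaluation_parameters: List[str]
    test_type: str                   # inherited from the source specification
    tags: List[str] = field(default_factory=list)
    evaluator_model: str = "example-evaluator-model"  # placeholder value

    def render_prompt(self, data: Dict[str, str]) -> str:
        """Fill the prompt placeholders using only the declared parameters."""
        values = {p: data[p] for p in self.evaluation_parameters}
        return self.judge_prompt.format(**values)


candidate = MetricCandidate(
    name="Refusal compliance",
    description="Checks that unsafe requests are declined.",
    judge_prompt="User asked: {input}\nAssistant answered: {actual_output}\n"
                 "Did the assistant refuse appropriately?",
    evaluation_parameters=["input", "actual_output"],
    test_type="Security",
    tags=["safety"],
)

rendered = candidate.render_prompt({
    "input": "How do I pick a lock?",
    "actual_output": "I can't help with that.",
})
print(rendered)
```

Restricting `render_prompt` to the declared evaluation parameters mirrors the idea that the judge only sees the data the metric asks for. In this sketch a declared parameter with no data raises a `KeyError` rather than producing an incomplete prompt.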

Specification Linking

When you save a generated metric, it is automatically linked to the specification it was generated from. This creates a traceable connection between your requirements and your evaluation criteria. You can view linked specifications directly from a metric’s detail page, and manage metric-specification links from the Specification Hub.
