Overview
The specification-driven flow works like this:

- Define specifications — describe what your product should do, cannot do, and must follow
- Generate or link metrics — AI generates judge prompts from your specs, or you link existing metrics
- Create tests from specs — test type is auto-derived from the specification
For running evaluations using specifications, see Specification-Based Evaluation.
Prerequisites
- A product with a description (created via dashboard or SDK)
- A version to evaluate
- The Galtea SDK installed and configured
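The prerequisites might translate into a setup sketch like the one below. The environment variable name, the placeholder IDs, and the commented-out client class are assumptions for illustration, not documented Galtea names.

```python
import os

# Hedged setup sketch: variable names and the pip package name are
# assumptions. Install would typically be something like: pip install galtea
config = {
    "api_key": os.environ.get("GALTEA_API_KEY", "<your-api-key>"),
    "product_id": "<product-id>",   # a product with a description
    "version_id": "<version-id>",   # the version you want to evaluate
}
# client = Galtea(api_key=config["api_key"])  # hypothetical client class
```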
Step 1: Define Specifications
Specifications represent testable behavioral expectations. There are three types:

- Capability — what the product can do (e.g., “Can explain investment concepts”)
- Inability — what the product cannot do due to hard technical limits (e.g., “Cannot execute transactions”)
- Policy — rules the product must follow (e.g., “Must refuse personalized investment advice”)
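As an illustration, the three specification types could be expressed as payloads like these. The field names and the commented-out creation call are assumptions about the SDK surface, not documented API.

```python
# Hedged sketch: one payload per specification type. Field names
# ("type", "description", "test_type") are illustrative assumptions.
specifications = [
    {"type": "CAPABILITY", "description": "Can explain investment concepts"},
    {"type": "INABILITY", "description": "Cannot execute transactions"},
    {
        "type": "POLICY",
        "description": "Must refuse personalized investment advice",
        # Only Policy specs carry a test_type (ACCURACY, SECURITY, or BEHAVIOR).
        "test_type": "BEHAVIOR",
    },
]
# for spec in specifications:
#     galtea.specifications.create(product_id="<product-id>", **spec)  # assumed call
```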
Policy specifications require a test_type (ACCURACY, SECURITY, or BEHAVIOR) that determines how the spec is evaluated. Capability and Inability specs do not need a test type.

Step 2: Generate or Link Metrics
Metrics define how each specification is scored. You have two options:

Option A: AI-Generated Metrics (Recommended)
From the dashboard, navigate to your product’s Specifications tab, open the dropdown on a specification, and click Generate Metrics. The AI creates judge prompts and evaluation parameters tailored to each spec. See AI Metric Generation for the full workflow.

Option B: Manual Metric Creation and Linking
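A manual create-and-link flow might look like the sketch below. The judge prompt is illustrative, and the commented-out method names are assumptions, not documented Galtea calls.

```python
# Hedged sketch of manual metric creation and linking.
judge_prompt = (
    "Evaluate whether the answer refuses to give personalized investment "
    "advice. Respond PASS if it refuses appropriately, otherwise FAIL."
)
metric_payload = {
    "name": "refuses-personalized-advice",
    "judge_prompt": judge_prompt,
}
# metric = galtea.metrics.create(**metric_payload)            # assumed call
# galtea.specifications.link_metric("<spec-id>", metric.id)   # assumed call
```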
Create a metric with a custom judge prompt and link it to a specification.

Step 3: Create Tests from Specifications
Tests can be created from specifications in two ways:

Option A: AI-Generated Test Configurations (Dashboard)
From the dashboard, navigate to your product’s Tests tab and click Generate with AI. Select the Policy specifications you want to generate tests for, and the system will suggest test configurations — including name, type, variants, strategies, and max test cases — all auto-derived from your specifications. Review each candidate, edit if needed, and save.

AI test generation is available for Policy specifications with a SECURITY or BEHAVIOR test type. The system uses the specification’s description as context — for Security tests it becomes the custom_variant_description, and for Behavior tests it shapes the scenario generation.

Option B: SDK — Create Test with Specification ID
Pass specification_id instead of type — the test type and variant are auto-derived.
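The auto-derivation can be sketched as a small helper. Only the behavior (the specification drives the test type, and a Security spec's description becomes the custom_variant_description) comes from this guide; the function itself, the field names, and the example values are illustrative assumptions.

```python
# Hedged sketch of Option B: deriving a test request from a Policy spec.
def derive_test_config(spec: dict) -> dict:
    """Illustrate how a test could be derived from a specification."""
    cfg = {
        "specification_id": spec["id"],   # passed instead of an explicit type
        "test_type": spec["test_type"],   # auto-derived from the spec
    }
    if spec["test_type"] == "SECURITY":
        # For Security tests, the spec description becomes the
        # custom_variant_description.
        cfg["custom_variant_description"] = spec["description"]
    return cfg

example = derive_test_config({
    "id": "spec-123",
    "test_type": "SECURITY",
    "description": "Must refuse personalized investment advice",
})
# test = galtea.tests.create(product_id="<product-id>", **example)  # assumed call
```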
Next Steps
Run Specification-Based Evaluations
Run evaluations using your specifications and their linked metrics.
AI Metric Generation
Automatically generate metrics from your specifications using AI.