Returns
Returns a Metric object for the given parameters, or None if an error occurs.
Examples
- Full Prompt (LLM-as-a-Judge)
- Partial Prompt (LLM-as-a-Judge)
- Self Hosted
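As a minimal illustration of the Full Prompt (LLM-as-a-Judge) flavor, the sketch below shows how a judge prompt template with the documented placeholders gets populated. Plain `str.format` stands in for whatever substitution the SDK actually performs, and the template text itself is an assumption:

```python
# Hypothetical judge prompt template using the documented placeholders.
# The platform populates these at evaluation time; str.format stands in here.
judge_prompt = (
    "You are a strict evaluator.\n"
    "User input: {input}\n"
    "Model answer: {actual_output}\n"
    "Score the answer from 0 to 1 and explain briefly."
)

# Populate the template the way an evaluation run conceptually would.
populated = judge_prompt.format(
    input="What is the capital of France?",
    actual_output="Paris",
)
print(populated)
```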
Parameters
The name of the metric.
The type of test this metric is designed for.
Possible values:
QUALITY, RED_TEAMING, SCENARIOS.
The name of the model used to evaluate the metric. Required for metrics that use a judge_prompt. Available models:
- "Claude-Sonnet-4.0"
- "Claude-Sonnet-3.7"
- "GPT-4.1-mini"
- "Gemini-2.5-Flash-Lite"
- "Gemini-2.5-Flash"
- "Gemini-2.0-flash"
- "GPT-4o"
- "GPT-4.1"
It should not be provided if the metric is “self hosted” (has no judge_prompt), since it does not require a model for evaluation.
A custom prompt that defines the evaluation logic for an LLM-as-a-judge metric. You can use placeholders like {input}, {actual_output}, etc., which will be populated at evaluation time. If you provide a judge_prompt, the metric will be an LLM-based evaluation. If omitted, the metric is considered a deterministic “Custom Score” metric.
The source of the metric. Possible values are:
full_prompt, partial_prompt, or self_hosted.
- Full Prompt (LLM-as-a-Judge): Gives you maximum control by providing a complete judge prompt template with placeholders (e.g., {input}, {actual_output}). Galtea populates the template and uses an LLM to evaluate based on your exact instructions.
- Partial Prompt (LLM-as-a-Judge): Simplifies prompt creation by providing only the core evaluation criteria or rubric. Galtea dynamically constructs the final prompt by prepending the selected evaluation parameters to your criteria.
- Self Hosted: For deterministic metrics scored locally using the SDK’s CustomScoreEvaluationMetric. Your custom logic runs on your infrastructure, and the resulting score is uploaded to the platform.
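A self-hosted metric's deterministic logic might look like the sketch below. The scoring function and its case-insensitive exact-match rule are illustrative assumptions, not the actual CustomScoreEvaluationMetric API; the point is only that the score is computed locally, with no judge model involved:

```python
def custom_score(expected_output: str, actual_output: str) -> float:
    """Illustrative deterministic scorer: 1.0 on a case-insensitive
    exact match, 0.0 otherwise. Logic like this runs on your own
    infrastructure; the resulting score is then uploaded to the platform."""
    if expected_output.strip().lower() == actual_output.strip().lower():
        return 1.0
    return 0.0

print(custom_score("Paris", "paris"))  # 1.0
print(custom_score("Paris", "Lyon"))   # 0.0
```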
This parameter is optional for now but may be required in the future to avoid confusion.
Evaluation parameters to be used when the metric is a Partial Prompt (LLM-as-a-Judge). These parameters will be prepended to the judge prompt to construct the final prompt.
To check the available evaluation parameters, see the Evaluation Parameters section.
It should not be provided if the metric is NOT a Partial Prompt (LLM-as-a-Judge).
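The prepending behavior described above can be sketched as follows. The selected parameter names and the join format are stand-in assumptions; the platform constructs the real final prompt:

```python
# Hypothetical evaluation parameters selected for a Partial Prompt metric.
evaluation_params = ["input", "actual_output"]

# Your core evaluation criteria / rubric.
criteria = "Penalize answers that do not directly address the user's question."

# Conceptually, Galtea prepends the selected parameters to your criteria
# to build the final judge prompt; this join format is only illustrative.
final_prompt = "\n".join(f"{{{p}}}" for p in evaluation_params) + "\n\n" + criteria
print(final_prompt)
```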
Tags to categorize the metric.
A brief description of what the metric evaluates.
A URL pointing to more detailed documentation about the metric.