Returns

Returns a Metric object for the given parameters, or None if an error occurs.

Examples

Full Prompt (LLM-as-a-Judge):

metric = galtea.metrics.create(
    name="accuracy_v1",
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    source="full_prompt",
    judge_prompt="Determine whether the output is equivalent to the expected output. Output: \"{actual_output}\". Expected Output: \"{expected_output}.\"",
    tags=["custom", "accuracy"],
    description="A custom accuracy metric."
)
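
Partial Prompt (LLM-as-a-Judge):

A sketch of a partial-prompt metric. The metric name, criteria text, and evaluation_params values are illustrative; see the Evaluation Parameters section for the parameter names the platform actually supports.

metric = galtea.metrics.create(
    name="coherence_v1",  # illustrative name
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    source="partial_prompt",
    # For a partial prompt, judge_prompt holds only the core criteria;
    # Galtea prepends the selected evaluation parameters to build the final prompt.
    judge_prompt="The response must be coherent and directly address the user's question.",
    evaluation_params=["input", "actual_output"],  # assumed values; check the Evaluation Parameters section
    tags=["custom", "coherence"],
    description="A custom coherence metric."
)

Self Hosted:

A sketch of a deterministic, self-hosted metric. Note that judge_prompt and evaluator_model_name are omitted: the score is computed on your own infrastructure with the SDK's CustomScoreEvaluationMetric and uploaded to the platform.

metric = galtea.metrics.create(
    name="exact_match_v1",  # illustrative name
    test_type="QUALITY",
    source="self_hosted",  # no judge_prompt, so no evaluator model is needed
    tags=["custom", "deterministic"],
    description="A self-hosted exact-match metric scored locally."
)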

Parameters

name
string
required
The name of the metric.
test_type
string
required
The type of test this metric is designed for. Possible values: QUALITY, RED_TEAMING, SCENARIOS.
evaluator_model_name
string
The name of the model used to evaluate the metric. Required for metrics that use a judge_prompt. Available models:
  • "Claude-Sonnet-4.O"
  • "Claude-Sonnet-3.7"
  • "GPT-4.1-mini"
  • "Gemini-2.5-Flash-Lite"
  • "Gemini-2.5-Flash"
  • "Gemini-2.0-flash"
  • "GPT-4o"
  • "GPT-4.1"
Do not provide this parameter for “self hosted” metrics (those without a judge_prompt), since they do not require a model for evaluation.
judge_prompt
string
A custom prompt that defines the evaluation logic for an LLM-as-a-judge metric. You can use placeholders like {input}, {actual_output}, etc., which are populated at evaluation time. If you provide a judge_prompt, the metric is evaluated by an LLM; if you omit it, the metric is treated as a deterministic “Custom Score” metric.
source
string
The source of the metric. Possible values: full_prompt, partial_prompt, or self_hosted.
  • Full Prompt (LLM-as-a-Judge): Gives you maximum control: you provide a complete judge prompt template with placeholders (e.g., {input}, {actual_output}), and Galtea populates the template and uses an LLM to evaluate against your exact instructions.
  • Partial Prompt (LLM-as-a-Judge): Simplifies prompt creation: you provide only the core evaluation criteria or rubric, and Galtea dynamically constructs the final prompt by prepending the selected evaluation parameters to your criteria.
  • Self Hosted: For deterministic metrics scored locally using the SDK’s CustomScoreEvaluationMetric. Your custom logic runs on your infrastructure, and the resulting score is uploaded to the platform.
This parameter is currently optional but may become required in the future to avoid ambiguity.
evaluation_params
list[string]
Evaluation parameters to be used when the metric is a Partial Prompt (LLM-as-a-Judge). These parameters will be prepended to the judge prompt to construct the final prompt. To check the available evaluation parameters, see the Evaluation Parameters section.
Provide this parameter only for Partial Prompt (LLM-as-a-Judge) metrics, as in the Partial Prompt example above.
tags
list[string]
Tags to categorize the metric.
description
string
A brief description of what the metric evaluates.
documentation_url
string
A URL pointing to more detailed documentation about the metric.