Returns

Returns a Metric object for the given parameters, or None if an error occurs.

Examples

Full Prompt (LLM-as-a-Judge):

metric = galtea.metrics.create(
    name="accuracy_v1",
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    source="full_prompt",
    judge_prompt="Determine whether the output is equivalent to the expected output. Output: \"{actual_output}\". Expected Output: \"{expected_output}.\"",
    tags=["custom", "accuracy"],
    description="A custom accuracy metric."
)
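
Partial Prompt (LLM-as-a-Judge):

A sketch of a partial-prompt metric. The metric name, criteria text, and evaluation_params values are illustrative; see the Evaluation Parameters section for the parameter names the platform actually supports.

metric = galtea.metrics.create(
    name="coherence_v1",  # illustrative name
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    source="partial_prompt",
    # For a partial prompt, judge_prompt holds only the core criteria;
    # Galtea prepends the selected evaluation parameters to build the final prompt.
    judge_prompt="The response must be coherent and directly address the user's question.",
    evaluation_params=["input", "actual_output"],  # assumed values; check the Evaluation Parameters section
    tags=["custom", "coherence"],
    description="A custom coherence metric."
)

Self Hosted:

A sketch of a deterministic, self-hosted metric. Note that judge_prompt and evaluator_model_name are omitted: the score is computed on your own infrastructure with the SDK's CustomScoreEvaluationMetric and uploaded to the platform.

metric = galtea.metrics.create(
    name="exact_match_v1",  # illustrative name
    test_type="QUALITY",
    source="self_hosted",  # no judge_prompt, so no evaluator model is needed
    tags=["custom", "deterministic"],
    description="A self-hosted exact-match metric scored locally."
)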

Parameters

name
string
required
The name of the metric.
test_type
string
required
The type of test this metric is designed for. Possible values: QUALITY, RED_TEAMING, SCENARIOS.
evaluator_model_name
string
The name of the model used to evaluate the metric. Required for metrics that use a judge_prompt. Available models:
  • "Claude-Sonnet-4.O"
  • "Claude-Sonnet-3.7"
  • "GPT-4.1-mini"
  • "Gemini-2.5-Flash-Lite"
  • "Gemini-2.5-Flash"
  • "Gemini-2.0-flash"
  • "GPT-4o"
  • "GPT-4.1"
Do not provide this parameter for “self hosted” metrics (those without a judge_prompt), since they do not require a model for evaluation.
judge_prompt
string
A custom prompt that defines the evaluation logic for an LLM-as-a-judge metric. You can use placeholders like {input}, {actual_output}, etc., which are populated at evaluation time. If you provide a judge_prompt, the metric is evaluated by an LLM; if you omit it, the metric is treated as a deterministic “Custom Score” metric.
source
string
The source of the metric. Possible values: full_prompt, partial_prompt, or self_hosted.
  • Full Prompt (LLM-as-a-Judge): Gives you maximum control: you provide a complete judge prompt template with placeholders (e.g., {input}, {actual_output}), and Galtea populates the template and uses an LLM to evaluate against your exact instructions.
  • Partial Prompt (LLM-as-a-Judge): Simplifies prompt creation: you provide only the core evaluation criteria or rubric, and Galtea dynamically constructs the final prompt by prepending the selected evaluation parameters to your criteria.
  • Self Hosted: For deterministic metrics scored locally using the SDK’s CustomScoreEvaluationMetric. Your custom logic runs on your infrastructure, and the resulting score is uploaded to the platform.
This parameter is currently optional but may become required in the future to avoid ambiguity.
evaluation_params
list[string]
Evaluation parameters to be used when the metric is a Partial Prompt (LLM-as-a-Judge). These parameters will be prepended to the judge prompt to construct the final prompt. To check the available evaluation parameters, see the Evaluation Parameters section.
Provide this parameter only for Partial Prompt (LLM-as-a-Judge) metrics, as in the Partial Prompt example above.
tags
list[string]
Tags to categorize the metric.
description
string
A brief description of what the metric evaluates.
documentation_url
string
A URL pointing to more detailed documentation about the metric.