
Returns

Returns a Metric object for the given parameters, or None if an error occurs.

Examples

LLM-as-a-Judge:
metric = galtea.metrics.create(
    name="accuracy_v1",
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    judge_prompt="Determine whether the output is equivalent to the expected output. Output: \"{actual_output}\". Expected Output: \"{expected_output}.\"",
    tags=["custom", "accuracy"],
    description="A custom accuracy metric."
)
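
Self Hosted: for a self-hosted metric, omit judge_prompt (and evaluator_model_name), and the metric is treated as a deterministic "Custom Score" metric. A minimal sketch, assuming the same create() call and an already initialized galtea client; the name, tags, and description values here are illustrative:

metric = galtea.metrics.create(
    name="custom_score_v1",          # illustrative name
    test_type="QUALITY",
    tags=["custom", "self-hosted"],
    description="A self-hosted Custom Score metric whose scores are computed by our own evaluation code."
)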

Parameters

name
string
required
The name of the metric.
test_type
string
required
The type of test this metric is designed for. Possible values: QUALITY, RED_TEAMING, SCENARIOS.
evaluator_model_name
string
The name of the model used to evaluate the metric. Required for metrics using judge_prompt.
Available models:
  • "GPT-35-turbo"
  • "GPT-4o"
  • "GPT-4o-mini"
  • "GPT-4.1"
  • "Gemini-2.0-flash"
  • "Gemini-2.5-Flash"
  • "Gemini-2.5-Flash-Lite"
Do not provide this parameter for "self hosted" metrics (those without a judge_prompt), since they do not require a model for evaluation.
judge_prompt
string
A custom prompt that defines the evaluation logic for an LLM-as-a-judge metric. You can use placeholders like {input}, {actual_output}, etc., which are populated at evaluation time. If you provide a judge_prompt, the metric uses LLM-based evaluation; if you omit it, the metric is treated as a deterministic "Custom Score" metric.
tags
list[string]
Tags to categorize the metric.
description
string
A brief description of what the metric evaluates.
documentation_url
string
A URL pointing to more detailed documentation about the metric.
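
Since create() returns None if an error occurs, it can help to guard the result before using it. A minimal sketch, assuming an already initialized galtea client:

metric = galtea.metrics.create(
    name="accuracy_v1",
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    judge_prompt="Determine whether the output is equivalent to the expected output. Output: \"{actual_output}\". Expected Output: \"{expected_output}.\""
)
if metric is None:
    # Creation failed; handle the error as appropriate for your application.
    raise RuntimeError("Metric creation failed")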