Returns

Returns a MetricType object for the given parameters, or None if an error occurs.

Examples

metric = galtea.metrics.create(
    name="accuracy_v1",
    evaluator_model_name="GPT-4.1",
    criteria="Determine whether the actual output is equivalent to the expected output.",
    evaluation_params=["input", "expected_output", "actual_output"],
    tags=["custom", "accuracy"],
    description="A custom accuracy metric."
)
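
The same metric can also be defined with explicit evaluation steps instead of criteria. The step wording below is illustrative, not prescribed by the API:

metric = galtea.metrics.create(
    name="accuracy_v2",
    evaluator_model_name="GPT-4.1",
    evaluation_steps=[
        "Read the input and the expected output.",
        "Compare the actual output to the expected output.",
        "Decide whether the two are equivalent in meaning."
    ],
    evaluation_params=["input", "expected_output", "actual_output"],
    tags=["custom", "accuracy"],
    description="A custom accuracy metric defined via evaluation steps."
)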

Parameters

name
string
required
The name of the metric.
evaluator_model_name
string
The name of the model used to evaluate the metric. This model will be used to assess the quality of the outputs based on the metric’s criteria.
Available models:
  • "GPT-35-turbo"
  • "GPT-4o"
  • "GPT-4o-mini"
  • "GPT-4.1"
  • "Gemini-2.0-flash"
  • "Gemini-2.5-Flash"
  • "Gemini-2.5-Flash-Lite"
It should not be provided for a “custom” metric (one with neither criteria nor evaluation_steps), since custom metrics do not require an evaluator model.
evaluation_params
list[string]
A list of strings indicating which parameters are relevant for evaluating this metric.
Supported values for G-Eval metrics:
  • "input": The original prompt or query sent to the model (always required).
  • "actual_output": The output generated by the model.
  • "expected_output": The ideal or reference answer for the input.
  • "context": Supplementary context or background information provided to the model.
  • "retrieval_context": Information retrieved by your RAG pipeline to support inference.
Additional values for Custom Judge metrics:
  • "product_description": High-level description of the product being evaluated.
  • "product_capabilities": Capabilities or intended functionalities of the product.
  • "product_inabilities": Known limitations or things the product cannot or should not do.
  • "product_security_boundaries": Specific boundaries or restrictions to ensure secure behavior.
criteria
string
required
The natural-language criteria the evaluator model uses to judge outputs.
evaluation_steps
list[string]
required
An ordered list of steps the evaluator model follows when assessing outputs.
You must provide either criteria or evaluation_steps, but not both (see the second example above); the choice depends on your preferred evaluation approach.
judge_prompt
string
The custom judge prompt for the metric. Provide it instead of criteria or evaluation_steps when defining a Custom Judge metric (see the sketch at the end of this section).
tags
list[string]
The tags for the metric type so it can be easily identified and categorized.
description
string
A brief description of what the metric type evaluates.
documentation_url
string
A URL pointing to more detailed documentation about the metric type.
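
A minimal sketch of a Custom Judge metric, based on the notes above: judge_prompt takes the place of criteria and evaluation_steps, evaluator_model_name is omitted, and the product-level values are referenced through evaluation_params. The prompt text and parameter selection are illustrative:

metric = galtea.metrics.create(
    name="product_boundary_judge",
    judge_prompt=(
        "You are a judge. Using the product description, capabilities, "
        "inabilities, and security boundaries, decide whether the actual "
        "output stays within what the product is allowed to do."
    ),
    evaluation_params=[
        "input",
        "actual_output",
        "product_description",
        "product_capabilities",
        "product_inabilities",
        "product_security_boundaries"
    ],
    tags=["custom-judge", "safety"],
    description="Judges outputs against the product's declared boundaries."
)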