The Answer Relevancy metric is one of several non-deterministic Metrics Galtea uses to evaluate the quality of your model outputs. It measures how well the actual_output addresses the user’s original query (input), helping you determine whether the model generates responses that are focused, appropriate, and directly useful in the context of the question. This is essential for ensuring the model doesn’t drift into unrelated topics or produce verbose but irrelevant answers.

Evaluation Parameters

To compute the answer_relevancy metric, the following parameters are required:
  • input: The user’s query or instruction.
  • actual_output: The response generated by your LLM application.
This metric does not directly evaluate the retrieval context, focusing purely on the alignment between the user’s request and the model’s generated answer.
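As a concrete illustration, here is a minimal sketch of a test case carrying the two required parameters, plus a check that both are present. The dictionary layout is an assumption made for the example, not a confirmed Galtea schema.

```python
# Minimal sketch of the two required evaluation parameters.
# NOTE: the dict layout is illustrative, not a confirmed Galtea schema.
test_case = {
    "input": "What is the capital of France?",            # the user's query
    "actual_output": "The capital of France is Paris.",   # the LLM's response
}

# Both parameters must be present before the metric can be computed.
missing = [k for k in ("input", "actual_output") if not test_case.get(k)]
if missing:
    raise ValueError(f"answer_relevancy requires: {', '.join(missing)}")
```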

How Is It Calculated?

The answer_relevancy score is derived using an LLM-as-a-judge approach with explicit pass criteria:
  1. Intent Alignment: Does the actual_output directly address the core informational need expressed in the input?
  2. Relevancy Check: Is the response focused and on-topic, without drifting into unrelated information?
Based on these criteria, the LLM assigns a binary score:
  • 1 (Relevant): The response directly and appropriately addresses the user’s query.
  • 0 (Not Relevant): The response fails to address the query, is off-topic, or provides irrelevant information.
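To make the two-step judging flow concrete, here is a minimal sketch of how an LLM-as-a-judge binary check could be wired up. The prompt wording, the judge_answer_relevancy name, and the call_llm stand-in are assumptions for illustration, not Galtea’s actual judge implementation.

```python
def judge_answer_relevancy(input_text: str, actual_output: str, call_llm) -> int:
    """Binary LLM-as-a-judge check: returns 1 (relevant) or 0 (not relevant).

    call_llm is a stand-in for whatever LLM client you use; it takes a
    prompt string and returns the model's text response.
    """
    prompt = (
        "You are evaluating answer relevancy.\n"
        f"User query: {input_text}\n"
        f"Model response: {actual_output}\n\n"
        "1. Intent alignment: does the response directly address the core "
        "informational need expressed in the query?\n"
        "2. Relevancy check: is the response focused and on-topic, without "
        "drifting into unrelated information?\n\n"
        "Answer with a single character: 1 if both criteria hold, 0 otherwise."
    )
    verdict = call_llm(prompt).strip()
    # Default to 0 (not relevant) on any unexpected judge output.
    return 1 if verdict.startswith("1") else 0
```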

Suggested Test Case Types

The Answer Relevancy metric is well suited to quality test cases in Galtea, since it measures the model’s ability to stay relevant to user queries.
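As a usage sketch, the binary verdicts from a judge like the judge_answer_relevancy function above can be averaged across a set of quality test cases to get a relevancy pass rate. The example cases and the fake_judge stub below are hypothetical, included only so the snippet runs offline.

```python
# Hypothetical batch evaluation: average the binary verdicts into a pass rate.
cases = [
    {"input": "How do I reset my password?",
     "actual_output": "Click 'Forgot password' on the login page and follow the emailed link."},
    {"input": "How do I reset my password?",
     "actual_output": "Our company was founded in 2019 and has offices in three countries."},
]

def fake_judge(prompt: str) -> str:
    # Stand-in for a real LLM call so the example runs offline.
    return "1" if "Forgot password" in prompt else "0"

scores = [judge_answer_relevancy(c["input"], c["actual_output"], fake_judge)
          for c in cases]
print(f"Answer relevancy pass rate: {sum(scores) / len(scores):.0%}")  # 50%
```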