The Answer Relevancy metric is one of several non-deterministic Metric Types Galtea uses to evaluate the quality of your RAG pipeline’s generator by measuring how well the actual_output addresses the user’s original query (input). It helps determine whether the model is generating responses that are focused, appropriate, and directly useful in the context of the question.

This is essential for ensuring the model doesn’t drift into unrelated topics or provide verbose but irrelevant information.


Evaluation Parameters

To compute the answer_relevancy metric, the following parameters are required:

  • input: The user’s query or instruction.
  • actual_output: The response generated by your LLM application.

This metric does not directly evaluate the retrieval context, focusing purely on the alignment between the user’s request and the model’s generated answer.
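As a rough illustration of these two parameters, the sketch below builds a test case with only input and actual_output and scores it with the open-source deepeval library this metric is based on (see the note at the end of this page). The query and answer strings are placeholders, and the threshold and judge model are illustrative assumptions, not Galtea defaults:

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Only the user's query and the generated answer are required;
# retrieval context is not needed for this metric.
test_case = LLMTestCase(
    input="What is the return policy for online orders?",          # placeholder query
    actual_output="You can return online orders within 30 days.",  # placeholder answer
)

# threshold and judge model are assumptions for the example
metric = AnswerRelevancyMetric(threshold=0.7, model="gpt-4o")
metric.measure(test_case)

print(metric.score)   # relevancy score in [0, 1]
print(metric.reason)  # judge's explanation for the score
```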


How Is It Calculated?

The score is computed via an LLM-as-a-judge process:

  1. Intent Extraction: An LLM identifies the core informational need in the input.
  2. Relevancy Judgment: The LLM evaluates whether the actual_output appropriately and directly addresses that need.

The final score is calculated as:

\text{Answer Relevancy} = \frac{\text{Number of relevant outputs}}{\text{Total number of evaluated responses}}

Higher scores indicate that the generator is effectively aligned with user queries.
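Conceptually, the calculation reduces to the ratio above. The minimal sketch below mirrors it, where judge_is_relevant is a hypothetical stand-in for the LLM-as-a-judge call, not a Galtea or deepeval API:

```python
from typing import Callable

def answer_relevancy(
    queries: list[str],
    outputs: list[str],
    judge_is_relevant: Callable[[str, str], bool],  # hypothetical LLM-as-a-judge call
) -> float:
    """Fraction of evaluated responses judged relevant to their query's intent."""
    if not outputs:
        return 0.0
    relevant = sum(
        1 for query, output in zip(queries, outputs) if judge_is_relevant(query, output)
    )
    return relevant / len(outputs)

# Example: if 3 of 4 evaluated responses are judged relevant, the score is 0.75.
```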

This metric was incorporated into the Galtea platform from the open-source library deepeval; for more information, you can also visit their documentation.