Answer Relevancy
Evaluates whether the generated answer is relevant to the user’s input in a RAG pipeline.
The Answer Relevancy metric is one of several non-deterministic Metric Types Galtea uses to evaluate the quality of your RAG pipeline's generator by measuring how well the `actual_output` addresses the user's original query (`input`). It helps determine whether the model is generating responses that are focused, appropriate, and directly useful in the context of the question.
This is essential for ensuring the model doesn’t drift into unrelated topics or provide verbose but irrelevant information.
Evaluation Parameters
To compute the `answer_relevancy` metric, the following parameters are required:
- `input`: The user's query or instruction.
- `actual_output`: The response generated by your LLM application.
This metric does not directly evaluate the retrieval context, focusing purely on the alignment between the user’s request and the model’s generated answer.
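As a concrete illustration, the snippet below shows the shape of a single evaluation case containing the two required parameters. The dictionary structure is a minimal sketch for clarity only; it is not a documented Galtea SDK payload.

```python
# Minimal sketch of the two parameters consumed by answer_relevancy.
# The dict layout is illustrative, not a Galtea SDK schema.
evaluation_case = {
    "input": "What is the refund policy for annual subscriptions?",
    "actual_output": (
        "Annual subscriptions can be refunded within 30 days of purchase. "
        "After that window, refunds are prorated for the unused months."
    ),
}
```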
How Is It Calculated?
The score is computed via an LLM-as-a-judge process:
- Intent Extraction: An LLM identifies the core informational need in the `input`.
- Relevancy Judgment: The LLM evaluates whether the `actual_output` appropriately and directly addresses that need.
The final score is calculated as:

$$
\text{Answer Relevancy} = \frac{\text{Number of Relevant Statements}}{\text{Total Number of Statements}}
$$

where the judge splits the `actual_output` into discrete statements and counts how many address the intent extracted from the `input`. For example, if four of the five statements in an answer are judged relevant, the score is 4/5 = 0.8.
Higher scores indicate that the generator is effectively aligned with user queries.
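To make the computation concrete, here is a minimal sketch of the statement-based scoring loop described above. Both helper functions are hypothetical stand-ins: in practice, statement extraction and the relevancy judgment are each performed by an LLM call, whereas this sketch stubs them with trivial string heuristics so the example runs on its own.

```python
# Stopword list used only by the stubbed judge below (illustrative).
STOPWORDS = {"what", "is", "the", "for", "a", "an", "of", "are", "our"}

def split_into_statements(actual_output: str) -> list[str]:
    # Naive stand-in: real implementations typically ask an LLM to
    # extract discrete statements rather than splitting on periods.
    return [s.strip() for s in actual_output.split(".") if s.strip()]

def llm_judge(statement: str, user_input: str) -> bool:
    # Hypothetical judge: in practice an LLM decides whether `statement`
    # addresses the informational need in `user_input`. Stubbed here
    # with a keyword-overlap check so the sketch is self-contained.
    keywords = {w for w in user_input.lower().split() if w not in STOPWORDS}
    return bool(keywords & set(statement.lower().split()))

def answer_relevancy(user_input: str, actual_output: str) -> float:
    # Score = relevant statements / total statements.
    statements = split_into_statements(actual_output)
    if not statements:
        return 0.0
    relevant = sum(llm_judge(s, user_input) for s in statements)
    return relevant / len(statements)

score = answer_relevancy(
    "What is the refund policy for annual subscriptions?",
    "Annual plans are refundable within 30 days. Our office dog is named Rex.",
)
print(f"answer_relevancy = {score:.2f}")  # 0.50: one of two statements relevant
```

The off-topic second sentence about the office dog lowers the score, which is exactly the drift into irrelevant content this metric is designed to catch.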