Contextual Relevancy
Measures the quality of your RAG pipeline’s retriever by evaluating the overall relevance of the information presented in your retrieval_context for a given input.
The Contextual Relevancy metric is one of several non-deterministic Metric Types Galtea uses to evaluate the performance of your AI products, specifically for Retrieval-Augmented Generation (RAG) systems. It assesses whether the retrieved context is pertinent to the user’s query.
This metric helps ensure that the information provided to the generator component of your RAG pipeline is useful and on-topic, which is crucial for generating high-quality, relevant answers.
Evaluation Parameters
To compute the contextual_relevancy metric, the following parameters are required:

- input: The user's query or instruction.
- actual_output: The response generated by your LLM application. (While not directly scored by this metric, it is often part of the test case data.)
- retrieval_context: A list of documents or text chunks retrieved by your RAG system in response to the input.
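As an illustration, a single test case carrying these parameters could be assembled as a plain dictionary. The field names mirror the parameter names above; the dictionary structure and example values are assumptions for the sketch, not the Galtea SDK's API:

```python
# Hypothetical test case for a contextual relevancy evaluation.
# The keys mirror the evaluation parameters described above.
test_case = {
    "input": "What is the capital of France?",
    "actual_output": "The capital of France is Paris.",
    "retrieval_context": [
        "Paris is the capital and most populous city of France.",
        "France is a country in Western Europe.",
    ],
}
```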
How Is It Calculated?
This metric’s score is computed using an LLM-as-a-judge process. The LLM judge performs the following steps:
- Statement Extraction: The LLM processes each document or node within the retrieval_context to identify individual statements or key pieces of information.
- Relevance Classification: For each extracted statement, the LLM determines whether it is relevant to the original input.
- Score Calculation: The final score is the ratio of relevant statements to the total number of statements extracted from the retrieval_context.
The formula is:

Contextual Relevancy = Number of Relevant Statements / Total Number of Statements

A higher score indicates that the retriever is effectively sourcing information that is pertinent to the user's query.
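The three steps above can be sketched end to end. In the real pipeline an LLM judge extracts statements and classifies their relevance; in this runnable sketch the judge is stubbed with a trivial keyword-overlap heuristic and statement extraction is a naive sentence split, so the function names and heuristics here are illustrative assumptions, not the actual metric implementation:

```python
def judge_is_relevant(statement: str, query: str) -> bool:
    """Placeholder for the LLM judge's relevance verdict.

    A real judge would prompt an LLM; this keyword-overlap check exists
    only so the sketch runs deterministically.
    """
    keywords = {w.lower().strip("?.,") for w in query.split()}
    return any(w.lower().strip("?.,") in keywords for w in statement.split())


def contextual_relevancy(query: str, retrieval_context: list[str]) -> float:
    # Statement Extraction: naively split each retrieved document into
    # sentences (the LLM does this more robustly in practice).
    statements = [
        s.strip()
        for doc in retrieval_context
        for s in doc.split(".")
        if s.strip()
    ]
    if not statements:
        return 0.0
    # Relevance Classification: one verdict per extracted statement.
    relevant = sum(judge_is_relevant(s, query) for s in statements)
    # Score Calculation: relevant statements / total statements.
    return relevant / len(statements)
```

For example, a context in which one of two extracted statements is relevant yields a score of 0.5.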