The Contextual Relevancy metric is one of several non-deterministic Metric Types Galtea uses to evaluate the performance of your AI products, specifically for Retrieval-Augmented Generation (RAG) systems. It assesses whether the retrieved context is pertinent to the user’s query.

This metric helps ensure that the information provided to the generator component of your RAG pipeline is useful and on-topic, which is crucial for generating high-quality, relevant answers.


Evaluation Parameters

To compute the contextual_relevancy metric, the following parameters are required:

  • input: The user’s query or instruction.
  • actual_output: The response generated by your LLM application. (While not directly scored in this metric, it’s often part of the test case data).
  • retrieval_context: A list of documents or text chunks retrieved by your RAG system in response to the input.
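The parameters above can be illustrated with a hypothetical test case (the field values are invented for demonstration; only the parameter names come from the metric definition):

```python
# Hypothetical test case showing the parameters consumed by contextual_relevancy.
# actual_output is included for completeness but is not scored by this metric.
test_case = {
    "input": "What is the capital of France?",
    "actual_output": "The capital of France is Paris.",
    "retrieval_context": [
        "Paris is the capital and most populous city of France.",
        "The Eiffel Tower was completed in 1889.",
    ],
}
```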

How Is It Calculated?

This metric’s score is computed using an LLM-as-a-judge process. The LLM judge performs the following steps:

  1. Statement Extraction: The LLM processes each document/node within the retrieval_context to identify individual statements or key pieces of information.
  2. Relevance Classification: For each extracted statement from the retrieval_context, the LLM determines if that statement is relevant to the original input.
  3. Score Calculation: The final score is the ratio of relevant statements to the total number of statements extracted from the retrieval_context.

The formula is:

$$\text{Contextual Relevancy} = \frac{\text{Number of relevant statements}}{\text{Total number of statements}}$$

A higher score indicates that the retriever is effectively sourcing information that is pertinent to the user’s query.
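The three steps above can be sketched in a few lines. In this sketch, `extract_statements` and `is_relevant` are hypothetical stand-ins for the two LLM-judge calls; the actual metric delegates both statement extraction and relevance classification to an LLM.

```python
def contextual_relevancy(input_query, retrieval_context, extract_statements, is_relevant):
    """Sketch of the LLM-as-a-judge scoring loop.

    extract_statements(chunk) -> list of statements (step 1)
    is_relevant(statement, input_query) -> bool (step 2)
    Returns the ratio of relevant statements to total statements (step 3).
    """
    # Step 1: extract statements from every retrieved chunk.
    statements = [
        s
        for chunk in retrieval_context
        for s in extract_statements(chunk)
    ]
    if not statements:
        return 0.0
    # Step 2: classify each statement against the original input.
    relevant = sum(1 for s in statements if is_relevant(s, input_query))
    # Step 3: ratio of relevant to total statements.
    return relevant / len(statements)


# Demonstration with naive stand-ins (a real judge would be an LLM call):
split_sentences = lambda chunk: [s.strip() for s in chunk.split(".") if s.strip()]
keyword_overlap = lambda s, q: any(w.lower() in s.lower() for w in q.split())

score = contextual_relevancy(
    "capital of France",
    ["Paris is the capital of France. The Eiffel Tower opened in 1889."],
    split_sentences,
    keyword_overlap,
)
# Two statements are extracted; only one mentions the query terms, so score = 0.5
```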

This metric was incorporated into the Galtea platform from the open-source library deepeval. For more information, you can also visit their documentation.