The Contextual Relevancy metric is one of several non-deterministic Metrics Galtea uses to evaluate the performance of your AI products, specifically for Retrieval-Augmented Generation (RAG) systems. It assesses whether the retrieved context is pertinent to the user’s query. This metric helps ensure that the information provided to the generator component of your RAG pipeline is useful and on-topic, which is crucial for generating high-quality, relevant answers.

Evaluation Parameters

To compute the contextual_relevancy metric, the following parameters are required (a minimal example follows the list):
  • input: The user’s query or instruction.
  • actual_output: The response generated by your LLM application. (Not directly scored by this metric, but typically included in the test case data.)
  • retrieval_context: A list of documents or text chunks retrieved by your RAG system in response to the input.
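
For illustration, here is a minimal sketch of how these parameters could be bundled into a test case. The TestCase dataclass and its field names simply mirror the parameters above; they are hypothetical, not taken from any particular Galtea SDK.

```python
from dataclasses import dataclass, field


# Hypothetical container mirroring the evaluation parameters above;
# not an actual Galtea SDK class.
@dataclass
class TestCase:
    input: str                # the user's query or instruction
    actual_output: str        # the response generated by the LLM application
    retrieval_context: list[str] = field(default_factory=list)  # retrieved chunks


case = TestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
    retrieval_context=[
        "Paris is the capital and most populous city of France.",  # relevant
        "The Eiffel Tower was completed in 1889.",                 # noise
    ],
)
```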

How Is It Calculated?

The contextual_relevancy score is determined using an LLM-as-a-judge approach that measures the signal-to-noise ratio of the retrieved context:
  1. Statement Extraction: The LLM analyzes the retrieval_context and identifies the high-level statements or facts presented.
  2. Relevance Check: For each statement, the LLM determines if it is relevant (provides useful information) or irrelevant (noise) to addressing the input.
  3. Score Calculation: Based on the relevance checks, the LLM judges whether the context as a whole is focused on the input or weighed down by irrelevant material.
The LLM assigns a binary score (a sketch of this flow follows the list):
  • Score 1.0 (Relevant): All, or the vast majority of, statements are relevant to the input; the retrieval context stays focused without significant distraction.
  • Score 0.0 (Noisy/Irrelevant): The context contains significant irrelevant statements (noise) or is completely unrelated to the input.
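
To make this flow concrete, here is a minimal sketch of the three steps. The helper names (extract_statements, is_relevant) and the noise_threshold parameter are assumptions for illustration: in a real implementation both helpers would be LLM-as-a-judge calls, not the crude stand-ins used here.

```python
def extract_statements(retrieval_context: list[str]) -> list[str]:
    """Step 1 (stand-in for an LLM call): break the retrieved context into
    high-level statements. Naively, each chunk is treated as one statement."""
    return list(retrieval_context)


def is_relevant(statement: str, query: str) -> bool:
    """Step 2 (stand-in for an LLM call): judge whether a statement helps
    address the query. A real judge would prompt an LLM; this version uses
    crude keyword overlap purely so the sketch runs end to end."""
    query_terms = {w.strip("?.,!").lower() for w in query.split() if len(w) > 3}
    return any(t and t in statement.lower() for t in query_terms)


def contextual_relevancy(input: str, retrieval_context: list[str],
                         noise_threshold: float = 0.0) -> float:
    """Step 3: assign the binary score. Returns 1.0 when all (or, with a
    looser noise_threshold, the vast majority of) statements are relevant,
    and 0.0 when the context carries significant noise. noise_threshold is
    an assumption for this sketch, not a documented parameter."""
    statements = extract_statements(retrieval_context)
    if not statements:
        return 0.0  # nothing retrieved: treat as unrelated to the input
    noise = sum(not is_relevant(s, input) for s in statements)
    if noise / len(statements) > noise_threshold:
        return 0.0  # significant irrelevant material (noise)
    return 1.0      # focused, relevant context


score = contextual_relevancy(
    input="What is the capital of France?",
    retrieval_context=[
        "Paris is the capital and most populous city of France.",  # relevant
        "The Eiffel Tower was completed in 1889.",                 # noise
    ],
)
# With the strict default threshold, the noisy chunk drives the score to 0.0.
```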

Suggested Test Case Types

The Contextual Relevancy metric is well suited to quality test cases in Galtea for products that use RAG, since it directly measures whether the retrieved context is pertinent to the user’s query.