The Contextual Relevancy metric is one of several non-deterministic metrics Galtea uses to evaluate the performance of your AI products, specifically Retrieval-Augmented Generation (RAG) systems. It assesses whether the retrieved context is pertinent to the user’s query. This metric helps ensure that the information provided to the generator component of your RAG pipeline is useful and on-topic, which is crucial for generating high-quality, relevant answers.
Evaluation Parameters
To compute the contextual_relevancy metric, the following parameters are required:
- input: The user’s query or instruction.
- actual_output: The response generated by your LLM application. (While not directly scored in this metric, it’s often part of the test case data.)
- retrieval_context: A list of documents or text chunks retrieved by your RAG system in response to the input.
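As an illustration of how these three parameters fit together, the following is a minimal sketch of a test case container in plain Python. The class name RelevancyTestCase, its validate method, and the sample data are hypothetical and are not part of the Galtea SDK; only the three field names come from the parameter list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelevancyTestCase:
    """Hypothetical container for the three parameters listed above."""

    input: str                      # the user's query or instruction
    actual_output: str              # the LLM response (not scored by this metric)
    retrieval_context: List[str] = field(default_factory=list)  # retrieved chunks

    def validate(self) -> None:
        # Assumption: an empty query or an empty context cannot be evaluated.
        if not self.input.strip():
            raise ValueError("input must be a non-empty string")
        if not self.retrieval_context:
            raise ValueError("retrieval_context must contain at least one chunk")


# Illustrative data only.
example_case = RelevancyTestCase(
    input="What is the refund window for annual plans?",
    actual_output="Annual plans can be refunded within 30 days of purchase.",
    retrieval_context=[
        "Annual plans are refundable within 30 days of purchase.",
        "Monthly plans renew automatically on the billing date.",
    ],
)
example_case.validate()
```

Keeping actual_output in the container mirrors the note above: it is not scored here, but it usually travels with the rest of the test case data.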
How Is It Calculated?
The contextual_relevancy score is determined using an LLM-as-a-judge approach that measures the signal-to-noise ratio:
- Statement Extraction: The LLM analyzes the retrieval_context and identifies the high-level statements or facts presented.
- Relevance Check: For each statement, the LLM determines if it is relevant (provides useful information) or irrelevant (noise) to addressing the input.
- Score Calculation: The LLM evaluates whether the context contains significant irrelevant statements or is completely unrelated to the input.
  - Score 1.0 (Relevant): All (or the vast majority of) statements are relevant and focused. The retrieval context is highly focused without significant distraction.
  - Score 0.0 (Noisy/Irrelevant): The context contains significant irrelevant statements (noise) or is completely unrelated to the input.
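The three steps above can be sketched in code to make the control flow concrete. This is not Galtea's implementation: the real metric delegates extraction and relevance judgments to an LLM, whereas this sketch substitutes a naive sentence splitter and a toy keyword-overlap judge so that it runs without any external service. The function names, the noise_tolerance threshold of 0.25 used to decide what counts as "significant" noise, and the choice to score an empty context as 0.0 are all assumptions made for illustration.

```python
import re
from typing import Callable, List


def extract_statements(retrieval_context: List[str]) -> List[str]:
    """Stand-in for the LLM extraction step: naive sentence splitting."""
    statements = []
    for chunk in retrieval_context:
        for sentence in re.split(r"(?<=[.!?])\s+", chunk.strip()):
            if sentence:
                statements.append(sentence)
    return statements


def keyword_judge(query: str, statement: str) -> bool:
    """Toy stand-in for the LLM relevance check: the query and the
    statement must share at least one word of four or more letters."""
    def words(text: str) -> set:
        return set(re.findall(r"[a-z]{4,}", text.lower()))
    return bool(words(query) & words(statement))


def contextual_relevancy_sketch(
    query: str,
    retrieval_context: List[str],
    judge: Callable[[str, str], bool] = keyword_judge,
    noise_tolerance: float = 0.25,  # assumed cutoff for "significant" noise
) -> float:
    """Return 1.0 if the context is mostly relevant, otherwise 0.0."""
    statements = extract_statements(retrieval_context)
    if not statements:
        return 0.0  # assumption: nothing retrieved counts as irrelevant
    irrelevant = sum(1 for s in statements if not judge(query, s))
    noise_ratio = irrelevant / len(statements)
    return 1.0 if noise_ratio <= noise_tolerance else 0.0


query = "What is the refund window for annual plans?"
focused = [
    "Annual plans are refundable within 30 days of purchase.",
    "Refund requests are handled by the billing team.",
]
noisy = [
    "The office cafeteria opens at nine.",
    "Parking permits are renewed in March.",
]
print(contextual_relevancy_sketch(query, focused))  # 1.0
print(contextual_relevancy_sketch(query, noisy))    # 0.0
```

Passing the judge as a callable keeps the scoring logic separate from the relevance decision, so the toy keyword check could be swapped for a call to an LLM without changing the rest of the sketch.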