Evaluation Parameters
To compute the `contextual_relevancy` metric, the following parameters are required:
- `input`: The user's query or instruction.
- `actual_output`: The response generated by your LLM application. (While not directly scored in this metric, it's often part of the test case data.)
- `retrieval_context`: A list of documents or text chunks retrieved by your RAG system in response to the `input`.
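As a rough illustration, these parameters map onto a simple test-case container. The `RAGTestCase` class below is hypothetical; evaluation frameworks typically ship their own test-case type with these same field names:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RAGTestCase:
    """Hypothetical container holding the three evaluation parameters."""
    input: str                      # the user's query or instruction
    actual_output: str              # the LLM application's response
    retrieval_context: List[str]    # chunks retrieved by the RAG system

case = RAGTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
    retrieval_context=[
        "Paris is the capital and most populous city of France.",
        "France is famous for its cheese and wine.",  # noise relative to the input
    ],
)
```

Only `input` and `retrieval_context` feed into this particular metric; `actual_output` is carried along for other metrics evaluated on the same test case.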
How Is It Calculated?
The `contextual_relevancy` score is determined using an LLM-as-a-judge approach that measures the signal-to-noise ratio:
- Statement Extraction: The LLM analyzes the `retrieval_context` and identifies the high-level statements or facts it presents.
- Relevance Check: For each statement, the LLM determines whether it is relevant (provides useful information) or irrelevant (noise) to addressing the `input`.
- Score Calculation: The LLM evaluates whether the context contains significant irrelevant statements or is completely unrelated to the `input`.
  - Score 1.0 (Relevant): All (or the vast majority of) statements are relevant; the retrieval context is highly focused, with no significant distraction.
  - Score 0.0 (Noisy/Irrelevant): The context contains significant irrelevant statements (noise) or is completely unrelated to the `input`.
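The steps above reduce to a signal-to-noise ratio: the fraction of extracted statements that the judge deems relevant. A minimal sketch, with a toy keyword check standing in for the LLM judge (`contextual_relevancy_score` and the judge are illustrative, not a real API):

```python
from typing import Callable, List

def contextual_relevancy_score(
    statements: List[str],
    is_relevant: Callable[[str], bool],
) -> float:
    """Fraction of context statements judged relevant to the input (signal-to-noise)."""
    if not statements:
        return 0.0
    relevant = sum(1 for s in statements if is_relevant(s))
    return relevant / len(statements)

# Toy judge: a keyword check stands in for the LLM's relevance verdict.
is_relevant = lambda s: "capital" in s.lower() or "france" in s.lower()

statements = [
    "Paris is the capital of France.",   # relevant
    "France borders Spain.",             # relevant (mentions France)
    "The Eiffel Tower opened in 1889.",  # noise
]
score = contextual_relevancy_score(statements, is_relevant)  # 2 of 3 relevant
```

In practice the extraction and relevance verdicts both come from LLM calls, so the score is non-deterministic; a threshold (e.g. 0.5) is typically applied to decide pass/fail.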