Conversation Relevancy
Checks if your product consistently responds in a contextually relevant way during a multi-turn conversation.
The Conversation Relevancy metric is one of several non-deterministic Metric Types Galtea uses to evaluate your LLM-based chatbot’s ability to generate contextually appropriate and relevant responses over the course of a multi-turn dialogue. It assesses whether each response aligns with the user’s intent, prior inputs, and the evolving context of the conversation.
This metric is particularly useful for ensuring coherent, on-topic conversations that maintain engagement and avoid misunderstandings or irrelevant diversions.
Evaluation Parameters
To compute the `conversation_relevancy` metric, the following parameters are required:
- `input`: The most recent user message in the conversation.
- `actual_output`: The chatbot’s corresponding response.
- `conversational_turns`: The complete history of the conversation up to the current exchange.
These inputs enable the metric to evaluate the relevance of the response in context, not just in isolation.
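As a rough illustration, the snippet below sketches the shape these parameters might take when assembled for an evaluation. The dictionary layout and turn format are assumptions for illustration only, not the exact Galtea SDK interface.

```python
# Illustrative sketch of the three required parameters; the turn format and
# variable names are assumptions, not the exact Galtea SDK interface.
evaluation_payload = {
    # Full conversation history up to the current exchange
    "conversational_turns": [
        {"role": "user", "content": "I need a laptop for video editing."},
        {"role": "assistant", "content": "Sure! Do you have a budget in mind?"},
    ],
    # Most recent user message
    "input": "Around $1,500.",
    # The chatbot's corresponding response, evaluated for relevancy in context
    "actual_output": (
        "At $1,500 you can get a laptop with a dedicated GPU, "
        "which is well suited for video editing."
    ),
}
```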
How Is It Calculated?
The `conversation_relevancy` score is derived using an LLM-as-a-judge approach:
- **Contextual Analysis**: An LLM analyzes the full `conversational_turns`, including the current `input` and `actual_output`.
- **Relevancy Judgment**: The LLM determines whether the `actual_output` directly addresses the user’s intent and fits naturally within the flow of the dialogue.
The score is based on the ratio of contextually relevant responses to the total number of evaluated responses:
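$$
\text{Conversation Relevancy} = \frac{\text{Number of Relevant Responses}}{\text{Total Number of Evaluated Responses}}
$$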
Scores range from 0 (entirely off-topic) to 1 (fully relevant throughout), helping you monitor and improve conversational quality at scale.
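For concreteness, here is a minimal sketch of how such a score could be computed under an LLM-as-a-judge approach. The judge prompt and the `judge` callable are hypothetical placeholders, not Galtea’s internal implementation.

```python
from typing import Callable, Dict, List

def conversation_relevancy_score(
    exchanges: List[Dict],
    judge: Callable[[str], str],
) -> float:
    """Sketch of an LLM-as-a-judge scoring loop.

    `exchanges` holds one dict per evaluated response, with the keys described
    above (`conversational_turns`, `input`, `actual_output`). `judge` is any
    chat-completion callable that returns the judge model's text answer.
    """
    relevant = 0
    for exchange in exchanges:
        prompt = (
            "Given the conversation history and the latest user message, "
            "answer 'yes' or 'no': does the assistant's response directly "
            "address the user's intent and fit the flow of the dialogue?\n\n"
            f"History: {exchange['conversational_turns']}\n"
            f"User: {exchange['input']}\n"
            f"Assistant: {exchange['actual_output']}\n"
        )
        if judge(prompt).strip().lower().startswith("yes"):
            relevant += 1
    # Ratio of contextually relevant responses to total evaluated responses
    return relevant / len(exchanges) if exchanges else 0.0
```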