Evaluation Parameters
To compute the `knowledge_retention` metric, the following parameters are required in every turn of the conversation:
- `input`: The user message in the conversation.
- `actual_output`: The LLM-generated response to the user message.
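As a rough illustration of this per-turn shape (not tied to any particular library), the conversation can be represented as an ordered list of turns; the `Turn` dataclass and field names below are hypothetical and simply mirror the required parameters:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    # Hypothetical container for one conversational turn; field names mirror
    # the required parameters described above.
    input: str          # the user message for this turn
    actual_output: str  # the LLM-generated response to that message

# A conversation is simply an ordered list of turns.
conversation: List[Turn] = [
    Turn(
        input="Hi, I'm Alex. I need a flight from Boston to Denver on May 3rd.",
        actual_output="Sure, Alex! Searching for flights from Boston to Denver on May 3rd.",
    ),
    Turn(
        input="What time do the morning options depart?",
        actual_output="The morning flights from Boston to Denver on May 3rd depart at 6:15 AM and 9:40 AM.",
    ),
]
```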
How Is It Calculated?
The `knowledge_retention` score is computed using an LLM-as-a-judge approach:
- Identify Knowledge Anchors: The LLM scans user inputs to identify specific facts, preferences, constraints, or context (e.g., names, locations, specific numbers).
- Verify Recall: The LLM checks if the agent recalled and applied this information in subsequent turns.
- Check Consistency: The LLM evaluates whether the agent contradicted previously established information, asked for information already provided, or ignored constraints set earlier.
- Score 1.0 (Good Retention): The agent correctly recalled relevant information or no specific memory recall was required (and no errors were made).
- Score 0.0 (Poor Retention): The agent forgot information, contradicted itself, or asked redundant questions about known facts.
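A minimal sketch of this judging loop is shown below, reusing the hypothetical `Turn` dataclass from the earlier example. `call_judge_llm` is a placeholder for whatever judge-LLM client you use, and the prompts are illustrative rather than the metric's actual prompts:

```python
import json
from typing import List

def call_judge_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your judge LLM and return its raw text reply."""
    raise NotImplementedError

def extract_knowledge_anchors(user_input: str) -> List[str]:
    # Step 1: ask the judge LLM to list facts, preferences, or constraints
    # stated in a single user message (names, locations, dates, numbers, ...).
    prompt = (
        "List the concrete facts, preferences, or constraints stated in this "
        f"user message as a JSON array of strings:\n\n{user_input}"
    )
    return json.loads(call_judge_llm(prompt))

def knowledge_retention_score(conversation: List[Turn]) -> float:
    # Accumulate anchors turn by turn and verify each later response against them.
    known_facts: List[str] = []
    for turn in conversation:
        if known_facts:
            # Steps 2-3: check recall and consistency against what is already known.
            prompt = (
                "Known facts from earlier in the conversation:\n"
                + "\n".join(f"- {fact}" for fact in known_facts)
                + "\n\nDoes the following assistant reply forget, contradict, or "
                "redundantly re-ask about any of these facts? Answer YES or NO.\n\n"
                f"{turn.actual_output}"
            )
            if call_judge_llm(prompt).strip().upper().startswith("YES"):
                return 0.0  # poor retention: forgotten, contradicted, or re-asked
        known_facts.extend(extract_knowledge_anchors(turn.input))
    return 1.0  # good retention (or no specific recall was required)
```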