The Knowledge Retention metric is one of several non-deterministic metrics Galtea uses to evaluate your LLM-based chatbot’s ability to retain and consistently apply factual information shared earlier in a conversation. It analyzes the entire conversational history to determine whether the model recalls and reuses relevant facts when generating new responses. This is particularly useful in long, multi-turn dialogues, where context accumulates and memory plays a crucial role in the user experience.

Evaluation Parameters

To compute the knowledge_retention metric, the following parameters are required for every turn of the conversation:
  • input: The user message in the conversation.
  • actual_output: The LLM-generated response to the user message.
The metric evaluates the conversation as a whole, simulating a memory check across every turn rather than scoring any single turn in isolation.
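
For illustration, such a conversation can be represented as a list of turns, each carrying the two required fields. The snippet below is a minimal sketch: the field names match the parameters above, but the surrounding structure is a hypothetical example, not a prescribed Galtea payload.

```python
# Hypothetical example: each turn provides the two required parameters.
conversation_turns = [
    {
        "input": "Hi, I'm Ana. I need a table for 4 in Madrid, all vegetarian.",
        "actual_output": "Hi Ana! Happy to help. What date and time work for you?",
    },
    {
        "input": "This Friday at 8pm, please.",
        "actual_output": "Done: a vegetarian-friendly table for 4 in Madrid this Friday at 8pm.",
    },
]
```

A retention judge evaluating this conversation would note that the second response correctly reuses Ana's party size, city, and dietary constraint from the first turn.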

How Is It Calculated?

The knowledge_retention score is computed using an LLM-as-a-judge approach:
  1. Identify Knowledge Anchors: The LLM scans user inputs to identify specific facts, preferences, constraints, or context (e.g., names, locations, specific numbers).
  2. Verify Recall: The LLM checks if the agent recalled and applied this information in subsequent turns.
  3. Check Consistency: The LLM evaluates whether the agent contradicted previously established information, asked for information already provided, or ignored constraints set earlier.
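
As a rough illustration of these three steps, the sketch below assembles a judge prompt from the conversation history. The prompt wording and the helper function are assumptions for demonstration purposes, not Galtea's actual implementation.

```python
def build_judge_prompt(turns: list[dict]) -> str:
    """Assemble an LLM-as-a-judge prompt covering the three steps above.
    Hypothetical sketch; the real judge prompt may differ."""
    transcript = "\n".join(
        f"User: {t['input']}\nAssistant: {t['actual_output']}" for t in turns
    )
    return (
        "Evaluate knowledge retention in the conversation below.\n"
        "1. List facts, preferences, and constraints stated by the user.\n"
        "2. Check whether later assistant turns recall and apply them.\n"
        "3. Flag contradictions, redundant questions, or ignored constraints.\n"
        "Answer 'retained' or 'not_retained' with a brief justification.\n\n"
        f"Conversation:\n{transcript}"
    )
```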
The metric assigns a binary score:
  • Score 1.0 (Good Retention): The agent correctly recalled relevant information or no specific memory recall was required (and no errors were made).
  • Score 0.0 (Poor Retention): The agent forgot information, contradicted itself, or asked redundant questions about known facts.
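
In code, this binary rule reduces to a simple mapping from the judge's verdict; the verdict labels here are illustrative placeholders.

```python
def knowledge_retention_score(verdict: str) -> float:
    # 1.0 when the judge finds no retention failure (including the case
    # where no recall was required); 0.0 for any forgotten fact,
    # contradiction, or redundant question.
    return 1.0 if verdict == "retained" else 0.0

assert knowledge_retention_score("retained") == 1.0
assert knowledge_retention_score("not_retained") == 0.0
```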