Evaluating conversations
Learn how to evaluate conversations with multiple turns using the Galtea SDK
To accurately evaluate interactions within a dialogue, certain metrics need access to the preceding turns of the conversation. This is handled by the conversation_turns
parameter.
Currently, this parameter can only be used with the following metrics, which are available by default:
- Role Adherence: Measures how well the actual output adheres to a specified role.
- Knowledge Retention: Assesses the model’s ability to retain and use information from previous turns in the conversation.
- Conversational Completeness: Evaluates whether the conversation has reached a natural and informative conclusion.
- Conversation Relevancy: Assesses whether each turn in the conversation is relevant to the ongoing topic and user needs.
The conversation_turns
parameter is available both in the create evaluation task and create evaluation task from production methods.
Expected format
The conversation_turns
parameter expects a list of dictionaries. Each dictionary represents a single, complete exchange (one user query and the assistant’s response to it) that occurred before the current interaction being sent to our platform.
The required structure for each dictionary in the list is:
This is how it would look like in practice:
Converting Your Application’s History Format
Let’s say you have a conversation history like this.
Then, this would be a valid method to extract the conversation turns from the conversation history: