Evaluation Parameters
To compute the role_adherence metric, every turn of the conversation must include the following inputs:
- input: The current user message.
- actual_output: The corresponding chatbot response.
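As a minimal sketch of the required per-turn shape, the Python below models a conversation as a list of turns and checks that each turn carries both fields before any scoring happens. `Turn` and `validate_turns` are hypothetical names chosen for illustration. They are not part of any evaluation library's API.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One conversation turn (hypothetical container, not a library class)."""
    input: str          # the current user message
    actual_output: str  # the chatbot's response to that message


def validate_turns(turns: list[Turn]) -> None:
    """Raise ValueError if the conversation is empty or a turn lacks a field."""
    if not turns:
        raise ValueError("conversation must contain at least one turn")
    for i, turn in enumerate(turns):
        if not turn.input.strip():
            raise ValueError(f"turn {i} is missing 'input'")
        if not turn.actual_output.strip():
            raise ValueError(f"turn {i} is missing 'actual_output'")


conversation = [
    Turn(input="Where is my order?", actual_output="Let me check that for you."),
    Turn(input="Thanks!", actual_output="You're welcome. Anything else?"),
]
validate_turns(conversation)  # passes silently: both turns are complete
```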
How Is It Calculated?
The role_adherence score is computed using an LLM-as-a-judge approach:
- Define the Persona: Based on the product_description, the LLM identifies the expected persona, tone, professional boundaries, and style.
- Audit the Conversation: The LLM reviews every agent response in the conversation history.
- Check for Deviations: The LLM evaluates whether the agent broke character, violated tone constraints, strayed from its designated responsibilities, or suddenly deviated from the inferred role.
- Assign a Score: The result is binary.
  - Score 1.0 (Adherent): The agent consistently maintained its role, tone, and persona across all turns.
  - Score 0.0 (Non-Adherent): The agent deviated from its role, broke character, or adopted an inconsistent tone in at least one turn.
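The scoring steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the metric's actual implementation: `role_adherence`, `Turn`, and the `judge` callable are hypothetical names, the prompt wording is an assumption, and `fake_judge` is a toy keyword check standing in for a real LLM call. The sketch makes one judge call to infer the persona from `product_description`, then one call per agent response, and returns 0.0 as soon as any response is judged out of role.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: takes a prompt, returns the model's text reply.
Judge = Callable[[str], str]


@dataclass
class Turn:
    input: str          # user message
    actual_output: str  # agent response


def role_adherence(product_description: str, turns: list[Turn], judge: Judge) -> float:
    # Step 1 (Define the Persona): ask the judge to infer the expected persona.
    persona = judge(
        "Describe the expected persona, tone, professional boundaries and style "
        f"for an assistant with this product description:\n{product_description}"
    )
    # Steps 2-3 (Audit, Check for Deviations): judge each reply with prior context.
    history = ""
    for turn in turns:
        history += f"User: {turn.input}\n"
        verdict = judge(
            f"Expected persona:\n{persona}\n\nConversation so far:\n{history}"
            f"Agent reply to audit:\n{turn.actual_output}\n\n"
            "Does this reply stay in role? Answer YES or NO."
        )
        if not verdict.strip().upper().startswith("YES"):
            return 0.0  # one deviation makes the whole conversation non-adherent
        history += f"Agent: {turn.actual_output}\n"
    return 1.0  # every reply stayed in role


def fake_judge(prompt: str) -> str:
    # Toy stand-in for an LLM (assumption): fixed persona, flags replies with "lol".
    if prompt.startswith("Describe the expected persona"):
        return "Polite, professional customer-support agent."
    reply = prompt.split("Agent reply to audit:\n", 1)[1].split("\n\n", 1)[0]
    return "NO" if "lol" in reply.lower() else "YES"


support_bot = "A customer-support assistant for an online bookstore."
good = [Turn("Where is my order?", "I'm happy to check that for you.")]
bad = good + [Turn("Any jokes?", "lol nah, ask google")]
print(role_adherence(support_bot, good, fake_judge))  # 1.0
print(role_adherence(support_bot, bad, fake_judge))   # 0.0
```

Auditing each reply together with the preceding history lets the judge notice an abrupt shift in tone relative to earlier turns, and stopping at the first non-adherent verdict mirrors the all-or-nothing rule of the two score outcomes.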