Evaluation Parameters
To compute the user_objective_accomplished metric, the following parameters are required:
- input: The user messages sent to the chatbot.
- actual_output: The chatbot’s corresponding responses.
- goal: The stated objective or intent of the user.
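As a rough illustration, the three parameters can be bundled into a single record before evaluation. The class name `ObjectiveTestCase`, the list-of-strings shape for the messages, and the one-response-per-message check are assumptions made for this sketch; they are not taken from any documented API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectiveTestCase:
    """Hypothetical container for the three required parameters."""

    input: List[str]          # user messages, in conversation order
    actual_output: List[str]  # chatbot responses, one per user message (assumed)
    goal: str                 # the user's stated objective

    def validate(self) -> None:
        """Raise ValueError if a required parameter is missing or misaligned."""
        if not self.input:
            raise ValueError("input must contain at least one user message")
        if len(self.input) != len(self.actual_output):
            raise ValueError("each user message needs a corresponding response")
        if not self.goal.strip():
            raise ValueError("goal must be a non-empty string")


case = ObjectiveTestCase(
    input=["What time does the pharmacy close today?"],
    actual_output=["The pharmacy closes at 9 pm today."],
    goal="Find out today's pharmacy closing time",
)
case.validate()  # passes silently when all three parameters are present
```

Validating up front keeps a missing goal or an unanswered message from reaching the judge, where it would be harder to tell apart from a genuine failure.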
How Is It Calculated?
The user_objective_accomplished score is derived using an LLM-as-a-judge approach with strict correctness criteria and a chain-of-thought style evaluation:
- Goal Identification: Determine the user’s stated objective.
- Agent Response Evaluation: Analyze how the agent attempted to fulfill the goal across the conversation and examine the final actual_output.
- Correctness Check: Judge whether the final actual_output correctly, completely, and directly fulfills the user’s stated goal. Identify any factual errors, omissions, or misunderstandings that prevent accomplishment.
- 1 (Accomplished): The agent successfully and correctly fulfilled the user’s objective.
- 0 (Not Accomplished): The agent failed to fulfill the user’s objective.
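The evaluation steps and the binary score can be sketched roughly as follows. This is a minimal illustration, not the metric's actual implementation: the function names, the prompt wording, the JSON reply format, and the `judge` callable (a stand-in for whatever LLM client is used) are all assumptions. Treating an unparseable reply as 0 is one possible reading of "strict", chosen here for the sketch.

```python
import json
from typing import Callable, List


def build_judge_prompt(messages: List[str], responses: List[str], goal: str) -> str:
    """Assemble a prompt that walks the judge through the three steps (wording assumed)."""
    turns = "\n".join(
        f"User: {m}\nAgent: {r}" for m, r in zip(messages, responses)
    )
    return (
        "You are a strict evaluator.\n"
        f"Stated goal: {goal}\n"
        f"Conversation:\n{turns}\n"
        "1. Identify the user's stated objective.\n"
        "2. Analyze how the agent tried to fulfill it and examine the final response.\n"
        "3. Check whether the final response correctly, completely, and directly "
        "fulfills the goal; note any errors, omissions, or misunderstandings.\n"
        'Reply with JSON only: {"reasoning": "<your steps>", "score": 0 or 1}'
    )


def parse_verdict(raw: str) -> int:
    """Map the judge's reply to 1 or 0; anything other than a clear 1 counts as 0."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return 0
    if not isinstance(data, dict):
        return 0
    return 1 if data.get("score") == 1 else 0


def user_objective_accomplished(
    messages: List[str],
    responses: List[str],
    goal: str,
    judge: Callable[[str], str],  # hypothetical: takes a prompt, returns the raw reply
) -> int:
    """Run the judge once and return the binary score."""
    return parse_verdict(judge(build_judge_prompt(messages, responses, goal)))


def fake_judge(prompt: str) -> str:
    """Canned reply standing in for a real model call, so the sketch runs offline."""
    return '{"reasoning": "The final response states the closing time.", "score": 1}'


score = user_objective_accomplished(
    ["What time does the pharmacy close today?"],
    ["The pharmacy closes at 9 pm today."],
    "Find out today's pharmacy closing time",
    fake_judge,
)
```

Passing the judge in as a callable keeps the scoring logic testable without network access; a real model call would replace `fake_judge`.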