expected_output
when available. This ensures that evaluation is not based solely on conversational flow, but on actual task completion.
This metric is particularly useful for use cases where accuracy and goal fulfillment matter more than tone or fluency, such as customer support resolutions, fact-based Q&A, or task execution scenarios.
Evaluation Parameters
To compute theuser_objective_accomplished
metric, the following parameters are required:
goal
: The stated objective or intent of the user.conversation_turns
: The complete history of user inputs and chatbot responses.
expected_output
: A ground-truth answer that can be used to verify correctness. If not provided, evaluation is based solely on whether the conversation indicates the user’s objective was achieved.
How Is It Calculated?
Theuser_objective_accomplished
score is derived using an LLM-as-a-judge approach with strict correctness criteria:
- Goal Identification: Determine the user’s stated objective.
- Agent Response Evaluation: Analyze how the agent attempted to fulfill the goal.
- Correctness Check: If an
expected_output
is provided, confirm that the agent’s response aligns exactly with it. If not, rely on the conversational outcome to judge whether the user’s objective was achieved.
- 1 (Accomplished): The agent successfully and correctly fulfilled the user’s objective.
- 0 (Not Accomplished): The agent failed to fulfill the user’s objective.