The User Objective Accomplished metric is one of several non-deterministic metrics Galtea uses to evaluate whether a conversation fulfilled the user’s intended goal. Unlike satisfaction-based measures, it centers on objective correctness: whether the agent actually met the user’s stated objective. It is particularly useful where accuracy and goal fulfillment matter more than tone or fluency, such as customer support resolutions, fact-based Q&A, or task execution scenarios.
Evaluation Parameters
To compute the user_objective_accomplished metric, the following parameters are required:
- input: The user messages sent to the chatbot.
- actual_output: The chatbot’s corresponding responses.
- goal: The stated objective or intent of the user.
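As an illustration of how these three parameters fit together, here is a minimal sketch in Python. The `ObjectiveEvalInput` class and its `transcript` helper are hypothetical names introduced for this example, not part of the Galtea SDK, and the sketch assumes `input` and `actual_output` are parallel lists with one entry per conversation turn.

```python
from dataclasses import dataclass


@dataclass
class ObjectiveEvalInput:
    """Hypothetical container for the three parameters (not a Galtea SDK type)."""

    input: list[str]          # user messages, one per turn (assumed shape)
    actual_output: list[str]  # chatbot responses, one per turn (assumed shape)
    goal: str                 # the user's stated objective

    def transcript(self) -> str:
        """Interleave user and chatbot turns into a readable transcript."""
        if len(self.input) != len(self.actual_output):
            raise ValueError("input and actual_output must have the same number of turns")
        lines = []
        for user_msg, bot_msg in zip(self.input, self.actual_output):
            lines.append(f"User: {user_msg}")
            lines.append(f"Assistant: {bot_msg}")
        return "\n".join(lines)


example = ObjectiveEvalInput(
    input=["I need to reset my password.", "It is jane@example.com."],
    actual_output=[
        "Sure, what is the email on the account?",
        "A reset link has been sent to jane@example.com.",
    ],
    goal="Reset the account password.",
)
print(example.transcript())
```

Keeping the goal separate from the conversation turns mirrors the parameter list above: the goal is the fixed reference that the responses are judged against.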
How Is It Calculated?
The user_objective_accomplished score is derived using an LLM-as-a-judge approach with strict correctness criteria and a chain-of-thought style evaluation:
- Goal Identification: Determine the user’s stated objective.
- Agent Response Evaluation: Analyze how the agent attempted to fulfill the goal across the conversation and examine the final actual_output.
- Correctness Check: Judge whether the final actual_output correctly, completely, and directly fulfills the user’s stated goal. Identify any factual errors, omissions, or misunderstandings that prevent accomplishment.
The judge then returns a binary score:
- 1 (Accomplished): The agent successfully and correctly fulfilled the user’s objective.
- 0 (Not Accomplished): The agent failed to fulfill the user’s objective.
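The evaluation steps and binary score above can be sketched as a small LLM-as-a-judge harness. This is a rough illustration under stated assumptions, not Galtea's actual implementation: the prompt wording, the `SCORE:` reply convention, and the names `build_judge_prompt`, `parse_score`, `evaluate`, and `fake_judge` are all hypothetical. The judge is passed in as a plain callable so that any LLM client could be plugged in; a stub stands in for it here.

```python
import re
from typing import Callable

# Hypothetical prompt; the real judge prompt used by Galtea is not shown in this page.
JUDGE_TEMPLATE = """You are a strict evaluator.
User goal: {goal}

Conversation:
{transcript}

Think step by step:
1. Identify the user's stated objective.
2. Analyze how the agent tried to fulfill it, focusing on the final response.
3. Check whether the final response correctly, completely, and directly fulfills the goal.
Finish with a line 'SCORE: 1' (accomplished) or 'SCORE: 0' (not accomplished)."""


def build_judge_prompt(transcript: str, goal: str) -> str:
    """Fill the judge prompt with the goal and the conversation transcript."""
    return JUDGE_TEMPLATE.format(goal=goal, transcript=transcript)


def parse_score(judge_reply: str) -> int:
    """Extract the binary verdict; the last 'SCORE:' line wins over any earlier mention."""
    matches = re.findall(r"SCORE:\s*([01])\b", judge_reply)
    if not matches:
        raise ValueError("judge reply did not contain 'SCORE: 0' or 'SCORE: 1'")
    return int(matches[-1])


def evaluate(transcript: str, goal: str, judge: Callable[[str], str]) -> int:
    """Run the judge on the prompt and return 1 (accomplished) or 0 (not accomplished)."""
    return parse_score(judge(build_judge_prompt(transcript, goal)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to make the sketch runnable."""
    return "The agent sent the reset link the user asked for.\nSCORE: 1"


score = evaluate(
    transcript="User: I need to reset my password.\nAssistant: A reset link has been sent.",
    goal="Reset the account password.",
    judge=fake_judge,
)
print(score)
```

Asking the judge to reason first and emit the verdict on a final, fixed-format line keeps the chain-of-thought visible while leaving the 0/1 score easy to parse.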
Suggested Test Case Types
The User Objective Accomplished metric is effective for evaluating Behavior test cases in Galtea, particularly:
- Goal-driven conversations where the user has a specific, verifiable objective.
- Task execution scenarios such as booking, purchasing, or information retrieval.
- Support interactions where resolution of the user’s issue is the primary success criterion.