The User Objective Accomplished metric is one of several non-deterministic metrics that Galtea uses to evaluate whether a conversation fulfilled the user’s intended goal. Unlike satisfaction-based measures, it centers on objective correctness: whether the agent actually met the user’s stated objective. Because LLMs often produce responses that sound correct even when they are not, the metric emphasizes verifying factual, goal-aligned outcomes, optionally against an expected_output when one is available. Evaluation is therefore based on actual task completion rather than on conversational flow alone. The metric is particularly useful where accuracy and goal fulfillment matter more than tone or fluency, such as customer support resolutions, fact-based Q&A, or task execution scenarios.

Evaluation Parameters

To compute the user_objective_accomplished metric, the following parameters are required:
  • goal: The stated objective or intent of the user.
  • conversation_turns: The complete history of user inputs and chatbot responses.
Additionally, an optional parameter can be included:
  • expected_output: A ground-truth answer that can be used to verify correctness. If not provided, evaluation is based solely on whether the conversation indicates the user’s objective was achieved.
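
As a rough illustration of how these parameters fit together, the sketch below groups them in a small Python container. The class names (Turn, ObjectiveEvalInput) and field names on Turn are hypothetical and chosen for this example only; they are not Galtea's SDK or schema. Only goal, conversation_turns, and expected_output come from the parameter list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One exchange: a user input and the chatbot's response (hypothetical shape)."""
    user_input: str
    chatbot_response: str


@dataclass
class ObjectiveEvalInput:
    """Illustrative container for the metric's parameters, not Galtea's schema."""
    goal: str                               # required: the user's stated objective
    conversation_turns: List[Turn]          # required: full conversation history
    expected_output: Optional[str] = None   # optional: ground-truth answer

    def __post_init__(self) -> None:
        # Both required parameters must be present and non-empty.
        if not self.goal.strip():
            raise ValueError("goal is required")
        if not self.conversation_turns:
            raise ValueError("conversation_turns is required")


# Example input without a ground truth: the judgment would then rest on the
# conversation alone.
sample = ObjectiveEvalInput(
    goal="Find out the refund window for an online order",
    conversation_turns=[
        Turn("How long do I have to return my order?",
             "You can return it within 30 days of delivery."),
    ],
)
```

Leaving expected_output as None models the case where no ground truth exists; supplying a string would enable the stricter comparison.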

How Is It Calculated?

The user_objective_accomplished score is derived using an LLM-as-a-judge approach with strict correctness criteria:
  1. Goal Identification: Determine the user’s stated objective.
  2. Agent Response Evaluation: Analyze how the agent attempted to fulfill the goal.
  3. Correctness Check: If an expected_output is provided, confirm that the agent’s response aligns exactly with it. If not, rely on the conversational outcome to judge whether the user’s objective was achieved.
Based on this process, the LLM assigns a binary score:
  • 1 (Accomplished): The agent successfully and correctly fulfilled the user’s objective.
  • 0 (Not Accomplished): The agent failed to fulfill the user’s objective.
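
The process above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Galtea's implementation: the prompt wording, the function names (build_judge_prompt, parse_binary_score, score_conversation), and the judge callable are all hypothetical. The judge is passed in as a plain function so that any LLM client can be substituted.

```python
from typing import Callable, List, Optional, Tuple


def build_judge_prompt(goal: str,
                       turns: List[Tuple[str, str]],
                       expected_output: Optional[str] = None) -> str:
    """Assemble a prompt that walks the judge through the three steps."""
    transcript = "\n".join(
        f"User: {user}\nAgent: {agent}" for user, agent in turns
    )
    if expected_output is not None:
        # Ground truth available: ask for a strict comparison.
        check = ("Confirm that the agent's response aligns exactly with this "
                 f"expected output: {expected_output}")
    else:
        # No ground truth: fall back to the conversational outcome.
        check = ("No expected output is given; judge from the conversation "
                 "whether the user's objective was achieved.")
    return (
        f"User goal: {goal}\n\nConversation:\n{transcript}\n\n"
        "1. Identify the user's stated objective.\n"
        "2. Analyze how the agent attempted to fulfill it.\n"
        f"3. {check}\n"
        "Answer with a single digit: 1 (accomplished) or 0 (not accomplished)."
    )


def parse_binary_score(reply: str) -> int:
    """Accept only '0' or '1'; any other reply raises instead of being guessed."""
    text = reply.strip()
    if text not in ("0", "1"):
        raise ValueError(f"judge reply is not a binary score: {reply!r}")
    return int(text)


def score_conversation(judge: Callable[[str], str],
                       goal: str,
                       turns: List[Tuple[str, str]],
                       expected_output: Optional[str] = None) -> int:
    """Run the (hypothetical) judge on the prompt and return the binary score."""
    return parse_binary_score(judge(build_judge_prompt(goal, turns, expected_output)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call, used here for demonstration only."""
    return "1"


score = score_conversation(
    fake_judge,
    goal="Find out the refund window for an online order",
    turns=[("How long do I have to return my order?",
            "You can return it within 30 days of delivery.")],
    expected_output="30 days from delivery",
)
```

Rejecting any reply other than 0 or 1 keeps the score strictly binary; a judge that hedges or adds commentary surfaces as an error rather than being silently coerced.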
