What is Human Evaluation?

Human Evaluation lets your team manually review and score AI outputs. Instead of being scored by an LLM judge, evaluations enter PENDING_HUMAN status and wait for a human annotator to submit a score through the dashboard. This is especially useful when you need subjective judgment, domain expertise, or a human-in-the-loop quality gate that automated scoring cannot provide.

How It Works

  1. Create a user group: organize the team members who will review outputs
  2. Add users to the group: link the reviewers by their user IDs
  3. Create a Human Evaluation metric: route evaluations to your group instead of an LLM judge
  4. Run evaluations: evaluations enter PENDING_HUMAN status
  5. Annotate in the dashboard: group members review outputs and submit scores

Step 1: Create a User Group

User Groups organize evaluators and control who can score evaluations for specific metrics.
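# Assumes `galtea` is an initialized Galtea SDK client and `run_identifier`
# is a unique suffix of your choosing (neither is defined on this page).
user_group = galtea.user_groups.create(
    name="quality-reviewers-" + run_identifier,
    description="Team responsible for reviewing output quality",
)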
You can also create groups from the dashboard: navigate to your organization’s Groups tab.
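
The snippet above appends a `run_identifier` to the group name, but this page does not show where that value comes from. One plausible reading is that it is a short unique suffix that keeps names from colliding across runs. The sketch below shows one way to generate such a suffix with the standard library; `make_run_identifier` is a hypothetical helper, not part of the Galtea SDK.

```python
import uuid


def make_run_identifier() -> str:
    """Return a short random hex suffix (hypothetical helper, not a Galtea API)."""
    # uuid4().hex is 32 lowercase hex characters; the first 8 are enough
    # to keep names distinct in typical use.
    return uuid.uuid4().hex[:8]


run_identifier = make_run_identifier()
group_name = "quality-reviewers-" + run_identifier
print(group_name)
```

Any other unique string (a timestamp, a CI build number) would serve the same purpose under this assumption.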

Step 2: Add Users to the Group

Link users by their user IDs. Users in the group will see pending evaluations for linked metrics in their Human Evaluations dashboard page.
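# user_group_id is the ID of the group created in Step 1;
# user_id_1 is the ID of an existing organization member.
galtea.user_groups.link_users(
    user_group_id=user_group_id,
    user_ids=[user_id_1],
)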
You can find user IDs in the dashboard under Settings > Members, or by using the organization members API.

Step 3: Create a Human Evaluation Metric

Create a metric with the Human Evaluation type and link it to your user group. When evaluations run against this metric, they enter PENDING_HUMAN status and appear in the linked group members’ dashboards.

Option A: From the Dashboard

Go to the Metrics creation form and configure:
  • Evaluation Type — Select Human Evaluation
  • User Groups — Assign one or more user groups
  • Evaluation Parameters — Choose which parameters (Input, Expected Output, Actual Output, etc.) annotators will see
  • Evaluation Guidelines — Write the criteria annotators should follow when scoring

Option B: From the SDK

metric = galtea.metrics.create(
    name="domain-expert-review-" + run_identifier,
    source="human_evaluation",
    judge_prompt="Review the assistant's response for accuracy and helpfulness. Score 1 if the response is correct and useful, 0 if it contains errors or is unhelpful.",
    evaluation_params=["input", "actual_output", "expected_output"],
    user_group_ids=[user_group_id],
    description="Domain expert review of response quality",
)

# Link the metric to the user group
galtea.user_groups.link_metrics(
    user_group_id=user_group_id,
    metric_ids=[metric.id],
)
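
Steps 1 to 3 set up a routing relation: a user sees a pending evaluation when the user belongs to a group that is linked to the evaluation's metric. The sketch below models that rule in plain Python to make it concrete. It is an illustrative in-memory model, not the Galtea SDK or its data schema; the `Group` class, the dictionary fields, and the `SCORED` status are assumptions (only `PENDING_HUMAN` comes from this page).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Group:
    """Hypothetical stand-in for a user group: members plus linked metrics."""
    user_ids: Set[str] = field(default_factory=set)
    metric_ids: Set[str] = field(default_factory=set)


def pending_for_user(user_id: str, groups: List[Group],
                     evaluations: List[Dict[str, str]]) -> List[str]:
    """Return IDs of pending evaluations routed to this user's groups."""
    # Collect every metric linked to a group the user belongs to.
    visible_metrics: Set[str] = set()
    for group in groups:
        if user_id in group.user_ids:
            visible_metrics |= group.metric_ids
    # Keep only evaluations still waiting for a human score on those metrics.
    return [
        e["id"] for e in evaluations
        if e["status"] == "PENDING_HUMAN" and e["metric_id"] in visible_metrics
    ]


groups = [Group(user_ids={"user-1"}, metric_ids={"metric-a"})]
evaluations = [
    {"id": "eval-1", "metric_id": "metric-a", "status": "PENDING_HUMAN"},
    {"id": "eval-2", "metric_id": "metric-b", "status": "PENDING_HUMAN"},
    {"id": "eval-3", "metric_id": "metric-a", "status": "SCORED"},  # assumed status name
]
print(pending_for_user("user-1", groups, evaluations))  # ['eval-1']
```

In this model, a user outside every linked group gets an empty list, which mirrors why both the user link and the metric link are needed before anything appears for a reviewer.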

Step 4: Run Evaluations

Run evaluations using the SDK or from the dashboard, just as you would with any other metric. The only difference is that each evaluation enters PENDING_HUMAN status instead of being scored by an LLM. See Run Test-Based Evaluations, Evaluating Conversations, or Direct Inferences and Evaluations from the Platform for step-by-step instructions.

Step 5: Annotate in the Dashboard

Navigate to the Human Evaluations page in the sidebar. Click Start Evaluating to open the annotation dialog. For each evaluation, review the conversation and context, submit a score (0–100, normalized to 0–1), and optionally add a reason.
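
The annotation step mentions that a 0–100 score is normalized to 0–1. The page does not spell out the formula; the sketch below assumes the simplest reading, a linear division by 100, and `normalize_score` is a hypothetical name used only for illustration.

```python
def normalize_score(raw: float) -> float:
    """Map an annotator score on the 0-100 scale to the 0-1 range.

    Assumes a plain linear mapping (raw / 100); the platform's exact
    behavior is not documented on this page.
    """
    if not 0 <= raw <= 100:
        raise ValueError(f"score must be between 0 and 100, got {raw}")
    return raw / 100


print(normalize_score(85))  # 0.85
```

Under this assumption, a dashboard score of 100 corresponds to 1.0 and a score of 0 to 0.0, which lines up with the binary 1/0 wording used in the example judge prompt.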

Managing Groups

Update a Group

user_group = galtea.user_groups.update(
    user_group_id=user_group_id,
    name="senior-quality-reviewers-" + run_identifier,
    description="Senior team for quality reviews",
)

Remove Users or Metrics

# Remove one or more users from the group
galtea.user_groups.unlink_users(
    user_group_id=user_group_id,
    user_ids=[user_id_1],
)

# Remove one or more metrics from the group
galtea.user_groups.unlink_metrics(
    user_group_id=user_group_id,
    metric_ids=[metric_id_2],
)

Evaluation Types

Understand AI Evaluation, Human Evaluation, and Self-Hosted scoring.

User Group Concept

Learn more about user groups and their properties.