What is Human Evaluation?
Human Evaluation lets your team manually review and score AI outputs. Instead of being scored by an LLM judge, evaluations enter a PENDING_HUMAN status and wait until a human annotator submits a score through the dashboard.
It is especially useful when you need subjective judgment, domain expertise, or a human-in-the-loop quality gate that automated scoring cannot provide.
How It Works
- Create a user group: organize the team members who will review outputs
- Create a Human Evaluation metric: route evaluations to your group instead of an LLM judge
- Run evaluations: evaluations enter PENDING_HUMAN status
- Annotate in the dashboard: group members review outputs and submit scores
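The last two steps form a small state transition: an evaluation sits in PENDING_HUMAN until an annotator submits a score, which the dashboard takes on a 0–100 scale and normalizes to 0–1 (Step 5). The Python sketch below models only that transition in memory. It is not the Galtea SDK: the `Evaluation` class, the `submit_human_score` helper, and the `SCORED` status name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_HUMAN = "PENDING_HUMAN"  # status named in the docs
    SCORED = "SCORED"                # hypothetical name for "annotation submitted"


@dataclass
class Evaluation:
    evaluation_id: str
    metric: str
    status: Status = Status.PENDING_HUMAN
    score: Optional[float] = None  # normalized 0-1 score, set by the annotator
    reason: Optional[str] = None   # optional free-text justification


def submit_human_score(ev: Evaluation, raw_score: float, reason: Optional[str] = None) -> Evaluation:
    """Accept a 0-100 score from an annotator, normalize it to 0-1, and close the evaluation."""
    if ev.status is not Status.PENDING_HUMAN:
        raise ValueError(f"{ev.evaluation_id} is not waiting for a human score")
    if not 0 <= raw_score <= 100:
        raise ValueError("raw_score must be between 0 and 100")
    ev.score = raw_score / 100
    ev.reason = reason
    ev.status = Status.SCORED
    return ev


ev = Evaluation("eval-1", "tone-review")
submit_human_score(ev, 85, reason="Polite, but skips one requested detail")
print(ev.status.value, ev.score)  # SCORED 0.85
```

Rejecting a second submission is a design choice of this sketch; the page does not say whether the platform allows re-scoring.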
Step 1: Create a User Group
User Groups organize evaluators and control who can score evaluations for specific metrics.
Step 2: Add Users to the Group
Link users to the group by their user IDs. Group members will see pending evaluations for the group's linked metrics on the Human Evaluations page of the dashboard.
Step 3: Create a Human Evaluation Metric
Create a metric with the Human Evaluation type and link it to your user group. When evaluations run against this metric, they enter PENDING_HUMAN status and appear in the linked group members’ dashboards.
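Routing works through the group: a metric is linked to one or more groups, and members of those groups see that metric's pending evaluations. A minimal in-memory sketch of that rule, assuming hypothetical `UserGroup` and `HumanMetric` classes and plain dicts for pending evaluations (none of these are Galtea SDK types):

```python
from dataclasses import dataclass, field


@dataclass
class UserGroup:
    name: str
    user_ids: set[str] = field(default_factory=set)  # members, identified by user ID


@dataclass
class HumanMetric:
    name: str
    group_names: set[str] = field(default_factory=set)  # groups linked to this metric


def pending_for_user(user_id: str, groups: list[UserGroup],
                     metrics: list[HumanMetric], pending: list[dict]) -> list[dict]:
    """Return the pending evaluations whose metric is linked to a group the user belongs to."""
    member_of = {g.name for g in groups if user_id in g.user_ids}
    visible = {m.name for m in metrics if m.group_names & member_of}
    return [ev for ev in pending if ev["metric"] in visible]


groups = [
    UserGroup("legal-reviewers", {"u-alice", "u-bob"}),
    UserGroup("support-leads", {"u-carol"}),
]
metrics = [
    HumanMetric("legal-accuracy", {"legal-reviewers"}),
    HumanMetric("tone", {"legal-reviewers", "support-leads"}),
]
pending = [
    {"id": "e1", "metric": "legal-accuracy"},
    {"id": "e2", "metric": "tone"},
    {"id": "e3", "metric": "faithfulness"},  # metric not linked to any group here, so nobody sees it
]
print([ev["id"] for ev in pending_for_user("u-carol", groups, metrics, pending)])  # ['e2']
```

Linking a metric to several groups widens its audience: here both groups see `tone`, while only `legal-reviewers` members see `legal-accuracy`.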
Option A: From the Dashboard
Go to the Metrics creation form and configure:
- Evaluation Type: select Human Evaluation
- User Groups: assign one or more user groups
- Evaluation Parameters: choose which parameters (Input, Expected Output, Actual Output, etc.) annotators will see
- Evaluation Guidelines: write the criteria annotators should follow when scoring
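The four dashboard fields amount to a small configuration record. The sketch below expresses them as a hypothetical Python dataclass with a checker; the field names and the rule that guidelines must be non-empty are assumptions for illustration, not the platform's actual validation.

```python
from dataclasses import dataclass, field


@dataclass
class HumanMetricConfig:
    # Hypothetical mirror of the dashboard form fields; not a Galtea SDK type.
    name: str
    evaluation_type: str
    user_groups: list[str] = field(default_factory=list)
    evaluation_parameters: list[str] = field(default_factory=list)
    evaluation_guidelines: str = ""


def config_problems(cfg: HumanMetricConfig) -> list[str]:
    """List what is missing before the metric can route work to annotators."""
    problems = []
    if cfg.evaluation_type != "Human Evaluation":
        problems.append("evaluation type must be Human Evaluation")
    if not cfg.user_groups:
        problems.append("assign at least one user group")
    if not cfg.evaluation_parameters:
        problems.append("choose at least one parameter for annotators to see")
    if not cfg.evaluation_guidelines.strip():
        problems.append("write evaluation guidelines")  # assumed required in this sketch
    return problems


draft = HumanMetricConfig(
    name="legal-accuracy",
    evaluation_type="Human Evaluation",
    evaluation_parameters=["Input", "Actual Output"],
)
print(config_problems(draft))
# ['assign at least one user group', 'write evaluation guidelines']
```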
Option B: From the SDK
Step 4: Run Evaluations
Run evaluations using the SDK or from the dashboard, just as you would for any other metric. The only difference is that each evaluation enters PENDING_HUMAN status instead of being scored by an LLM.
See Run Test-Based Evaluations, Evaluating Conversations, or Direct Inferences and Evaluations from the Platform for step-by-step instructions.
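The runtime difference comes down to where the score comes from. The sketch below illustrates that branch with a stub judge: an AI Evaluation metric is scored immediately, while a Human Evaluation metric skips the judge and is parked in PENDING_HUMAN. The `start_evaluation` function, the `SCORED` status string, and the `judge` callable are hypothetical; other evaluation types (such as Self-Hosted) are deliberately left out.

```python
from typing import Callable, Optional


def start_evaluation(evaluation_type: str, actual_output: str,
                     judge: Callable[[str], float]) -> tuple[str, Optional[float]]:
    """Return (status, score) for a newly run evaluation."""
    if evaluation_type == "Human Evaluation":
        # No judge call: the evaluation waits for an annotator in the dashboard.
        return "PENDING_HUMAN", None
    if evaluation_type == "AI Evaluation":
        # An LLM judge scores the output right away (stubbed here).
        return "SCORED", judge(actual_output)
    raise ValueError(f"evaluation type not covered by this sketch: {evaluation_type}")


calls: list[str] = []


def stub_judge(text: str) -> float:
    calls.append(text)  # record that the judge was consulted
    return 0.9


print(start_evaluation("AI Evaluation", "Hello!", stub_judge))     # ('SCORED', 0.9)
print(start_evaluation("Human Evaluation", "Hello!", stub_judge))  # ('PENDING_HUMAN', None)
print(len(calls))  # 1
```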
Step 5: Annotate in the Dashboard
Navigate to the Human Evaluations page in the sidebar. Click Start Evaluating to open the annotation dialog. For each evaluation, review the conversation and context, submit a score from 0 to 100 (normalized to 0–1), and optionally add a reason.
Managing Groups
Update a Group
Remove Users or Metrics
Related
Evaluation Types
Understand AI Evaluation, Human Evaluation, and Self-Hosted scoring.
User Group Concept
Learn more about user groups and their properties.