Evaluations created via create() or run() start in PENDING status while the evaluation engine processes them. Use wait_for() to block until all of them have completed.
An evaluation is considered complete when its status is anything other than PENDING: SUCCESS, FAILED, SKIPPED, or PENDING_HUMAN.
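For instance, once wait_for() returns you can branch on each evaluation's final status. A minimal sketch, assuming the statuses above are exposed as plain strings on the returned objects:

```python
# Sketch: branch on each evaluation's final status after waiting.
# Assumes statuses are plain strings; adjust if the SDK exposes an enum.
completed = galtea.evaluations.wait_for(evaluation_ids=evaluation_ids)

for evaluation in completed:
    if evaluation.status == "SUCCESS":
        print(f"{evaluation.id} scored {evaluation.score}")
    elif evaluation.status == "PENDING_HUMAN":
        print(f"{evaluation.id} is awaiting human review")
    else:  # FAILED or SKIPPED
        print(f"{evaluation.id} finished with status {evaluation.status}")
```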
Usage
By Evaluation IDs
Wait for specific evaluations you already have IDs for — typically from create() or from a run() with an agent. Returns evaluations in the same order as the input evaluation_ids.

Create and wait:

```python
# Create evaluations and wait for them to complete
evaluations = galtea.evaluations.create(
    session_id=session.id,
    metrics=[{"name": "Non-Toxic"}, {"name": "Unbiased"}],
)

# Wait for all evaluations to leave PENDING status
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
)

for evaluation in completed:
    print(f"{evaluation.id}: {evaluation.status} — score: {evaluation.score}")
```
Custom timeout and poll interval:

```python
# Wait with a custom timeout and poll interval
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
    timeout=600,       # wait up to 10 minutes
    poll_interval=10,  # check every 10 seconds
)
```
Full lifecycle — run() with agent, then wait_for():

```python
# Full lifecycle: run with agent, then wait for evaluations to finish processing
result = galtea.evaluations.run(
    version_id=version_id,
    agent=my_agent,
)

# run() with agent returns evaluations in PENDING status — wait for them to complete
evaluation_ids = [e.id for e in result["evaluations"]]
completed = galtea.evaluations.wait_for(evaluation_ids=evaluation_ids)

for evaluation in completed:
    print(f"Metric {evaluation.metric_id}: {evaluation.status} — {evaluation.score}")
```
By Job ID
Wait for an endpoint-connection job to complete, then automatically discover and collect all evaluations it produced. Use this when calling run() without an agent, since evaluation IDs are not available until the job finishes.

The method handles the full lifecycle:
- Polls the job status until it completes
- Discovers all sessions created by the job
- Waits for evaluations to leave PENDING status
- Paginates through all results

No specific result ordering is guaranteed.

```python
# Endpoint-connection mode: run() returns a jobId instead of evaluations
result = galtea.evaluations.run(version_id=version_id)
job_id = result["jobId"]

# Wait for the job to complete and all evaluations to finish
completed = galtea.evaluations.wait_for(job_id=job_id, timeout=600)

for evaluation in completed:
    print(f"Metric {evaluation.metric_id}: {evaluation.status} — {evaluation.score}")
```
Returns
A list of Evaluation objects once all have left PENDING status. When using evaluation_ids, results are in the same order as the input. When using job_id, no specific result ordering is guaranteed.
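Because input order is preserved when using evaluation_ids, you can pair results back up with the objects that produced the IDs. A short sketch, reusing the evaluations list from the create() example above:

```python
# Sketch: with evaluation_ids, results come back in the same order as the input.
completed = galtea.evaluations.wait_for(evaluation_ids=[e.id for e in evaluations])

for created, finished in zip(evaluations, completed):
    assert created.id == finished.id  # order matches the input IDs
    print(f"{finished.id}: {finished.status}")
```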
Parameters
| Parameter | Description |
|---|---|
| `evaluation_ids` | A list of evaluation IDs to wait for. Mutually exclusive with `job_id`. |
| `job_id` | The job ID returned by `run()` in endpoint-connection mode. Mutually exclusive with `evaluation_ids`. |
| `timeout` | Maximum seconds to wait before raising `TimeoutError`. When using `job_id`, this covers both the job polling and evaluation polling phases. |
| `poll_interval` | Seconds to sleep between polling cycles. |
Errors
| Error | Cause |
|---|---|
| `ValueError` | Neither `evaluation_ids` nor `job_id` provided, or both provided |
| `RuntimeError` | The job failed (only when using `job_id`) |
| `TimeoutError` | Timeout exceeded before all evaluations completed |
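A sketch of handling these cases when waiting on a job, assuming job_id came from an endpoint-connection run():

```python
# Sketch: handle the error cases listed above when waiting on a job.
try:
    completed = galtea.evaluations.wait_for(job_id=job_id, timeout=600)
except ValueError as exc:
    # Raised when neither or both of evaluation_ids / job_id are passed.
    print(f"Bad arguments: {exc}")
except RuntimeError as exc:
    # Raised when the endpoint-connection job itself failed.
    print(f"Job failed: {exc}")
except TimeoutError:
    print("Job or evaluations did not complete within the timeout")
```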