Evaluations created via create() or run() start in PENDING status while the evaluation engine processes them. Use wait_for() to block until all of them have completed. An evaluation is considered complete when its status is anything other than PENDING: SUCCESS, FAILED, SKIPPED, or PENDING_HUMAN.

Usage

Wait for specific evaluations you already have IDs for, typically from create() or from a run() with an agent. Returns evaluations in the same order as the input evaluation_ids.

Create and wait:
# Create evaluations and wait for them to complete
evaluations = galtea.evaluations.create(
    session_id=session.id,
    metrics=[{"name": "Non-Toxic"}, {"name": "Unbiased"}],
)

# Wait for all evaluations to leave PENDING status
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
)

for evaluation in completed:
    print(f"{evaluation.id}: {evaluation.status} — score: {evaluation.score}")
Custom timeout and poll interval:
# Wait with a custom timeout and poll interval
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
    timeout=600,       # wait up to 10 minutes
    poll_interval=10,  # check every 10 seconds
)
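Internally, wait_for() follows the usual polling pattern: check the statuses, return once none is PENDING, sleep poll_interval between checks, and raise TimeoutError once the deadline passes. A minimal self-contained sketch of that same pattern (the wait_until_done helper and the fake_fetch status source are illustrative, not part of the SDK):

```python
import time

def wait_until_done(fetch_statuses, timeout=300, poll_interval=5):
    """Poll fetch_statuses() until no status is PENDING, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        statuses = fetch_statuses()
        if all(s != "PENDING" for s in statuses):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError("evaluations did not complete in time")
        time.sleep(poll_interval)

# Simulated backend: the second evaluation finishes on the third poll
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return ["SUCCESS", "SUCCESS"] if calls["n"] >= 3 else ["SUCCESS", "PENDING"]

print(wait_until_done(fake_fetch, timeout=5, poll_interval=0.1))
# → ['SUCCESS', 'SUCCESS']
```

Note that the deadline is measured with a monotonic clock, so a larger poll_interval reduces API traffic but can delay the return by up to one interval after the last evaluation completes.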
Full lifecycle — run() with agent, then wait_for():
# Full lifecycle: run with agent, then wait for evaluations to finish processing
result = galtea.evaluations.run(
    version_id=version_id,
    agent=my_agent,
)

# run() with agent returns evaluations in PENDING status — wait for them to complete
evaluation_ids = [e.id for e in result["evaluations"]]
completed = galtea.evaluations.wait_for(evaluation_ids=evaluation_ids)

for evaluation in completed:
    print(f"Metric {evaluation.metric_id}: {evaluation.status} — score: {evaluation.score}")

Returns

A list of Evaluation objects once all have left PENDING status. When using evaluation_ids, results are in the same order as the input. When using job_id, no specific result ordering is guaranteed.
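Since a completed evaluation may end in SUCCESS, FAILED, SKIPPED, or PENDING_HUMAN, it is often useful to tally results by status before reading scores. A minimal sketch using stand-in objects (real Evaluation objects come from wait_for(); the fields shown here mirror the examples above):

```python
from collections import Counter
from types import SimpleNamespace

# Stand-ins for Evaluation objects returned by wait_for()
completed = [
    SimpleNamespace(id="ev-1", status="SUCCESS", score=0.92),
    SimpleNamespace(id="ev-2", status="FAILED", score=None),
    SimpleNamespace(id="ev-3", status="SUCCESS", score=0.75),
]

by_status = Counter(e.status for e in completed)
print(by_status)  # Counter({'SUCCESS': 2, 'FAILED': 1})

failed = [e.id for e in completed if e.status == "FAILED"]
print(failed)  # ['ev-2']
```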

Parameters

evaluation_ids
list[str]
A list of evaluation IDs to wait for. Mutually exclusive with job_id.
job_id
str
The job ID returned by run() in endpoint-connection mode. Mutually exclusive with evaluation_ids.
timeout
int
default:"300"
Maximum seconds to wait before raising TimeoutError. When using job_id, this covers both the job polling and evaluation polling phases.
poll_interval
int
default:"5"
Seconds to sleep between polling cycles.

Errors

Error         Cause
ValueError    Neither evaluation_ids nor job_id provided, or both provided
RuntimeError  The job failed (only when using job_id)
TimeoutError  Timeout exceeded before all evaluations completed
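A TimeoutError does not mean the evaluations failed: they may still complete server-side after the client stops waiting, so callers typically catch it and retry or report. A sketch of that handling pattern, using a stub in place of galtea.evaluations.wait_for() so the snippet is self-contained:

```python
def wait_for_stub(evaluation_ids, timeout=300):
    """Stand-in for galtea.evaluations.wait_for(); always times out here."""
    raise TimeoutError(f"{len(evaluation_ids)} evaluations still PENDING after {timeout}s")

outcome = "ok"
try:
    completed = wait_for_stub(["ev-1", "ev-2"], timeout=60)
except ValueError:
    # Neither or both of evaluation_ids / job_id were provided
    outcome = "bad-arguments"
except TimeoutError as exc:
    # Evaluations may still finish server-side; retry later or surface the error
    outcome = "timed-out"
    print(exc)  # 2 evaluations still PENDING after 60s
```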