

After creating evaluations (via create() or run()), they start in PENDING status while the evaluation engine processes them. Use wait_for() to block until all evaluations have completed. An evaluation is considered complete when its status is anything other than PENDING: SUCCESS, FAILED, SKIPPED, or PENDING_HUMAN.
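Conceptually, wait_for() is a polling loop over evaluation statuses. The sketch below shows the same pattern in self-contained form; `wait_until_complete` and `fetch_statuses` are hypothetical names for illustration, not part of the SDK, which polls the Galtea API internally:

```python
import time

# Statuses that count as "complete" (anything other than PENDING)
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "SKIPPED", "PENDING_HUMAN"}

def wait_until_complete(fetch_statuses, timeout=300, poll_interval=5):
    """Block until every status has left PENDING, or raise TimeoutError.

    fetch_statuses is a zero-argument callable returning the current list
    of status strings (a stand-in for polling the API).
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = fetch_statuses()
        if all(s in TERMINAL_STATUSES for s in statuses):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError("evaluations still PENDING after timeout")
        time.sleep(poll_interval)
```

The real method additionally fetches the full Evaluation objects and preserves input order; the sketch only illustrates the poll-sleep-check loop.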

Usage

Wait for specific evaluations you already have IDs for, typically from create() or from a run() with an agent. Results are returned in the same order as the input evaluation_ids.

Create and wait:
# Create evaluations and wait for them to complete
evaluations = galtea.evaluations.create(
    session_id=session.id,
    metrics=[{"name": "Non-Toxic"}, {"name": "Unbiased"}],
)

# Wait for all evaluations to leave PENDING status
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
)

for evaluation in completed:
    print(f"{evaluation.id}: {evaluation.status} — score: {evaluation.score}")
Custom timeout and poll interval:
# Wait with a custom timeout and poll interval
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
    timeout=600,       # wait up to 10 minutes
    poll_interval=10,  # check every 10 seconds
)
Full lifecycle — run() with agent, then wait_for():
# Full lifecycle: run with agent, then wait for evaluations to finish processing
result = galtea.evaluations.run(
    version_id=version_id,
    agent=my_agent,
)

# run() with agent returns evaluations in PENDING status — wait for them to complete
evaluation_ids = [e.id for e in result["evaluations"]]
completed = galtea.evaluations.wait_for(evaluation_ids=evaluation_ids)

for evaluation in completed:
    print(f"Metric {evaluation.metric_id}: {evaluation.status} — score: {evaluation.score}")

Returns

A list of Evaluation objects once all have left PENDING status. When using evaluation_ids, results are in the same order as the input. When using job_id, no specific result ordering is guaranteed.

Parameters

evaluation_ids (list[str])
A list of evaluation IDs to wait for. Mutually exclusive with job_id.

job_id (str)
The job ID returned by run() in endpoint-connection mode. Mutually exclusive with evaluation_ids.

timeout (int, default: 300)
Maximum seconds to wait before raising TimeoutError. When using job_id, this covers both the job polling and evaluation polling phases.

poll_interval (int, default: 5)
Seconds to sleep between polling cycles.
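The defaults bound how many status checks can happen before the timeout fires. A quick back-of-the-envelope check (ignoring time spent in the API calls themselves):

```python
# Rough upper bound on polling cycles under the default settings.
timeout = 300       # default timeout, in seconds
poll_interval = 5   # default poll interval, in seconds
max_polls = timeout // poll_interval  # about 60 status checks
```

For long-running jobs, raising timeout (and possibly poll_interval, to reduce API traffic) is usually preferable to retrying after a TimeoutError.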

Errors

Error         Cause
ValueError    Neither evaluation_ids nor job_id provided, or both provided
RuntimeError  The job failed (only when using job_id)
TimeoutError  Timeout exceeded before all evaluations completed
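Since TimeoutError does not mean the evaluations failed, only that they had not finished within the window, one common pattern is to retry the wait a few times before giving up. A minimal generic sketch (`wait_with_retries` is a hypothetical helper, not part of the SDK; you would pass it something like `lambda: galtea.evaluations.wait_for(evaluation_ids=ids)`):

```python
def wait_with_retries(wait_fn, retries=2):
    """Call wait_fn, retrying on TimeoutError up to `retries` extra times.

    wait_fn is any zero-argument callable that may raise TimeoutError,
    e.g. a wrapped wait_for() call. Re-raises on the final attempt.
    """
    for attempt in range(retries + 1):
        try:
            return wait_fn()
        except TimeoutError:
            if attempt == retries:
                raise
```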