Waiting for Evaluations

Evaluations created via create() or run() start in PENDING status while the evaluation engine processes them. Use wait_for() to block until all of them have completed. An evaluation counts as complete once its status is anything other than PENDING, that is SUCCESS, FAILED, SKIPPED, or PENDING_HUMAN.
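To make the completion rule concrete, the following minimal sketch checks and tallies statuses on a list of evaluations. It uses stand-in objects rather than real SDK results, and it assumes that `status` compares equal to plain strings such as "SUCCESS"; if the SDK exposes an enum instead, the comparison would need adapting. The helper names `is_complete` and `tally_statuses` are hypothetical and not part of the SDK.

```python
from collections import Counter
from types import SimpleNamespace


def is_complete(evaluation):
    """Return True once the evaluation has left PENDING status.

    Assumption: status is comparable to a plain string.
    """
    return evaluation.status != "PENDING"


def tally_statuses(evaluations):
    """Count evaluations per status, e.g. to spot FAILED ones after waiting."""
    return Counter(e.status for e in evaluations)


# Stand-in objects for illustration only; real ones would come from wait_for().
sample = [
    SimpleNamespace(id="ev-1", status="SUCCESS"),
    SimpleNamespace(id="ev-2", status="PENDING_HUMAN"),
    SimpleNamespace(id="ev-3", status="PENDING"),
]

completion_flags = [is_complete(e) for e in sample]
status_counts = tally_statuses(sample)
print(completion_flags)
print(dict(status_counts))
```

Note that PENDING_HUMAN counts as complete here, so code that needs a final score may still want to treat that status separately.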

Usage

By Evaluation IDs

Wait for specific evaluations you already have IDs for, typically from create() or from a run() with an agent. Returns evaluations in the same order as the input evaluation_ids.

Create and wait:

# Create evaluations and wait for them to complete
evaluations = galtea.evaluations.create(
    session_id=session.id,
    metrics=[{"name": "Non-Toxic"}, {"name": "Unbiased"}],
)

# Wait for all evaluations to leave PENDING status
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
)

for evaluation in completed:
    print(f"{evaluation.id}: {evaluation.status} — score: {evaluation.score}")

Custom timeout and poll interval:

# Wait with a custom timeout and poll interval
completed = galtea.evaluations.wait_for(
    evaluation_ids=[e.id for e in evaluations],
    timeout=600,       # wait up to 10 minutes
    poll_interval=10,  # check every 10 seconds
)

Full lifecycle — run() with agent, then wait_for():

# Full lifecycle: run with agent, then wait for evaluations to finish processing
result = galtea.evaluations.run(
    version_id=version_id,
    agent=my_agent,
)

# run() with agent returns evaluations in PENDING status — wait for them to complete
evaluation_ids = [e.id for e in result["evaluations"]]
completed = galtea.evaluations.wait_for(evaluation_ids=evaluation_ids)

for evaluation in completed:
    print(f"Metric {evaluation.metric_id}: {evaluation.status} — {evaluation.score}")

By Job ID

Wait for an endpoint-connection job to complete, then automatically discover and collect all evaluations it produced. Use this when calling run() without an agent, since evaluation IDs are not available until the job finishes.

The method handles the full lifecycle:

Polls the job status until it completes
Discovers all sessions created by the job
Waits for evaluations to leave PENDING status
Paginates through all results

No specific result ordering is guaranteed.

    # Endpoint-connection mode: run() returns a jobId instead of evaluations
    result = galtea.evaluations.run(version_id=version_id)
    job_id = result["jobId"]

    # Wait for the job to complete and all evaluations to finish
    completed = galtea.evaluations.wait_for(job_id=job_id, timeout=600)

    for evaluation in completed:
        print(f"Metric {evaluation.metric_id}: {evaluation.status} — {evaluation.score}")

Returns

A list of Evaluation objects once all have left PENDING status. When using evaluation_ids, results are in the same order as the input. When using job_id, no specific result ordering is guaranteed.
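Because results obtained through job_id carry no ordering guarantee, it can help to index them before use. The sketch below is one possible approach, not something the SDK provides: `index_by_id` and `group_by_metric` are hypothetical helpers, and the objects are stand-ins exposing only the `id` and `metric_id` attributes used in the examples above.

```python
from collections import defaultdict
from types import SimpleNamespace


def index_by_id(evaluations):
    """Map evaluation ID to evaluation so lookups do not depend on list order."""
    return {e.id: e for e in evaluations}


def group_by_metric(evaluations):
    """Group evaluations by metric_id, preserving arrival order within a group."""
    grouped = defaultdict(list)
    for evaluation in evaluations:
        grouped[evaluation.metric_id].append(evaluation)
    return dict(grouped)


# Stand-in results in arbitrary order, as a job_id wait might return them.
results = [
    SimpleNamespace(id="ev-2", metric_id="m-b", status="SUCCESS"),
    SimpleNamespace(id="ev-1", metric_id="m-a", status="FAILED"),
    SimpleNamespace(id="ev-3", metric_id="m-b", status="SUCCESS"),
]

by_id = index_by_id(results)
by_metric = group_by_metric(results)
print(by_id["ev-1"].status)
print({metric: [e.id for e in group] for metric, group in by_metric.items()})
```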

Parameters

evaluation_ids (list[str])

A list of evaluation IDs to wait for. Mutually exclusive with job_id.

job_id (str)

The job ID returned by run() in endpoint-connection mode. Mutually exclusive with evaluation_ids.

timeout (int, default: 300)

Maximum seconds to wait before raising TimeoutError. When using job_id, this covers both the job polling and evaluation polling phases.

poll_interval (int, default: 5)

Seconds to sleep between polling cycles.
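The interaction between timeout and poll_interval can be illustrated with a generic polling loop. This is a sketch of the general technique under stated assumptions, not the SDK's actual implementation: `poll_until`, `wait_two_phases`, and `FakeClock` are hypothetical names, and the two `check_*` callables stand in for whatever status requests the SDK makes. It shows one way a single timeout budget can cover both the job phase and the evaluation phase.

```python
import time


def poll_until(check, deadline, poll_interval,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a non-None value or the deadline passes."""
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError("deadline exceeded while polling")
        sleep(poll_interval)


def wait_two_phases(check_job, check_evaluations, timeout=300, poll_interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll the job, then the evaluations, sharing one timeout budget."""
    deadline = clock() + timeout  # computed once, so both phases share it
    poll_until(check_job, deadline, poll_interval, clock, sleep)
    return poll_until(check_evaluations, deadline, poll_interval, clock, sleep)


class FakeClock:
    """Deterministic clock so the demo runs instantly without real sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Demo: the job finishes on the third poll, the evaluations on the second.
job_answers = iter([None, None, "done"])
evaluation_answers = iter([None, ["ev-1"]])
demo_clock = FakeClock()
demo_result = wait_two_phases(
    lambda: next(job_answers),
    lambda: next(evaluation_answers),
    timeout=300,
    poll_interval=5,
    clock=demo_clock,
    sleep=demo_clock.sleep,
)
print(demo_result, demo_clock.now)
```

Using a monotonic clock for the deadline avoids surprises if the system clock is adjusted while waiting.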

Errors

Error	Cause
`ValueError`	Neither `evaluation_ids` nor `job_id` provided, or both provided
`RuntimeError`	The job failed (only when using `job_id`)
`TimeoutError`	Timeout exceeded before all evaluations completed
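The three documented errors can be handled separately, as in the sketch below. It assumes the SDK raises Python's built-in ValueError, RuntimeError, and TimeoutError, which the table suggests but does not confirm. `wait_or_report` is a hypothetical wrapper, and `fake_wait_for` is a stand-in that only mimics the documented argument rules so the example runs without the SDK; in real code you would pass galtea.evaluations.wait_for instead.

```python
def wait_or_report(wait_for, **kwargs):
    """Call a wait_for-style function and return (result, error_message)."""
    try:
        return wait_for(**kwargs), None
    except ValueError as exc:
        # Neither or both of evaluation_ids / job_id were provided.
        return None, f"bad arguments: {exc}"
    except RuntimeError as exc:
        # The job itself failed (job_id mode only).
        return None, f"job failed: {exc}"
    except TimeoutError as exc:
        # Still waiting when the timeout elapsed; retrying may be reasonable.
        return None, f"timed out: {exc}"


def fake_wait_for(evaluation_ids=None, job_id=None, timeout=300, poll_interval=5):
    """Stand-in that mimics only the documented argument validation."""
    if (evaluation_ids is None) == (job_id is None):
        raise ValueError("provide exactly one of evaluation_ids or job_id")
    if job_id == "failing-job":
        raise RuntimeError("job did not complete successfully")
    return list(evaluation_ids or [])


ok_result, ok_error = wait_or_report(fake_wait_for, evaluation_ids=["ev-1"])
bad_result, bad_error = wait_or_report(fake_wait_for)
job_result, job_error = wait_or_report(fake_wait_for, job_id="failing-job")
print(ok_result, ok_error)
print(bad_result, bad_error)
print(job_result, job_error)
```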
