Two Execution Modes
- Endpoint Connection
- Agent (SDK-side)
When no
agent is provided, the evaluation is delegated to the server. The API calls your deployed HTTP endpoint for each test case, generates inference results, and evaluates them. Requires the version to have a conversation_endpoint_connection_id configured.Returns
jobId— the inference batch job ID for tracking progresstestCaseCount— total number of test cases queuedmessage— a human-readable status messagespecifications— summary of each specification evaluated
Examples
Endpoint connection (no agent):Parameters
The ID of the version to evaluate.
The agent to execute locally. Accepts:
- An
Agentclass instance with acall()method - An agent function (sync or async) with one of three signatures:
(str) -> str,(list[dict]) -> str, or(AgentInput) -> AgentResponse
A list of Specification IDs to evaluate. When omitted, all specifications for the product that have linked metrics and tests are used.
Specifications without linked metrics or without linked tests are silently skipped.