Returns a paginated list of evaluations, with optional filtering. See Evaluations.
API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_*) and legacy (gsk-) API keys are accepted, e.g. Authorization: Bearer gsk_... or Authorization: Bearer gsk-....
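As an illustration of the header format described above, here is a minimal sketch that builds the Authorization header from an API key. The helper name `build_auth_headers` and the client-side prefix check are assumptions made for this example and are not part of the documented API.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token."""
    # Assumption: rejecting keys without a known prefix is a local convenience,
    # not something the API requires. The text above only says that both the
    # new (gsk_) and legacy (gsk-) key formats are accepted.
    if not api_key.startswith(("gsk_", "gsk-")):
        raise ValueError("expected an API key starting with 'gsk_' or 'gsk-'")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.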
Filter by evaluation IDs
Filter by product IDs
Filter by session IDs
Filter by inference result IDs (for single-turn evaluations)
Filter by metric IDs
Filter by test case IDs
Filter by test IDs (include only). Use an empty string ("") to select evaluations without a test
Omit evaluations linked to the specified test IDs. Use an empty string ("") to omit evaluations without a test
Filter by version IDs
Filter by specification IDs (returns evaluations whose metric is linked to any of the given specifications)
Sort instructions (field and direction pairs)
Maximum number of results
Number of results to skip
Filter by evaluation statuses (PENDING, PENDING_HUMAN, SUCCESS, FAILED, SKIPPED)
Filter evaluations that can be retried
Filter by evaluation types
Filter by human evaluator user ID
Filter by metric sources
Filter evaluations created at or after this timestamp (ISO 8601 format)
Filter evaluations created at or before this timestamp (ISO 8601 format)
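The filters above could be combined into a query string roughly as follows. This is a sketch under stated assumptions: the parameter names (`limit`, `offset`, `status`, `created_at_gte`, `created_at_lte`) are hypothetical placeholders, since this page does not list the actual names, and passing several statuses as repeated keys is also an assumption. Only the ISO 8601 timestamp format and the status values come from the descriptions above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode


def build_evaluation_query(
    statuses: list[str],
    created_from: datetime,
    created_to: datetime,
    limit: int = 50,
    offset: int = 0,
) -> str:
    """Build a URL-encoded query string for listing evaluations.

    All parameter names used here are hypothetical; check the real API
    reference for the actual names before relying on them.
    """
    # Pagination: maximum number of results and number of results to skip.
    params: list[tuple[str, str]] = [("limit", str(limit)), ("offset", str(offset))]
    # Assumption: a multi-valued filter is sent as one repeated key per value.
    params += [("status", status) for status in statuses]
    # datetime.isoformat() produces ISO 8601; timezone-aware values carry an offset.
    params.append(("created_at_gte", created_from.isoformat()))
    params.append(("created_at_lte", created_to.isoformat()))
    return urlencode(params)


query = build_evaluation_query(
    statuses=["SUCCESS", "FAILED"],
    created_from=datetime(2024, 1, 1, tzinfo=timezone.utc),
    created_to=datetime(2024, 1, 31, tzinfo=timezone.utc),
    limit=20,
    offset=40,
)
```

To page through results under these assumptions, keep `limit` fixed and increase `offset` by `limit` on each request.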
Evaluations retrieved successfully
"eval_123"
"metric_123"
"session_123"
"user_123"
"SUCCESS" (one of PENDING, PENDING_HUMAN, SUCCESS, FAILED, SKIPPED)
"tc_123"
"ir_123"
0.95
"High quality response"
false
1
"1.0.0"
User ID of the human evaluator
Conversation turns that failed