Get inference results

[
  {
    "id": "ir_123",
    "sessionId": "session_123",
    "userId": "user_123",
    "index": 0,
    "status": "PENDING",
    "input": { "user_message": "User input text" },
    "actualOutput": "Model response",
    "retrievalContext": "<string>",
    "latency": 150,
    "inputTokens": 100,
    "outputTokens": 50,
    "cacheReadInputTokens": 20,
    "tokens": 150,
    "costPerInputToken": 0.00001,
    "costPerOutputToken": 0.00003,
    "costPerCacheReadInputToken": 0.000005,
    "cost": 0.001,
    "creditsUsed": 1,
    "conversationSimulatorVersion": "1.0.0",
    "traceId": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
    "createdAt": "2023-11-07T05:31:56Z",
    "deletedAt": "2023-11-07T05:31:56Z"
  }
]

Authorizations

Authorization

string

header

required

API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_*) and legacy (gsk-*) API keys are accepted, e.g. Authorization: Bearer gsk_... or Authorization: Bearer gsk-....
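As a minimal sketch of the header described above, the helper below builds the Authorization header from an API key. The function name `build_auth_headers` is hypothetical, and the client-side prefix check is an assumption added for early error detection; the server remains the authority on which keys are valid.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the API key as a Bearer token."""
    key = api_key.strip()
    # The reference states that new (gsk_) and legacy (gsk-) keys are accepted.
    # Checking the prefix locally is an assumption, not a documented requirement.
    if not key.startswith(("gsk_", "gsk-")):
        raise ValueError("expected an API key starting with 'gsk_' or 'gsk-'")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.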

Query Parameters

ids

string[]

Filter by inference result IDs

sessionIds

string[]

Filter by session IDs

evaluationIds

string[]

Filter by evaluation IDs

limit

integer

Maximum number of results

offset

integer

Number of results to skip

fromCreatedAt

string<date-time>

Filter inference results created at or after this timestamp (ISO 8601 format)

toCreatedAt

string<date-time>

Filter inference results created at or before this timestamp (ISO 8601 format)

sort

string[]

Sort instructions (field and direction pairs)
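The parameters above can be combined into a query string and paged through with limit and offset. The sketch below is illustrative only: `build_query` and `iter_results` are hypothetical names, the encoding of string[] parameters as repeated keys is an assumption, and stopping when a page comes back shorter than the page size is also an assumption, since this page does not document a total count. The sort parameter is left out because its value format is not specified here.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional, Sequence
from urllib.parse import urlencode


def build_query(
    *,
    ids: Sequence[str] = (),
    session_ids: Sequence[str] = (),
    evaluation_ids: Sequence[str] = (),
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    from_created_at: Optional[datetime] = None,
    to_created_at: Optional[datetime] = None,
) -> str:
    """Encode the documented filters as a URL query string."""
    pairs: list[tuple[str, str]] = []
    # Assumption: string[] parameters are sent as repeated keys (ids=a&ids=b).
    for name, values in (
        ("ids", ids),
        ("sessionIds", session_ids),
        ("evaluationIds", evaluation_ids),
    ):
        pairs.extend((name, value) for value in values)
    if limit is not None:
        pairs.append(("limit", str(limit)))
    if offset is not None:
        pairs.append(("offset", str(offset)))
    for name, ts in (("fromCreatedAt", from_created_at), ("toCreatedAt", to_created_at)):
        if ts is not None:
            if ts.tzinfo is None:
                raise ValueError(f"{name} must be timezone-aware")
            # ISO 8601 in UTC, matching the shape of the createdAt example.
            stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            pairs.append((name, stamp))
    return urlencode(pairs)


def iter_results(
    fetch_page: Callable[[int, int], list], page_size: int = 100
) -> Iterator[dict]:
    """Yield results page by page; fetch_page(limit, offset) returns one page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # Assumption: a page shorter than page_size means no more results.
        if len(page) < page_size:
            return
        offset += page_size
```

Here `fetch_page` would wrap the actual HTTP call to this endpoint; the endpoint URL is not given on this page, so it is left to the caller.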

Response

Inference results retrieved successfully

id

string

Example:

"ir_123"

sessionId

string

Example:

"session_123"

userId

string | null

Example:

"user_123"

index

integer

Order index within the session

Example:

0

status

enum<string>

Available options:

PENDING,

GENERATED,

FAILED

Example:

"PENDING"

input

object

Structured input data. For plain-text input, the format is { "user_message": "..." }.

Example:

{ "user_message": "User input text" }

actualOutput

string | null

Example:

"Model response"

retrievalContext

string | null

The RAG retrieval context (retrieved documents/snippets) used to generate the actual output. Used by RAG-aware evaluation metrics.

latency

integer | null

Example:

150

inputTokens

integer | null

Example:

100

outputTokens

integer | null

Example:

50

cacheReadInputTokens

integer | null

Example:

20

tokens

integer | null

Total tokens

Example:

150

costPerInputToken

number | null

Example:

0.00001

costPerOutputToken

number | null

Example:

0.00003

costPerCacheReadInputToken

number | null

Example:

0.000005

cost

number | null

Example:

0.001

creditsUsed

integer | null

Example:

1

conversationSimulatorVersion

string | null

Example:

"1.0.0"

traceId

string | null

W3C trace ID for the root span created during direct inference. The same trace ID is propagated to the user endpoint via the traceparent header.

Example:

"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
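To correlate a stored traceId with a request received at your endpoint, you can extract the trace ID from the incoming traceparent header. The sketch below follows the four-field W3C Trace Context layout (version, trace ID, parent ID, flags); the function name `trace_id_from_traceparent` is hypothetical, and the parent ID in the usage line is a made-up value for illustration.

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def trace_id_from_traceparent(header: str) -> str:
    """Return the 32-character trace ID carried in a traceparent header."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError("malformed traceparent header")
    version, trace_id, parent_id, _flags = match.groups()
    # W3C Trace Context forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("invalid traceparent header")
    return trace_id
```

For example, `trace_id_from_traceparent("00-a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4-00f067aa0ba902b7-01")` yields the trace ID shown in the example above, which can then be compared with the traceId field of an inference result.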

createdAt

string<date-time>

deletedAt

string<date-time> | null
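Because many response fields are nullable, client code needs to handle missing values explicitly. The sketch below parses one element of the response array into a typed record covering a subset of the fields; the `InferenceResult` class, `parse_result`, and `total_cost` are hypothetical names chosen for illustration and are not part of any official SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, Optional


class Status(str, Enum):
    PENDING = "PENDING"
    GENERATED = "GENERATED"
    FAILED = "FAILED"


@dataclass(frozen=True)
class InferenceResult:
    id: str
    session_id: str
    index: int
    status: Status
    actual_output: Optional[str]
    tokens: Optional[int]
    cost: Optional[float]
    created_at: datetime
    deleted_at: Optional[datetime]


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_result(raw: dict) -> InferenceResult:
    """Convert one JSON object from the response array into a typed record."""
    return InferenceResult(
        id=raw["id"],
        session_id=raw["sessionId"],
        index=raw["index"],
        status=Status(raw["status"]),  # raises ValueError on an unknown status
        actual_output=raw.get("actualOutput"),
        tokens=raw.get("tokens"),
        cost=raw.get("cost"),
        created_at=_parse_ts(raw["createdAt"]),
        deleted_at=_parse_ts(raw.get("deletedAt")),
    )


def total_cost(results: Iterable[InferenceResult]) -> float:
    """Sum the cost field, skipping results where it is null."""
    return sum(r.cost for r in results if r.cost is not None)
```

Keeping nullable fields as Optional makes it explicit that, for instance, a result may carry no actualOutput or cost.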
