Create a new inference result. See Inference Results.
API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_*) and legacy (gsk-*) API keys are accepted, e.g. Authorization: Bearer gsk_... or Authorization: Bearer gsk-....
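The Bearer scheme above can be sketched as follows. The endpoint URL is a placeholder and the key value is illustrative, not a real credential:

```python
# Minimal sketch of attaching the API key as a Bearer token.
api_key = "gsk_example_key"  # or a legacy "gsk-..." key

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```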
ID of the session
"session_123"
ID of the user
"user_123"
Index of the inference result
0
Status of the inference result
PENDING, GENERATED, FAILED
Structured input data. For plain text, use { "user_message": "..." }
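The plain-text input shape described above can be sketched as a small JSON object. The field value is illustrative:

```python
import json

# Structured input for plain text: a single user_message key,
# per the { "user_message": "..." } shape documented above.
input_data = {"user_message": "User input text"}

body = json.dumps(input_data)
```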
Actual output text
"Model response text"
Retrieval context information
"Retrieved context document"
Latency in milliseconds
Number of input tokens
Number of output tokens
Number of cache read input tokens
Total number of tokens
Cost per input token
Cost per output token
Cost per cache read input token
Total cost
Version of the conversation simulator
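Putting the request fields together, a request body might look like the sketch below. All field names here (session_id, user_id, index, status, input, actual_output, retrieval_context, latency, and the token-count keys) are illustrative assumptions, since this reference lists descriptions rather than schema names; the values are taken from the examples above:

```python
import json

# Hedged sketch of a create-inference-result request body.
# Field names are assumptions; consult the schema for exact names.
payload = {
    "session_id": "session_123",
    "user_id": "user_123",
    "index": 0,
    "status": "GENERATED",        # one of PENDING, GENERATED, FAILED
    "input": {"user_message": "User input text"},
    "actual_output": "Model response text",
    "retrieval_context": "Retrieved context document",
    "latency": 150,               # milliseconds
    "input_tokens": 100,
    "output_tokens": 50,
    "cache_read_input_tokens": 20,
    "total_tokens": 150,
}

body = json.dumps(payload)
```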
Inference result created successfully
"ir_123"
"session_123"
"user_123"
Order index within the session
0
PENDING, GENERATED, FAILED
"PENDING"
Structured input data. For plain text input, the format is { "user_message": "..." }
{ "user_message": "User input text" }
"Model response"
The RAG retrieval context (retrieved documents/snippets) used to generate the actual output. Used by RAG-aware evaluation metrics.
150
100
50
20
Total number of tokens
150
0.00001
0.00003
0.000005
0.001
1
"1.0.0"
W3C trace ID for the root span created during direct inference. The same trace ID is propagated to the user endpoint via the traceparent header.
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
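The trace ID above travels in a W3C traceparent header, whose format (version, trace ID, parent span ID, flags) comes from the W3C Trace Context specification. The span ID and flags below are illustrative, not taken from this reference:

```python
# Sketch of a W3C traceparent header carrying the example trace ID.
trace_id = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"  # 32 lowercase hex chars
span_id = "00f067aa0ba902b7"                   # illustrative parent span ID

# version "00", sampled flag "01"
traceparent = f"00-{trace_id}-{span_id}-01"
```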