Create inference results batch

POST /inferenceResults/batch

Example request:

curl --request POST \
  --url https://api.galtea.ai/inferenceResults/batch \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "sessionId": "session_123",
  "conversationTurns": [
    {}
  ]
}
'

Example response:

[
  {
    "id": "ir_123",
    "sessionId": "session_123",
    "userId": "user_123",
    "index": 0,
    "status": "PENDING",
    "input": {
      "user_message": "User input text"
    },
    "actualOutput": "Model response",
    "retrievalContext": "<string>",
    "latency": 150,
    "inputTokens": 100,
    "outputTokens": 50,
    "cacheReadInputTokens": 20,
    "tokens": 150,
    "costPerInputToken": 0.00001,
    "costPerOutputToken": 0.00003,
    "costPerCacheReadInputToken": 0.000005,
    "cost": 0.001,
    "creditsUsed": 1,
    "conversationSimulatorVersion": "1.0.0",
    "traceId": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
    "createdAt": "2023-11-07T05:31:56Z",
    "deletedAt": "2023-11-07T05:31:56Z"
  }
]
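The same request can be built in Python with only the standard library (urllib). This is a minimal sketch: build_batch_request is a hypothetical helper, not part of any Galtea SDK. The turn list holds a single empty object, mirroring the placeholder in the curl example, because this page does not document which keys a conversation turn contains. The request is only constructed here. Sending it is shown in a comment.

```python
import json
import urllib.request

API_URL = "https://api.galtea.ai/inferenceResults/batch"


def build_batch_request(token: str, session_id: str, conversation_turns: list) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example.

    Hypothetical helper for illustration only.
    """
    body = {"sessionId": session_id, "conversationTurns": conversation_turns}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Turn contents are left empty, as in the curl example.
req = build_batch_request("<token>", "session_123", [{}])

# To send it:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)  # per the example response, a list of inference results
```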

Authorizations

Authorization
string
header
required

API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_*) and legacy (gsk-*) API keys are accepted, for example Authorization: Bearer gsk_... or Authorization: Bearer gsk-...
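A client can catch a malformed key before sending a request. The following sketch relies only on what the description above states: new keys start with gsk_ and legacy keys with gsk-. The function name authorization_header is hypothetical. The server remains the only authority on whether a key is actually valid.

```python
def authorization_header(api_key: str) -> dict:
    """Return the Authorization header for a Galtea API key.

    Hypothetical helper: it only checks the documented gsk_/gsk- prefixes,
    not whether the key is accepted by the server.
    """
    key = api_key.strip()
    if not key.startswith(("gsk_", "gsk-")):
        raise ValueError("expected a key starting with 'gsk_' (new) or 'gsk-' (legacy)")
    return {"Authorization": f"Bearer {key}"}


print(authorization_header("gsk_example"))  # {'Authorization': 'Bearer gsk_example'}
```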

Body

application/json
sessionId
string
required
Example:

"session_123"

conversationTurns
object[]
required
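Both body fields are required. The following sketch checks the payload shape on the client before sending it. validate_batch_body is a hypothetical helper. Because this page does not document the keys of a conversation turn, it only checks that each turn is a JSON object. It does not reject an empty conversationTurns array, since the page does not say whether the API accepts one.

```python
def validate_batch_body(body: dict) -> None:
    """Raise ValueError if body does not match the documented shape.

    Checks only what this page states: sessionId is a required string and
    conversationTurns is a required array of objects.
    """
    session_id = body.get("sessionId")
    if not isinstance(session_id, str):
        raise ValueError("sessionId is required and must be a string")
    turns = body.get("conversationTurns")
    if not isinstance(turns, list):
        raise ValueError("conversationTurns is required and must be an array")
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            raise ValueError(f"conversationTurns[{i}] must be an object")


validate_batch_body({"sessionId": "session_123", "conversationTurns": [{}]})  # passes silently
```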

Response

Inference results created successfully. The response body is an array of inference result objects, each with the fields below.

id
string
Example:

"ir_123"

sessionId
string
Example:

"session_123"

userId
string | null
Example:

"user_123"

index
integer

Order index within the session

Example:

0
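Because index records each result's position within its session, a client can restore conversation order by sorting on it. The following minimal sketch works on plain dictionaries shaped like the response example. The IDs and outputs are illustrative data, not real API output.

```python
results = [
    {"id": "ir_2", "index": 1, "actualOutput": "Second reply"},
    {"id": "ir_1", "index": 0, "actualOutput": "First reply"},
]

# Sort the results by their order index within the session.
ordered = sorted(results, key=lambda r: r["index"])
print([r["id"] for r in ordered])  # ['ir_1', 'ir_2']
```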

status
enum<string>
Available options: PENDING, GENERATED, FAILED
Example:

"PENDING"

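The three status values map naturally onto an enumeration. The following sketch splits a list of results by status. The class and function names are hypothetical, and this page does not describe what each value means beyond its name.

```python
from enum import Enum


class InferenceStatus(str, Enum):
    """Status values listed for the status field."""
    PENDING = "PENDING"
    GENERATED = "GENERATED"
    FAILED = "FAILED"


def results_with_status(results: list, status: InferenceStatus) -> list:
    """Return the results whose status equals the given value.

    InferenceStatus(...) raises ValueError for any undocumented status.
    """
    return [r for r in results if InferenceStatus(r["status"]) is status]


results = [
    {"id": "ir_1", "status": "GENERATED"},
    {"id": "ir_2", "status": "FAILED"},
    {"id": "ir_3", "status": "PENDING"},
]
print([r["id"] for r in results_with_status(results, InferenceStatus.FAILED)])  # ['ir_2']
```

Converting through the enum instead of comparing raw strings means a status outside the three documented values raises an error rather than being silently skipped.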
input
object

Structured input data. For plain-text input, the format is { "user_message": "..." }.

Example:

{ "user_message": "User input text" }

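For plain-text input, the user's message is stored under user_message. The description only covers that case. The following sketch therefore returns None for any other shape and leaves those shapes to the caller. The function name user_message is hypothetical.

```python
from typing import Optional


def user_message(result: dict) -> Optional[str]:
    """Return the plain-text user message, or None if input has another shape."""
    data = result.get("input")
    if not isinstance(data, dict):
        return None  # input missing or not an object
    message = data.get("user_message")
    return message if isinstance(message, str) else None


print(user_message({"input": {"user_message": "User input text"}}))  # User input text
```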
actualOutput
string | null
Example:

"Model response"

retrievalContext
string | null

The RAG retrieval context (retrieved documents/snippets) used to generate the actual output. Used by RAG-aware evaluation metrics.

latency
integer | null
Example:

150

inputTokens
integer | null
Example:

100

outputTokens
integer | null
Example:

50

cacheReadInputTokens
integer | null
Example:

20

tokens
integer | null

Total tokens

Example:

150

costPerInputToken
number | null
Example:

0.00001

costPerOutputToken
number | null
Example:

0.00003

costPerCacheReadInputToken
number | null
Example:

0.000005

cost
number | null
Example:

0.001
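Every token and price field is nullable, and this page does not state how cost relates to them. For a rough client-side estimate, the following sketch makes two assumptions: cost is the sum of each token count times its per-token price, and cache reads are priced separately from inputTokens. Both are assumptions, estimate_cost is a hypothetical name, and the returned cost field should be treated as authoritative. The sample values on this page are illustrative and do not follow this arithmetic (it gives 0.0026 for them, not 0.001).

```python
from typing import Optional


def estimate_cost(result: dict) -> Optional[float]:
    """Rough cost estimate from token counts and per-token prices.

    ASSUMPTION (not confirmed by the API docs):
        inputTokens * costPerInputToken
      + outputTokens * costPerOutputToken
      + cacheReadInputTokens * costPerCacheReadInputToken
    """
    required = (
        result.get("inputTokens"), result.get("costPerInputToken"),
        result.get("outputTokens"), result.get("costPerOutputToken"),
    )
    if any(v is None for v in required):
        return None  # these fields are nullable; no estimate without them
    in_tokens, in_price, out_tokens, out_price = required
    total = in_tokens * in_price + out_tokens * out_price
    cache_tokens = result.get("cacheReadInputTokens")
    cache_price = result.get("costPerCacheReadInputToken")
    if cache_tokens is not None and cache_price is not None:
        total += cache_tokens * cache_price  # assumed billed separately
    return total
```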

creditsUsed
integer | null
Example:

1

conversationSimulatorVersion
string | null
Example:

"1.0.0"

traceId
string | null

W3C trace ID for the root span created during direct inference. The same trace ID is propagated to the user endpoint via the traceparent header.

Example:

"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

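Because the trace ID is propagated in the W3C traceparent header, a user endpoint can read it from the incoming request and match it against the traceId stored on the inference result. The following sketch parses only version-00 headers (00-<trace-id>-<parent-id>-<flags>), as defined in the W3C Trace Context specification. The function name and the parent ID in the sample header are illustrative.

```python
import re
from typing import Optional

# W3C Trace Context, version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>"
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_traceparent(header: str) -> Optional[str]:
    """Return the trace ID from a version-00 traceparent header, or None if invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    # All-zero trace IDs and parent IDs are invalid per the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


header = "00-a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4-00f067aa0ba902b7-01"
print(trace_id_from_traceparent(header))  # a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4
```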
createdAt
string<date-time>
deletedAt
string<date-time> | null
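createdAt and deletedAt are date-time strings. The example shows them in UTC with a trailing Z. Python's datetime.fromisoformat accepts the Z suffix only from Python 3.11 onward, so the following sketch normalizes it first. It returns None for a null value, since deletedAt is nullable. The function name parse_timestamp is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an API date-time string such as "2023-11-07T05:31:56Z".

    Returns None for null values (deletedAt is nullable).
    """
    if value is None:
        return None
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onward.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


created = parse_timestamp("2023-11-07T05:31:56Z")
print(created == datetime(2023, 11, 7, 5, 31, 56, tzinfo=timezone.utc))  # True
print(parse_timestamp(None))  # None
```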