Update inference result

PATCH /inferenceResults/{id}
Example request:

```shell
curl --request PATCH \
  --url https://api.galtea.ai/inferenceResults/{id} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "actualOutput": "Model response text",
  "retrievalContext": "Retrieved context document",
  "latency": 123
}'
```
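The same request can be assembled in Python with only the standard library. This is a minimal sketch: build_update_request is a hypothetical helper, the token is a placeholder, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.galtea.ai"  # host taken from the curl example


def build_update_request(result_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH /inferenceResults/{id} request."""
    # Percent-encode the ID so characters such as "/" cannot break the path.
    url = f"{BASE_URL}/inferenceResults/{urllib.parse.quote(result_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = build_update_request(
    "ir_123",
    "gsk_example",  # placeholder, not a real key
    {
        "actualOutput": "Model response text",
        "retrievalContext": "Retrieved context document",
        "latency": 123,
    },
)
# Sending it would be: urllib.request.urlopen(req)
```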
Example response:

```json
{
  "id": "ir_123",
  "sessionId": "session_123",
  "userId": "user_123",
  "index": 0,
  "status": "PENDING",
  "input": {
    "user_message": "User input text"
  },
  "actualOutput": "Model response",
  "retrievalContext": "<string>",
  "latency": 150,
  "inputTokens": 100,
  "outputTokens": 50,
  "cacheReadInputTokens": 20,
  "tokens": 150,
  "costPerInputToken": 0.00001,
  "costPerOutputToken": 0.00003,
  "costPerCacheReadInputToken": 0.000005,
  "cost": 0.001,
  "creditsUsed": 1,
  "conversationSimulatorVersion": "1.0.0",
  "traceId": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
  "createdAt": "2023-11-07T05:31:56Z",
  "deletedAt": "2023-11-07T05:31:56Z"
}
```
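A client will usually want typed access to the response. The sketch below maps a subset of the documented response fields onto a dataclass; InferenceResult and from_json are hypothetical names, and any key the class does not model is ignored.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceResult:
    """Subset of the response fields; attribute names mirror the JSON keys."""
    id: str
    sessionId: str
    index: int
    status: str
    actualOutput: Optional[str] = None
    latency: Optional[int] = None
    traceId: Optional[str] = None

    @classmethod
    def from_json(cls, payload: dict) -> "InferenceResult":
        # Required keys are indexed directly; nullable ones fall back to None.
        return cls(
            id=payload["id"],
            sessionId=payload["sessionId"],
            index=payload["index"],
            status=payload["status"],
            actualOutput=payload.get("actualOutput"),
            latency=payload.get("latency"),
            traceId=payload.get("traceId"),
        )


raw = '{"id": "ir_123", "sessionId": "session_123", "index": 0, "status": "PENDING", "actualOutput": "Model response", "latency": 150}'
result = InferenceResult.from_json(json.loads(raw))
```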

Authorizations

Authorization (string, header, required)

API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new keys (prefixed gsk_) and legacy keys (prefixed gsk-) are accepted, for example Authorization: Bearer gsk_... or Authorization: Bearer gsk-...
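A client can build this header and catch an obviously wrong key before making a request. This is a sketch: bearer_header is a hypothetical helper, and the prefix check only mirrors the two key formats described above; the server remains the authority on whether a key is valid.

```python
def bearer_header(api_key: str) -> dict:
    """Return the Authorization header for an API key.

    Rejects keys that start with neither "gsk_" (new) nor "gsk-" (legacy).
    """
    if not api_key.startswith(("gsk_", "gsk-")):
        raise ValueError("expected an API key starting with 'gsk_' or 'gsk-'")
    return {"Authorization": f"Bearer {api_key}"}
```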

Path Parameters

id (string, required)

ID of the inference result to update.

Body

Content type: application/json

actualOutput (string)
Example: "Model response text"

retrievalContext (string)
Example: "Retrieved context document"

latency (integer)
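All three body fields appear to be optional. A sketch of a client-side body builder that checks the documented types follows. build_update_body is hypothetical, and skipping fields passed as None assumes that omitted fields are left unchanged, which this reference does not state explicitly.

```python
ALLOWED_FIELDS = {"actualOutput": str, "retrievalContext": str, "latency": int}


def build_update_body(**fields) -> dict:
    """Return a JSON-ready PATCH body containing only the non-None fields."""
    body = {}
    for name, value in fields.items():
        if name not in ALLOWED_FIELDS:
            raise KeyError(f"unknown body field: {name}")
        if value is None:
            continue  # assumption: an omitted field is not modified
        expected = ALLOWED_FIELDS[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
        body[name] = value
    return body
```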

Response

Inference result updated successfully

id (string)
Example: "ir_123"

sessionId (string)
Example: "session_123"

userId (string | null)
Example: "user_123"

index (integer)
Order index within the session.
Example: 0
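Because index gives each result's position within its session, results collected from several sessions can be regrouped into conversation order. A sketch, assuming each item is a decoded response object like the example response; order_by_session is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def order_by_session(results: List[dict]) -> Dict[str, List[dict]]:
    """Group results by sessionId and sort each group by index."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for item in results:
        grouped[item["sessionId"]].append(item)
    for items in grouped.values():
        items.sort(key=lambda r: r["index"])
    return dict(grouped)
```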

status (enum<string>)
Available options: PENDING, GENERATED, FAILED
Example: "PENDING"

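Modeling status as an enum makes an unexpected value fail loudly instead of propagating as a plain string. A sketch; InferenceStatus and parse_status are hypothetical names, and the enum members are exactly the three documented options.

```python
from enum import Enum


class InferenceStatus(str, Enum):
    PENDING = "PENDING"
    GENERATED = "GENERATED"
    FAILED = "FAILED"


def parse_status(value: str) -> InferenceStatus:
    # Lookup by value raises ValueError for anything outside the documented options.
    return InferenceStatus(value)
```

Subclassing str keeps members comparable to the raw JSON strings, so parse_status("PENDING") == "PENDING" holds.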
input (object)
Structured input data. For plain-text input, the format is { "user_message": "..." }.
Example: { "user_message": "User input text" }
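The plain-text convention can be wrapped and unwrapped with two small helpers. Both names are hypothetical; the sketch only assumes the { "user_message": "..." } shape documented above and returns None for any other structure.

```python
from typing import Optional


def to_input(text: str) -> dict:
    """Wrap plain text in the documented structured-input shape."""
    return {"user_message": text}


def user_message_of(input_data: dict) -> Optional[str]:
    """Return the plain-text message, or None if the input uses another structure."""
    value = input_data.get("user_message")
    return value if isinstance(value, str) else None
```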

actualOutput (string | null)
Example: "Model response"

retrievalContext (string | null)
The RAG retrieval context (retrieved documents or snippets) used to generate the actual output. Used by RAG-aware evaluation metrics.

latency (integer | null)
Example: 150

inputTokens (integer | null)
Example: 100

outputTokens (integer | null)
Example: 50

cacheReadInputTokens (integer | null)
Example: 20

tokens (integer | null)
Total token count.
Example: 150

costPerInputToken (number | null)
Example: 0.00001

costPerOutputToken (number | null)
Example: 0.00003

costPerCacheReadInputToken (number | null)
Example: 0.000005

cost (number | null)
Example: 0.001
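The reference lists per-token rates next to the token counts but does not say how cost is derived from them. Purely as an illustration, the sketch below assumes cost is the sum of each count multiplied by its rate, with cacheReadInputTokens counted separately from inputTokens and any null count or rate contributing zero. estimate_cost is a hypothetical helper; the server-reported cost field should be treated as authoritative.

```python
from typing import Optional


def estimate_cost(
    input_tokens: Optional[int],
    output_tokens: Optional[int],
    cache_read_input_tokens: Optional[int],
    cost_per_input_token: Optional[float],
    cost_per_output_token: Optional[float],
    cost_per_cache_read_input_token: Optional[float],
) -> float:
    """Hypothetical estimate: sum of (token count x per-token rate)."""
    pairs = [
        (input_tokens, cost_per_input_token),
        (output_tokens, cost_per_output_token),
        (cache_read_input_tokens, cost_per_cache_read_input_token),
    ]
    # Null (None) counts or rates are treated as zero.
    return sum((count or 0) * (rate or 0.0) for count, rate in pairs)
```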

creditsUsed (integer | null)
Example: 1

conversationSimulatorVersion (string | null)
Example: "1.0.0"

traceId (string | null)
W3C Trace Context trace ID (32 lowercase hexadecimal characters) of the root span created during direct inference. The same trace ID is propagated to the user endpoint in the traceparent header.
Example: "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

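The traceparent header follows the W3C Trace Context format version-traceid-parentid-flags. A user endpoint that wants to correlate its own logs with traceId could extract the trace ID as sketched below. This handles only version 00 headers, and trace_id_from_traceparent is a hypothetical helper, not part of any Galtea SDK.

```python
import re
from typing import Optional

# Version 00 layout from the W3C Trace Context spec: 00-<trace-id>-<parent-id>-<flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_traceparent(header: str) -> Optional[str]:
    """Return the 32-hex-character trace ID, or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    # The spec treats all-zero trace IDs and parent IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id
```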
createdAt (string<date-time>)

deletedAt (string<date-time> | null)
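createdAt and deletedAt are ISO 8601 date-time strings with a trailing Z, as in the example response. A small parsing sketch follows. The helper names are hypothetical, and treating a non-null deletedAt as a soft-deleted record is an assumption this reference does not spell out.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_deleted(deleted_at: Optional[str]) -> bool:
    # Assumption: a non-null deletedAt marks a soft-deleted record.
    return deleted_at is not None
```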