Create a new inference result. See Inference Results.
API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_*) and legacy (gsk-*) API keys are accepted, e.g. Authorization: Bearer gsk_... or Authorization: Bearer gsk-....
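The Bearer scheme above can be sketched as follows. The endpoint URL is a placeholder and the key value is illustrative, not a real credential:

```python
# Minimal sketch of attaching the API key as a Bearer token.
api_key = "gsk_example_key"  # or a legacy "gsk-..." key

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```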
ID of the session
"session_123"
ID of the user
"user_123"
Index of the inference result
0
Status of the inference result
PENDING, GENERATED, FAILED
Structured input data. For plain text, use { "user_message": "..." }
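The plain-text input shape described above can be sketched as a small JSON object. The field value is illustrative:

```python
import json

# Structured input for plain text: a single user_message key,
# per the { "user_message": "..." } shape documented above.
input_data = {"user_message": "User input text"}

body = json.dumps(input_data)
```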
Actual output text
"Model response text"
Retrieval context information
"Retrieved context document"
Latency in milliseconds
Number of input tokens
Number of output tokens
Number of cache read input tokens
Total number of tokens
Cost per input token
Cost per output token
Cost per cache read input token
Total cost
Version of the conversation simulator
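Putting the request fields together, a request body might look like the sketch below. All field names here (session_id, user_id, index, status, input, actual_output, retrieval_context, latency, and the token-count keys) are illustrative assumptions, since this reference lists descriptions rather than schema names; the values are taken from the examples above:

```python
import json

# Hedged sketch of a create-inference-result request body.
# Field names are assumptions; consult the schema for exact names.
payload = {
    "session_id": "session_123",
    "user_id": "user_123",
    "index": 0,
    "status": "GENERATED",        # one of PENDING, GENERATED, FAILED
    "input": {"user_message": "User input text"},
    "actual_output": "Model response text",
    "retrieval_context": "Retrieved context document",
    "latency": 150,               # milliseconds
    "input_tokens": 100,
    "output_tokens": 50,
    "cache_read_input_tokens": 20,
    "total_tokens": 150,
}

body = json.dumps(payload)
```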
Inference result created successfully
"ir_123"
"session_123"
"user_123"
Order index within the session
0
PENDING, GENERATED, FAILED
"PENDING"
Structured input data. For plain text input, the format is { "user_message": "..." }
{ "user_message": "User input text" }
"Model response"
The RAG retrieval context (retrieved documents/snippets) used to generate the actual output. Used by RAG-aware evaluation metrics.
150
100
50
20
Total number of tokens
150
0.00001
0.00003
0.000005
0.001
1
"1.0.0"
W3C trace ID for the root span created during direct inference. The same trace ID is propagated to the user endpoint via the traceparent header.
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
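The trace ID above travels in a W3C traceparent header, whose format (version, trace ID, parent span ID, flags) comes from the W3C Trace Context specification. The span ID and flags below are illustrative, not taken from this reference:

```python
# Sketch of a W3C traceparent header carrying the example trace ID.
trace_id = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"  # 32 lowercase hex chars
span_id = "00f067aa0ba902b7"                   # illustrative parent span ID

# version "00", sampled flag "01"
traceparent = f"00-{trace_id}-{span_id}-01"
```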