POST /evaluations/fromInferenceResult
Create evaluations for a single inference result
curl --request POST \
  --url https://api.galtea.ai/evaluations/fromInferenceResult \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "inferenceResultId": "ir_123",
  "metrics": [
    {
      "id": "id_123",
      "name": "Example Name",
      "score": 0.95
    }
  ],
  "specificationIds": [
    "<string>"
  ]
}
'
Example response:
[
  {
    "id": "eval_123",
    "metricId": "metric_123",
    "sessionId": "session_123",
    "userId": "user_123",
    "status": "SUCCESS",
    "testCaseId": "tc_123",
    "inferenceResultId": "ir_123",
    "score": 0.95,
    "reason": "High quality response",
    "error": "<string>",
    "canRetry": false,
    "creditsUsed": 1,
    "conversationSimulatorVersion": "1.0.0",
    "humanEvaluatorId": "<string>",
    "humanEvaluatorStartedAt": "2023-11-07T05:31:56Z",
    "failedTurns": [
      "<string>"
    ],
    "createdAt": "2023-11-07T05:31:56Z",
    "deletedAt": "2023-11-07T05:31:56Z",
    "evaluatedAt": "2023-11-07T05:31:56Z",
    "metricLegacyAt": "2023-11-07T05:31:56Z",
    "metricDisabledAt": "2023-11-07T05:31:56Z"
  }
]

Authorizations

Authorization
string
header
required

API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_*) and legacy (gsk-*) API keys are accepted, for example Authorization: Bearer gsk_... (new) or Authorization: Bearer gsk-... (legacy).
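As a minimal sketch, a script can build the header from a stored key and reject keys that use neither documented prefix before sending anything. The function name galtea_auth_header and the GALTEA_API_KEY variable are hypothetical, not part of the Galtea API; the prefix check only mirrors the two formats described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an Authorization header line for the Galtea API,
# or fail if the key matches neither documented prefix.
galtea_auth_header() {
  local key="$1"
  case "$key" in
    gsk_*|gsk-*)   # new (gsk_*) or legacy (gsk-*) key format
      printf 'Authorization: Bearer %s\n' "$key"
      ;;
    *)
      echo "unexpected API key prefix" >&2
      return 1
      ;;
  esac
}
```

Usage, assuming the key lives in GALTEA_API_KEY: replace the literal header in the curl request with --header "$(galtea_auth_header "$GALTEA_API_KEY")".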

Body

application/json
inferenceResultId
string
required

The ID of the inference result (turn) to evaluate

Example:

"ir_123"

metrics
object[]

Metrics to evaluate. Optional if specificationIds is provided or the product has specifications with linked metrics.

specificationIds
string[]

Specification IDs whose linked metrics will be evaluated. Can be combined with metrics; the API merges both sets of metrics and removes duplicates.
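When the metrics to run come only from specifications, the body needs just inferenceResultId and specificationIds. The sketch below assembles that body; galtea_body_from_specs is a hypothetical helper, and it assumes the IDs contain no quotes or backslashes, since it performs no JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a request body that evaluates the metrics
# linked to the given specification IDs.
# Usage: galtea_body_from_specs <inferenceResultId> <specId>...
# Assumes IDs need no JSON escaping (no quotes or backslashes).
galtea_body_from_specs() {
  local ir_id="$1"; shift
  local ids="" sep=""
  for spec in "$@"; do
    ids+="${sep}\"${spec}\""
    sep=","
  done
  printf '{"inferenceResultId":"%s","specificationIds":[%s]}\n' "$ir_id" "$ids"
}
```

The result can be passed to curl with --data "$(galtea_body_from_specs ir_123 spec_a spec_b)". To also request explicit metrics, add a metrics array; the API merges and deduplicates the two sources.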

Response

Evaluations created successfully. The response body is a JSON array of evaluation objects, each with the following fields.

id
string
Example:

"eval_123"

metricId
string
Example:

"metric_123"

sessionId
string
Example:

"session_123"

userId
string | null
Example:

"user_123"

status
enum<string>
Available options:
PENDING,
PENDING_HUMAN,
SUCCESS,
FAILED,
SKIPPED
Example:

"SUCCESS"
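A client polling evaluations usually needs to know whether a status can still change. This reference lists the values but does not define transitions, so the sketch below assumes that PENDING and PENDING_HUMAN (presumably awaiting a human evaluator) are not yet final, while SUCCESS, FAILED and SKIPPED are. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Assumption: PENDING and PENDING_HUMAN may still change;
# SUCCESS, FAILED and SKIPPED are treated as final.
# Exit codes: 0 = final, 1 = still pending, 2 = unrecognized value.
galtea_status_is_final() {
  case "$1" in
    SUCCESS|FAILED|SKIPPED) return 0 ;;
    PENDING|PENDING_HUMAN)  return 1 ;;
    *) echo "unknown status: $1" >&2; return 2 ;;
  esac
}
```

Returning a distinct code for unrecognized values lets a script notice if new statuses are added instead of silently treating them as pending.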

testCaseId
string | null
Example:

"tc_123"

inferenceResultId
string | null
Example:

"ir_123"

score
number | null
Example:

0.95

reason
string | null
Example:

"High quality response"

error
string | null
canRetry
boolean | null
Example:

false
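The reference does not spell out what canRetry means. Judging by its name, a reasonable reading is that a FAILED evaluation with canRetry set to true may be resubmitted. The sketch below encodes that assumption and treats null or false as "do not retry". The helper is hypothetical and does not call any retry endpoint.

```bash
#!/usr/bin/env bash
# Hypothetical predicate. Assumption: only a FAILED evaluation whose canRetry
# is true should be retried; null or false means do not retry.
# Usage: galtea_should_retry <status> <canRetry: true|false|null>
galtea_should_retry() {
  local status="$1" can_retry="$2"
  [ "$status" = "FAILED" ] && [ "$can_retry" = "true" ]
}
```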

creditsUsed
integer | null
Example:

1

conversationSimulatorVersion
string | null
Example:

"1.0.0"

humanEvaluatorId
string | null

User ID of the human evaluator

humanEvaluatorStartedAt
string<date-time> | null
failedTurns
string[]

Conversation turns that failed

createdAt
string<date-time>
deletedAt
string<date-time> | null
evaluatedAt
string<date-time> | null
metricLegacyAt
string<date-time> | null
metricDisabledAt
string<date-time> | null