Create evaluations for a single inference result

curl --request POST \
  --url https://api.galtea.ai/evaluations/fromInferenceResult \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "inferenceResultId": "ir_123",
  "metrics": [
    {
      "id": "id_123",
      "name": "Example Name",
      "score": 0.95
    }
  ],
  "specificationIds": [
    "<string>"
  ]
}
'

[
  {
    "id": "eval_123",
    "metricId": "metric_123",
    "sessionId": "session_123",
    "userId": "user_123",
    "status": "SUCCESS",
    "testCaseId": "tc_123",
    "inferenceResultId": "ir_123",
    "score": 0.95,
    "reason": "High quality response",
    "error": "<string>",
    "canRetry": false,
    "creditsUsed": 1,
    "conversationSimulatorVersion": "1.0.0",
    "humanEvaluatorId": "<string>",
    "humanEvaluatorStartedAt": "2023-11-07T05:31:56Z",
    "humanScore": 123,
    "humanReason": "<string>",
    "humanEvaluatorFinishedAt": "2023-11-07T05:31:56Z",
    "failedTurns": [
      "<string>"
    ],
    "createdAt": "2023-11-07T05:31:56Z",
    "deletedAt": "2023-11-07T05:31:56Z",
    "evaluatedAt": "2023-11-07T05:31:56Z",
    "metricLegacyAt": "2023-11-07T05:31:56Z",
    "metricDisabledAt": "2023-11-07T05:31:56Z"
  }
]

Authorizations

Authorization

string

header

required

API key authorization. Pass your API key in the Authorization header as a Bearer token. Both new (gsk_...) and legacy (gsk-...) API keys are accepted, for example "Authorization: Bearer gsk_..." or "Authorization: Bearer gsk-...".
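As a minimal sketch of the header described above, the following Python helper builds the request headers from an API key. The function name build_headers and the prefix check are illustrative assumptions, not part of the Galtea API; the check only mirrors the two key formats mentioned here.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return headers for a JSON request authorized with a Bearer token."""
    # Assumption: keys start with "gsk_" (new) or "gsk-" (legacy), as described above.
    if not api_key.startswith(("gsk_", "gsk-")):
        raise ValueError("expected an API key starting with 'gsk_' or 'gsk-'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early on a malformed key is a design choice for this sketch; the server remains the authority on which keys are valid.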

Body

application/json

inferenceResultId

string

required

The ID of the inference result (turn) to evaluate

Example:

"ir_123"

metrics

object[]

Metrics to evaluate. Optional if specificationIds is provided or the product has specifications with linked metrics.

specificationIds

string[]

Specification IDs whose linked metrics will be evaluated. Can be combined with metrics; the API merges and deduplicates.
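The body fields above can be assembled as in the following sketch, which uses only the Python standard library and builds the request without sending it. The helper name build_evaluation_request, the placeholder token, and the ID "spec_123" are hypothetical; the metric object simply reuses the values from the curl example, and which metric attributes the API actually requires is not stated here.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.galtea.ai/evaluations/fromInferenceResult"


def build_evaluation_request(
    token: str,
    inference_result_id: str,
    metrics: Optional[list] = None,
    specification_ids: Optional[list] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for one inference result."""
    body = {"inferenceResultId": inference_result_id}
    # Both fields are optional; include only the ones the caller supplied.
    if metrics:
        body["metrics"] = metrics
    if specification_ids:
        body["specificationIds"] = specification_ids
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_evaluation_request(
    "gsk_example",  # placeholder token, not a real key
    "ir_123",
    metrics=[{"id": "id_123", "name": "Example Name", "score": 0.95}],
    specification_ids=["spec_123"],  # hypothetical specification ID
)
# To send it, pass `request` to urllib.request.urlopen and decode the JSON body.
```

Because metrics and specificationIds can be combined and the API merges and deduplicates them, the sketch passes both through unchanged rather than deduplicating on the client.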

Response

Evaluations created successfully

id

string

Example:

"eval_123"

metricId

string

Example:

"metric_123"

sessionId

string

Example:

"session_123"

userId

string | null

Example:

"user_123"

status

enum<string>

Available options:

PENDING, PENDING_HUMAN, SUCCESS, FAILED, SKIPPED

Example:

"SUCCESS"

testCaseId

string | null

Example:

"tc_123"

inferenceResultId

string | null

Example:

"ir_123"

score

number | null

Example:

0.95

reason

string | null

Example:

"High quality response"

error

string | null

canRetry

boolean | null

Example:

false

creditsUsed

integer | null

Example:

1

conversationSimulatorVersion

string | null

Example:

"1.0.0"

humanEvaluatorId

string | null

User ID of the human evaluator

humanEvaluatorStartedAt

string<date-time> | null

humanScore

number | null

Human-provided annotation score

humanReason

string | null

Human-provided annotation reason

humanEvaluatorFinishedAt

string<date-time> | null

Timestamp when human evaluation was submitted

failedTurns

string[]

Conversation turns that failed

createdAt

string<date-time>

deletedAt

string<date-time> | null

evaluatedAt

string<date-time> | null

metricLegacyAt

string<date-time> | null

metricDisabledAt

string<date-time> | null
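A client consuming this response has to handle nullable fields and date-time strings. The sketch below shows one way to do that in Python, using a trimmed sample shaped like the example response; the helper names parse_timestamp and group_by_status are illustrative assumptions, not part of any Galtea SDK.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as "2023-11-07T05:31:56Z"; pass None through."""
    if value is None:
        return None
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def group_by_status(evaluations: list) -> dict:
    """Group evaluation IDs by their status value."""
    groups: dict = {}
    for evaluation in evaluations:
        groups.setdefault(evaluation["status"], []).append(evaluation["id"])
    return groups


# Trimmed sample with the same field names as the response above.
sample = [
    {
        "id": "eval_123",
        "status": "SUCCESS",
        "score": 0.95,
        "createdAt": "2023-11-07T05:31:56Z",
        "evaluatedAt": None,
    },
]
created = parse_timestamp(sample[0]["createdAt"])
groups = group_by_status(sample)
```

Grouping by status makes it straightforward to separate evaluations that already carry a score from those still marked PENDING or PENDING_HUMAN.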
