Create an evaluation task for a single-turn interaction, either from a test case or production data.
If you provide a `test_case_id`, Galtea evaluates your product’s performance against a predefined challenge.
If you set `is_production=True` and provide an `input`, Galtea logs and evaluates real user interactions.
[...] `CustomScoreEvaluationMetric`.
[...] `test_case_id` is provided.
Set `is_production` to `True` to indicate the task is from a production environment. Defaults to `False`.
[...] `actual_output`.
([...] `{"input_tokens": 10, "output_tokens": 5}`).
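The two modes described above (test-based, keyed by `test_case_id`, and production, keyed by `is_production=True` plus an `input`) can be sketched as a small argument-building step. This is a minimal illustration only: `build_single_turn_kwargs` and its `user_input` parameter are hypothetical names, not part of the Galtea SDK, and the assumption that `actual_output` is passed in both modes is ours. The sketch does not call Galtea at all; it only shows which fields belong to which mode.

```python
from typing import Any, Dict, Optional


def build_single_turn_kwargs(
    actual_output: str,
    test_case_id: Optional[str] = None,
    user_input: Optional[str] = None,
    is_production: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble arguments for one of the two modes."""
    if test_case_id is not None:
        # Test-based mode: the task is tied to a predefined test case.
        return {"test_case_id": test_case_id, "actual_output": actual_output}
    if is_production and user_input is not None:
        # Production mode: a real user interaction is logged and evaluated.
        return {
            "is_production": True,
            "input": user_input,
            "actual_output": actual_output,
        }
    # Neither mode is fully specified, so fail early (an assumption of this sketch).
    raise ValueError(
        "Provide test_case_id, or set is_production=True together with an input."
    )


# Example usage for both modes.
test_kwargs = build_single_turn_kwargs("Paris", test_case_id="tc_1")
prod_kwargs = build_single_turn_kwargs(
    "Paris", user_input="Capital of France?", is_production=True
)
```

Keeping the mode check in one place makes it harder to accidentally submit a task that is neither tied to a test case nor marked as production data.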