Create Single-Turn Evaluation Task
Create an evaluation task for a single-turn interaction, either from a test case or production data.
Returns
A list of `EvaluationTask` objects, one for each metric provided.
Usage
This method is versatile and can be used for two main scenarios:
- Test-Based Evaluation: When you provide a `test_case_id`, Galtea evaluates your product’s performance against a predefined challenge.
- Production Monitoring: When you set `is_production=True` and provide an `input`, Galtea logs and evaluates real user interactions.
Example: Test-Based Evaluation
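A minimal sketch of a test-based call. The client object and method name are assumptions and may differ in the actual SDK; the parameter names (`metrics`, `test_case_id`, `actual_output`) come from the Parameters section of this page, and all values are illustrative.

```python
# Hedged sketch: client and method names are assumptions, not the
# SDK's confirmed API. Values are illustrative.
task_args = {
    "metrics": ["factual_accuracy", "coherence"],  # one task per metric
    "test_case_id": "tc_123",                      # illustrative test case ID
    "actual_output": "Paris is the capital of France.",
}

# With a configured Galtea client (interface assumed):
# tasks = client.create_single_turn_evaluation_task(**task_args)
# One EvaluationTask is returned per metric, so
# len(tasks) == len(task_args["metrics"]).
print(sorted(task_args))  # → ['actual_output', 'metrics', 'test_case_id']
```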
Example: Production Monitoring
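A sketch of a production-monitoring call. As above, the client and method names are assumptions; `is_production` and `input` are attested by this page, while the keys `latency` and `usage_info` are illustrative names for the timing and token-usage fields described under Parameters.

```python
# Hedged sketch: client/method names are assumptions. In production
# mode, `input` replaces `test_case_id`.
task_args = {
    "metrics": ["coherence"],
    "is_production": True,                       # task comes from production
    "input": "What is the capital of France?",   # required: no test_case_id
    "actual_output": "Paris.",
    "latency": 230,                              # ms; key name assumed
    "usage_info": {"input_tokens": 10, "output_tokens": 5},  # key name assumed
}

# tasks = client.create_single_turn_evaluation_task(**task_args)
assert "test_case_id" not in task_args and task_args["is_production"]
```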
Parameters
- The ID of the version you want to evaluate.
- A list of metric type names to use for the evaluation. A separate task will be created for each metric.
- The actual output produced by the product.
- The ID of the test case to be evaluated. Required for non-production evaluations.
- The input text/prompt. Required for production evaluations where no `test_case_id` is provided.
- Set to `True` to indicate the task is from a production environment. Defaults to `False`.
- A list of pre-computed scores corresponding to the `metrics` list. Use `None` for metrics that Galtea should evaluate.
- The context retrieved by your RAG system that was used to generate the `actual_output`.
- Time in milliseconds from the request to the LLM until the response was received.
- Token usage information (e.g., `{"input_tokens": 10, "output_tokens": 5}`).
- Cost information for the LLM call.
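The pre-computed scores described above align positionally with the metrics list. A small sketch of that pairing (this page describes the scores field without naming it, so the variable names here are purely illustrative):

```python
# Scores align positionally with metrics; None entries are left for
# Galtea to evaluate. All names below are illustrative.
metrics = ["toxicity", "coherence", "relevance"]
scores = [0.0, None, None]  # toxicity pre-computed locally

paired = dict(zip(metrics, scores))
to_evaluate = [m for m, s in paired.items() if s is None]
print(to_evaluate)  # → ['coherence', 'relevance']
```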