Returns
Returns a list of `EvaluationTask` objects, one for each metric provided.

Usage
This method is versatile and can be used for two main scenarios:

- Test-Based Evaluation: When you provide a `test_case_id`, Galtea evaluates your product’s performance against a predefined challenge.
- Production Monitoring: When you set `is_production=True` and provide an `input`, Galtea logs and evaluates real user interactions.
Example: Test-Based Evaluation with Standard and Custom Metrics
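A minimal sketch of a test-based evaluation. It assumes the client is constructed as `Galtea(api_key=...)`, that the method is exposed as `galtea.evaluation_tasks.create`, and that `CustomScoreEvaluationMetric` subclasses override a `measure` hook; the import path, constructor arguments, and hook name are assumptions, so check them against your SDK version.

```python
from galtea import Galtea, CustomScoreEvaluationMetric  # import path may differ by SDK version


# Hypothetical custom metric scored locally: 1.0 if the output is non-empty.
# The `measure` override shown here is an assumption; check the base class
# in your SDK version for the exact method to implement.
class NonEmptyOutputMetric(CustomScoreEvaluationMetric):
    def measure(self, input: str, actual_output: str, **kwargs) -> float:
        return 1.0 if actual_output and actual_output.strip() else 0.0


galtea = Galtea(api_key="YOUR_API_KEY")

tasks = galtea.evaluation_tasks.create(
    version_id="your-version-id",
    test_case_id="your-test-case-id",
    actual_output="The answer your product generated for this test case.",
    metrics=[
        "Role Adherence",        # standard metric, evaluated by Galtea
        NonEmptyOutputMetric(),  # custom metric, scored locally
    ],
)

# One EvaluationTask is returned per metric provided.
for task in tasks:
    print(task)
```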
Example: Production Monitoring
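A minimal sketch of logging a production interaction, again assuming a `galtea.evaluation_tasks.create` entry point. The optional parameter names (`retrieval_context`, `latency`, `usage_info`) mirror the Parameters list below and should be verified against your SDK version.

```python
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

# In production there is no predefined test case: pass the real user
# input instead and flag the task with is_production=True.
tasks = galtea.evaluation_tasks.create(
    version_id="your-version-id",
    is_production=True,
    input="What is your refund policy?",
    actual_output="You can request a refund within 30 days of purchase.",
    retrieval_context="Policy: refunds are accepted within 30 days of purchase.",
    latency=842,  # milliseconds from LLM request to response
    usage_info={"input_tokens": 10, "output_tokens": 5},
    metrics=["Role Adherence"],
)
```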
Parameters
- `version_id`: The ID of the version you want to evaluate.
- `metrics`: A list of metrics to use for evaluation. You can provide:
  - Standard metrics as strings (e.g., “Role Adherence”).
  - Custom, locally-scored metrics as objects inheriting from `CustomScoreEvaluationMetric`.
- `actual_output`: The actual output produced by the product.
- `test_case_id`: The ID of the test case to be evaluated. Required for non-production evaluations.
- `input`: The input text/prompt. Required for production evaluations where no `test_case_id` is provided.
- `is_production`: Set to `True` to indicate the task is from a production environment. Defaults to `False`.
- `retrieval_context`: The context retrieved by your RAG system that was used to generate the `actual_output`.
- `latency`: Time in milliseconds from the request to the LLM until the response was received.
- `usage_info`: Token usage information (e.g., `{"input_tokens": 10, "output_tokens": 5}`).
- `cost_info`: Cost information for the LLM call.