Returns
Returns a tuple containing:
- An InferenceResult object
- A list of Evaluation objects, one for each metric provided
Usage
This method combines creating an inference result with its evaluation in a single convenient call. It's the recommended approach for single-turn evaluations, replacing the deprecated galtea.evaluations.create_single_turn() method.
Basic Example
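The sketch below shows the simplest call: string metric names, with Galtea computing the scores. The client constructor, the session-creation step, and the method name used here (create_with_evaluations on the inference results service) are assumptions standing in for the method documented on this page; the parameter names follow the Parameters section below.

```python
# Minimal sketch, not a verbatim SDK example: the import, client constructor,
# and method name (create_with_evaluations) are placeholders for the method
# documented on this page.
from galtea import Galtea

galtea = Galtea(api_key="YOUR_GALTEA_API_KEY")

# Assumes a session has already been created, e.g. with galtea.sessions.create(...)
session_id = "your-session-id"

inference_result, evaluations = galtea.inference_results.create_with_evaluations(  # placeholder name
    session_id=session_id,
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    metrics=["accuracy", "relevance"],  # metric names defined in Galtea
    latency=342.5,                      # milliseconds from invocation to response
)

print(inference_result)
for evaluation in evaluations:
    print(evaluation)
```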
With Pre-computed Scores
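If you have already scored the output in your own pipeline, you can pass MetricInput dicts carrying a float score, so no evaluator needs to run on Galtea's side. This sketch keeps the same assumed method name as the basic example; the dict keys (id, name, score) follow the metrics parameter description below.

```python
# Sketch with pre-computed scores; the method name is a placeholder as in the
# basic example. Each dict must carry 'id' or 'name' alongside the float score.
inference_result, evaluations = galtea.inference_results.create_with_evaluations(
    session_id=session_id,
    output="The capital of France is Paris.",
    metrics=[
        {"name": "accuracy", "score": 1.0},      # score computed by your own pipeline
        {"id": "metric_abc123", "score": 0.87},  # metrics can also be referenced by id
    ],
)
```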
With Custom Score Calculation
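A CustomScoreEvaluationMetric computes its score locally at call time. Per the parameter notes below it must be initialized with either 'name' or 'id'; the additional scoring hook shown here (score_function) is an assumption, so check the CustomScoreEvaluationMetric documentation for the exact constructor signature.

```python
# Sketch only: the import path and the scoring hook (score_function) are
# assumptions; initialization with 'name' or 'id' follows the metrics
# parameter notes below.
from galtea import CustomScoreEvaluationMetric

exact_match = CustomScoreEvaluationMetric(
    name="exact_match",
    # Hypothetical scoring hook: 1.0 for an exact match, 0.0 otherwise.
    score_function=lambda input, output, expected_output: float(output == expected_output),
)

inference_result, evaluations = galtea.inference_results.create_with_evaluations(  # placeholder name
    session_id=session_id,
    output="The capital of France is Paris.",
    metrics=[exact_match, "relevance"],  # custom and named metrics can be mixed
)
```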
Parameters
The session ID to log the inference result to.
The generated output/response from the AI model.
A list of metrics to evaluate against. Supports multiple formats:
- Strings: Metric names (e.g., ["accuracy", "relevance"])
- CustomScoreEvaluationMetric: Objects with dynamic score calculation. Must be initialized with either ‘name’ or ‘id’ parameter.
- MetricInput dicts: Format with optional id, name, and score.
  - If score is a float: Pre-calculated score (requires ‘id’ or ‘name’ in the dict).
  - If score is a CustomScoreEvaluationMetric: Dynamic score calculation.
  - If score is omitted: the metric is evaluated by Galtea.
The input text/prompt. If not provided, it will be inferred from the test case linked to the session.
Context retrieved by a RAG system, if applicable.
Latency in milliseconds from model invocation to response.
Token usage information from the model call.
Supported keys: input_tokens, output_tokens, cache_read_input_tokens.
Cost breakdown for the model call. Supported keys: cost_per_input_token, cost_per_output_token, cost_per_cache_read_input_token.
Version of Galtea’s conversation simulator used to generate the input.
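The two telemetry dictionaries accept the keys listed above. The sketch below illustrates their shape; the keyword names (latency, usage_info, cost_info) and the method name are assumptions carried over from the earlier examples.

```python
# Sketch of the optional telemetry parameters; the dict keys follow the lists
# above, while the keyword names and the method name remain placeholders.
inference_result, evaluations = galtea.inference_results.create_with_evaluations(
    session_id=session_id,
    output="The capital of France is Paris.",
    metrics=["accuracy"],
    latency=512.0,  # milliseconds from invocation to response
    usage_info={
        "input_tokens": 812,
        "output_tokens": 143,
        "cache_read_input_tokens": 256,
    },
    cost_info={
        "cost_per_input_token": 3e-06,
        "cost_per_output_token": 1.5e-05,
        "cost_per_cache_read_input_token": 8e-07,
    },
)
```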
See Also
- Create Evaluation - Evaluate an existing session or inference result
- Create Inference Result - Create an inference result without immediate evaluation