Create Evaluation Task from Production
Create an evaluation task from real user interactions in production.
The Galtea SDK also supports creating evaluation tasks directly from your production environment using the create_from_production method. This is useful for ongoing monitoring and analysis of real user interactions, and it can also be applied retroactively to past interactions.
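Below is a minimal sketch of how such a call might look. The client entry point (galtea.evaluation_tasks.create_from_production) and the parameter names (version_id, metrics, input, actual_output) are assumptions inferred from the parameter descriptions below; check the SDK reference for the exact signature.

```python
# Minimal sketch: creating evaluation tasks from one production interaction.
# Entry point and parameter names are inferred from the descriptions in this
# section and may differ from the actual SDK signature.
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

tasks = galtea.evaluation_tasks.create_from_production(
    version_id="your-version-id",               # version of the product to evaluate
    metrics=["Factual Accuracy", "Relevance"],  # one task is created per metric
    input="What is your refund policy?",        # real user query from production
    actual_output="You can request a refund within 30 days of purchase.",
)

# One EvaluationTask is returned per metric provided.
for task in tasks:
    print(task.id)
```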
Returns
Returns a list of EvaluationTask objects for the given version and user input.
Example
See an example of creating evaluation tasks from your product's production traffic in our Monitor Production Responses to User Queries example.
Parameters
The ID of the version of the product you want to evaluate.
The metrics to use for the evaluation.
The system will create a task for each metric provided.
The real user query that your product handled in production.
The actual output produced by the product.
Additional data or broader conversational context that was used and was relevant when the actual_output was generated in production.
The context retrieved by your RAG system that was used to generate the actual output.
A list of previous conversation turns, each a dictionary with "input" and "actual_output" keys. This is used for evaluating conversational AI.
Example: [{"input": "Hello", "actual_output": "Hi there!"}, {"input": "How are you?", "actual_output": "I'm doing well, thanks!"}]
Time elapsed (in ms) from the moment the request was sent to the LLM to the moment the response was received.
Token usage information for the LLM call. Keys must be snake_case. Possible keys: input_tokens, output_tokens, cache_read_input_tokens.
Cost information for the LLM call. Keys must be snake_case. Possible keys: cost_per_input_token, cost_per_output_token, cost_per_cache_read_input_token.
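As a fuller sketch, the optional conversational and telemetry parameters described above might be passed as shown here. The parameter names (conversation_turns, retrieval_context, latency, usage_info, cost_info) are assumptions inferred from the descriptions in this section, and the example reuses the client from the sketch above.

```python
# Sketch of a call including the optional conversational and telemetry
# parameters. Parameter names are inferred from the descriptions above and
# may differ from the actual SDK signature.
tasks = galtea.evaluation_tasks.create_from_production(
    version_id="your-version-id",
    metrics=["Factual Accuracy"],
    input="What is your refund policy?",
    actual_output="You can request a refund within 30 days of purchase.",
    # Previous turns of the conversation, oldest first.
    conversation_turns=[
        {"input": "Hello", "actual_output": "Hi there! How can I help?"},
    ],
    # Context retrieved by your RAG system to generate the actual output.
    retrieval_context="Refund policy: purchases can be refunded within 30 days.",
    # Milliseconds from sending the LLM request to receiving the response.
    latency=420,
    # Token usage for the LLM call; keys must be snake_case.
    usage_info={
        "input_tokens": 152,
        "output_tokens": 38,
        "cache_read_input_tokens": 0,
    },
    # Per-token costs for the LLM call; keys must be snake_case.
    cost_info={
        "cost_per_input_token": 0.000002,
        "cost_per_output_token": 0.000006,
        "cost_per_cache_read_input_token": 0.000001,
    },
)
```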