Learn how to log and evaluate user queries from your production environment.
You can use Galtea to log and evaluate real user interactions from your production environment. This helps you monitor your product’s performance over time.
For simple, single-turn interactions, you can send the details to Galtea using create_single_turn with is_production=True.
Copy
from galtea import Galteaimport osgaltea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))VERSION_ID = "YOUR_ACTIVE_VERSION_ID"# In your application's request handler...def handle_user_query(user_query, retrieval_context=None): # Your logic to get a response from your model model_response = your_product_function(user_query, retrieval_context) # Log and evaluate the interaction in Galtea galtea.evaluation_tasks.create_single_turn( version_id=VERSION_ID, is_production=True, metrics=["coherence", "answer-relevancy", "faithfulness"], input=user_query, actual_output=model_response, retrieval_context=retrieval_context ) return model_response
For multi-turn conversations, use the session-based workflow to log the entire interaction first and then evaluate it.
1
1. Create a Session
First, create a session at the start of the conversation. For production monitoring, make sure to set is_production=True.
Copy
from galtea import Galteaimport osgaltea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))VERSION_ID = "YOUR_ACTIVE_VERSION_ID"METRICS_TO_EVALUATE = ["conversation-relevancy", "knowledge-retention"]# Use is_production=True for real user interactionssession = galtea.sessions.create( custom_id="CLIENT_PROVIDED_SESSION_ID", # Optional: a custom ID to associate this session in Galtea Platform to the one in your real application. version_id=VERSION_ID, is_production=True)
2
2. Log Conversation Turns
Next, log the user-assistant interactions. You can do this individually as each turn happens or in a single batch after the conversation ends.
This approach is useful for logging interactions in real-time in a live application.
Copy
def get_model_response(user_input): # Replace this with your actual model call model_output = f"This is a simulated response to '{user_input}'" return model_output# This would happen dynamically in your application.user_questions = [ "What's the weather like in Paris?", "Is it expected to rain later?", "Great, thanks!"]for question in user_questions: model_response = get_model_response(question) # Log the turn to Galtea right after it happens inference_result = galtea.inference_results.create( session_id=session.id, input=question, output=model_response )
This approach is useful for logging interactions in real-time in a live application.
Copy
def get_model_response(user_input): # Replace this with your actual model call model_output = f"This is a simulated response to '{user_input}'" return model_output# This would happen dynamically in your application.user_questions = [ "What's the weather like in Paris?", "Is it expected to rain later?", "Great, thanks!"]for question in user_questions: model_response = get_model_response(question) # Log the turn to Galtea right after it happens inference_result = galtea.inference_results.create( session_id=session.id, input=question, output=model_response )
If you have the entire conversation history, you can log all turns at once for efficiency.
Copy
# The conversation must be in the standard format: a list of role/content dictionariesconversation_turns = [ {'role': 'user', 'content': "What's the weather like in Paris?"}, {'role': 'assistant', 'content': "It's sunny and 22°C in Paris."}, {'role': 'user', 'content': "Is it expected to rain later?"}, {'role': 'assistant', 'content': "No, the forecast shows clear skies for the rest of the day."}, {'role': 'user', 'content': "Great, thanks!"}, {'role': 'assistant', 'content': "You're welcome!"}]galtea.inference_results.create_batch( session_id=session.id, conversation_turns=conversation_turns)
3
3. Evaluate the Session
Finally, once the conversation is complete and all turns are logged, you can run an evaluation on the entire session.
Copy
galtea.evaluation_tasks.create( session_id=session.id, metrics=METRICS_TO_EVALUATE)print(f"Logged and evaluated production session {session.id}")
For more details on evaluating multi-turn conversations, see the Evaluating Conversations guide.