You can use Galtea to log and evaluate real user interactions from your production environment. This helps you monitor your product’s performance over time.
For simple, single-turn interactions, create a production session and use galtea.inference_results.create_and_evaluate() to log and evaluate the interaction in a single call.
    # In your application's request handler...
    # Assumes `galtea` is an initialized Galtea SDK client, VERSION_ID is the ID of
    # your product version, and your_product_function is your own model call.
    def handle_user_query(user_query: str, retrieval_context: str | None = None) -> str:
        # Your logic to get a response from your model
        model_response = your_product_function(user_query, retrieval_context)

        # Log and evaluate the interaction in Galtea
        session = galtea.sessions.create(version_id=VERSION_ID, is_production=True)
        galtea.inference_results.create_and_evaluate(
            session_id=session.id,
            input=user_query,
            output=model_response,
            retrieval_context=retrieval_context,
            metrics=[
                {"name": "Role Adherence"},
                {"name": "Answer Relevancy"},
                {"name": "Faithfulness"},
            ],
        )
        return model_response


    # Test the handler
    handle_user_query(
        "What are your business hours?", "Business hours: 9am-5pm Monday-Friday"
    )
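In a production request handler, a failure while logging or evaluating should usually not break the user-facing response. The sketch below shows one way to guard against that. It assumes a hypothetical helper named log_safely (not part of the Galtea SDK) that runs any logging callback and records exceptions instead of raising them.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def log_safely(log_fn: Callable[[], object]) -> bool:
    """Run a logging callback without letting its errors reach the caller.

    Hypothetical helper, not part of the Galtea SDK.
    Returns True if the callback completed, False if it raised.
    """
    try:
        log_fn()
        return True
    except Exception:
        # Record the failure with its traceback, then carry on serving the user.
        logger.exception("Logging to the monitoring platform failed")
        return False
```

In the handler above, the logging call could then be wrapped as log_safely(lambda: galtea.inference_results.create_and_evaluate(...)) with its arguments unchanged, so that model_response is returned even when logging fails.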
For multi-turn conversations, use the session-based workflow to log the entire interaction first and then evaluate it.
1. Create a Session
First, create a session at the start of the conversation. For production monitoring, make sure to set is_production=True.
    # Use is_production=True for real user interactions
    session = galtea.sessions.create(
        # Optional: a custom ID that associates this session in the Galtea Platform
        # with the corresponding session in your own application.
        custom_id="CLIENT_PROVIDED_SESSION_ID",
        version_id=VERSION_ID,
        is_production=True,
    )
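If your application already tracks conversations by its own ID, you may want to create the Galtea session once per conversation and reuse it for later turns. The following is a minimal sketch, assuming a hypothetical SessionRegistry class (not part of the Galtea SDK). It takes the session-creating function as a parameter, so it makes no assumptions about the client beyond the call shown above.

```python
from typing import Any, Callable, Dict


class SessionRegistry:
    """In-memory map from an application conversation ID to a created session.

    Hypothetical helper, not part of the Galtea SDK. It only caches within a
    single process; a multi-worker deployment would need shared storage.
    """

    def __init__(self, create_session: Callable[[str], Any]) -> None:
        self._create_session = create_session
        self._sessions: Dict[str, Any] = {}

    def get(self, conversation_id: str) -> Any:
        # Create the session on first use, then return the cached one.
        if conversation_id not in self._sessions:
            self._sessions[conversation_id] = self._create_session(conversation_id)
        return self._sessions[conversation_id]
```

With the client from this guide, the factory could be lambda cid: galtea.sessions.create(custom_id=cid, version_id=VERSION_ID, is_production=True).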
2. Log Conversation Turns
Next, log the user-assistant interactions. You can do this individually as each turn happens or in a single batch after the conversation ends.
Log Turns Individually
This approach is useful for logging interactions in real time in a live application.

    def get_model_response(user_input: str) -> str:
        # Replace this with your actual model call
        model_output = f"This is a simulated response to '{user_input}'"
        return model_output


    # This would happen dynamically in your application.
    user_questions = [
        "What are some lower-risk investment strategies?",
        "With age, should the investment strategy change?",
        "Great, thanks!",
    ]

    for question in user_questions:
        model_response = get_model_response(question)
        # Log the turn to Galtea right after it happens
        inference_result = galtea.inference_results.create(
            session_id=session.id, input=question, output=model_response
        )

Log Turns in a Batch
If you have the entire conversation history, you can log all turns at once for efficiency.

    # The conversation must be in the standard format: a list of role/content dictionaries
    conversation_turns = [
        {"role": "user", "content": "What are some lower-risk investment strategies?"},
        {
            "role": "assistant",
            "content": "For lower-risk investments, consider diversified index funds, bonds, or Treasury securities.",
        },
        {"role": "user", "content": "With age, should the investment strategy change?"},
        {
            "role": "assistant",
            "content": "Yes, many advisors recommend shifting to more conservative investments as you approach retirement.",
        },
        {"role": "user", "content": "Great, thanks!"},
        {"role": "assistant", "content": "You're welcome!"},
    ]

    # Log every turn against the session created in step 1
    galtea.inference_results.create_batch(
        session_id=session.id, conversation_turns=conversation_turns
    )
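Applications often store a conversation as (user message, assistant reply) pairs rather than as role/content dictionaries. The sketch below converts such pairs into the list format shown above. The function name to_conversation_turns and the pair-based input are assumptions for illustration, not part of the Galtea SDK.

```python
from typing import Dict, Iterable, List, Tuple


def to_conversation_turns(
    exchanges: Iterable[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Flatten (user, assistant) pairs into role/content dictionaries, in order."""
    turns: List[Dict[str, str]] = []
    for user_message, assistant_message in exchanges:
        turns.append({"role": "user", "content": user_message})
        turns.append({"role": "assistant", "content": assistant_message})
    return turns


# Example input in the hypothetical pair format
example_turns = to_conversation_turns(
    [
        ("Great, thanks!", "You're welcome!"),
    ]
)
```

The resulting list can be passed as the conversation_turns argument of the batch call.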
3. Evaluate the Session
Finally, once the conversation is complete and all turns are logged, you can run an evaluation on the entire session.
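    # METRICS_TO_EVALUATE is the list of metrics you want to run on the session;
    # define it before this call.
    galtea.evaluations.create(session_id=session.id, metrics=METRICS_TO_EVALUATE)
    print(f"Logged and evaluated production session {session.id}")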
For more details on evaluating multi-turn conversations, see the Evaluating Conversations guide.