You can use Galtea to log and evaluate real user interactions from your production environment. This helps you monitor your product’s performance over time.

Single-Turn Production Monitoring

For simple, single-turn interactions, you can send the details to Galtea using create_single_turn with is_production=True.

from galtea import Galtea
import os

galtea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))
VERSION_ID = "YOUR_ACTIVE_VERSION_ID"

# In your application's request handler...
def handle_user_query(user_query, retrieval_context=None):
    # Your logic to get a response from your model
    model_response = your_product_function(user_query, retrieval_context)
    
    # Log and evaluate the interaction in Galtea
    galtea.evaluation_tasks.create_single_turn(
        version_id=VERSION_ID,
        is_production=True,
        metrics=["coherence", "answer-relevancy", "faithfulness"],
        input=user_query,
        actual_output=model_response,
        retrieval_context=retrieval_context
    )
    
    return model_response
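
Because the example above calls Galtea synchronously inside the request handler, the logging call adds latency to every user response. Below is a minimal, hedged sketch (not taken from the Galtea documentation) of one way to move that call off the request path using Python's standard-library concurrent.futures; it reuses the galtea client and VERSION_ID defined above, and the function name handle_user_query_nonblocking is purely illustrative.

from concurrent.futures import ThreadPoolExecutor

# Small background pool so monitoring never blocks the user-facing response.
executor = ThreadPoolExecutor(max_workers=2)

def handle_user_query_nonblocking(user_query, retrieval_context=None):
    # Your logic to get a response from your model
    model_response = your_product_function(user_query, retrieval_context)

    # Fire-and-forget: the same create_single_turn call as above, submitted to a worker thread.
    executor.submit(
        galtea.evaluation_tasks.create_single_turn,
        version_id=VERSION_ID,
        is_production=True,
        metrics=["coherence", "answer-relevancy", "faithfulness"],
        input=user_query,
        actual_output=model_response,
        retrieval_context=retrieval_context,
    )

    return model_response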

Multi-Turn Production Monitoring (Conversations)

For multi-turn conversations, use the session-based workflow to log the entire interaction first and then evaluate it.

1. Create a Session

First, create a session at the start of the conversation. For production monitoring, make sure to set is_production=True.

from galtea import Galtea
import os

galtea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))
VERSION_ID = "YOUR_ACTIVE_VERSION_ID"
METRICS_TO_EVALUATE = ["conversation-relevancy", "knowledge-retention"]

# Use is_production=True for real user interactions
session = galtea.sessions.create(
    custom_id="CLIENT_PROVIDED_SESSION_ID",  # Optional: links this session on the Galtea Platform to the corresponding session in your application
    version_id=VERSION_ID,
    is_production=True
)

2. Log Conversation Turns

Next, log the user-assistant interactions. You can do this individually as each turn happens or in a single batch after the conversation ends.

The example below logs each turn in real time, which is useful in a live application; a batch-style sketch follows it.

def get_model_response(user_input):
    # Replace this with your actual model call
    model_output = f"This is a simulated response to '{user_input}'"
    return model_output

# This would happen dynamically in your application.
user_questions = [
    "What's the weather like in Paris?",
    "Is it expected to rain later?",
    "Great, thanks!"
]

for question in user_questions:
    model_response = get_model_response(question)
    # Log the turn to Galtea right after it happens
    inference_result = galtea.inference_results.create(
        session_id=session.id,
        input=question,
        output=model_response
    )
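
If you prefer the second option and log the whole conversation in one batch after it ends, a minimal sketch is to collect the turns in memory and replay them through the same inference_results.create call shown above; conversation_history here is a hypothetical structure your application would fill during the chat.

# Hypothetical in-memory record of the finished conversation.
conversation_history = [
    {"input": "What's the weather like in Paris?", "output": "It is mild and partly cloudy."},
    {"input": "Is it expected to rain later?", "output": "Light rain is possible this evening."},
    {"input": "Great, thanks!", "output": "You're welcome!"},
]

# Log every turn in order once the conversation is complete.
for turn in conversation_history:
    galtea.inference_results.create(
        session_id=session.id,
        input=turn["input"],
        output=turn["output"]
    )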

3. Evaluate the Session

Finally, once the conversation is complete and all turns are logged, you can run an evaluation on the entire session.

galtea.evaluation_tasks.create(
    session_id=session.id,
    metrics=METRICS_TO_EVALUATE
)

print(f"Logged and evaluated production session {session.id}")

For more details on evaluating multi-turn conversations, see the Evaluating Conversations guide.