Galtea’s Conversation Simulator allows you to test your conversational AI products by simulating realistic user interactions. This guide walks you through integrating your agent and running simulations.
Agent Integration Options
Simple

The quickest way to get started. Your function receives just the latest user message as a string.

```python
def my_agent(user_message: str) -> str:
    # In a real scenario, call your model here
    return f"Your model output to: {user_message}"
```

Chat History

Use this when your agent needs the full conversation context. Your function receives the message list in the OpenAI format ({"role": "...", "content": "..."}).

```python
def my_agent(messages: list[dict]) -> str:
    # messages follows the standard chat format:
    # [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, ...]
    user_message = messages[-1]["content"]
    return f"Your model output to: {user_message}"
```

Structured

For full control over input and output, including optional usage tracking, cost tracking, and retrieval context for RAG evaluations.

```python
def my_agent(input_data: AgentInput) -> AgentResponse:
    user_message = input_data.last_user_message_str()
    # In a real scenario, call your model here
    model_output = f"Your model output to: {user_message}"
    # Return AgentResponse with optional usage/cost tracking
    return AgentResponse(
        content=model_output,
        usage_info={"input_tokens": 100, "output_tokens": 50},
    )
```
All three signatures work with generate() and simulate(). Both sync and async functions are supported. The SDK auto-detects which signature you’re using from the type hint on the first parameter.
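As an illustration of how signature detection from type hints can work, the sketch below classifies an agent function by inspecting its first parameter's annotation. This is a conceptual sketch only, not the SDK's actual detection logic:

```python
import inspect


def detect_signature(agent) -> str:
    """Classify an agent function by its first parameter's type hint.

    Illustrative sketch only; the real SDK's detection may differ.
    """
    params = list(inspect.signature(agent).parameters.values())
    annotation = params[0].annotation if params else inspect.Parameter.empty
    if annotation is str:
        return "simple"        # def agent(user_message: str) -> str
    if annotation in (list, list[dict]):
        return "chat_history"  # def agent(messages: list[dict]) -> str
    return "structured"        # e.g. AgentInput -> AgentResponse


def simple_agent(user_message: str) -> str:
    return user_message


def history_agent(messages: list[dict]) -> str:
    return messages[-1]["content"]
```

Because detection happens at call time, you can switch between signatures without changing how you invoke the simulator.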
Conversation Simulation Workflow
1. Implement Your Agent
Define an agent function with one of the supported signatures above.
2. Prepare Scenario Data
Create a CSV file with scenario data. Each row is a test case describing the user goal, persona, and input (the first user message).
3. Create a Test and Sessions
Upload your scenario CSV to create a test. The platform generates a session for each scenario.
4. Run the Simulator with Your Agent
Use SimulatorService.simulate() to run the conversation between your agent and the synthetic user for each session.
5. Evaluate the Results
After simulation, analyze results and optionally trigger evaluations via evaluations.create().
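To make step 2 concrete, here is one way such a scenario CSV could be produced. The column names (goal, persona, input) and row contents are illustrative; consult the Behavior Tests documentation for the exact format Galtea expects:

```python
import csv

# Hypothetical scenario rows; the column schema here is an assumption
scenarios = [
    {
        "goal": "Cancel an existing subscription",
        "persona": "Frustrated customer who has already tried the FAQ",
        "input": "I want to cancel my plan right now.",
    },
    {
        "goal": "Get a refund for a duplicate charge",
        "persona": "Polite but persistent user",
        "input": "Hi, I think I was charged twice last month.",
    },
]

with open("behavior_test.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["goal", "persona", "input"])
    writer.writeheader()
    writer.writerows(scenarios)
```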
Step-by-Step Guide
1. Create a Test and Sessions
First, create behavior test cases with user personas and goals. You can generate these from your product description or upload a CSV:
```python
# Create a test suite using the behavior test options
# This can be done via the Dashboard or programmatically as shown here
test = galtea_client.tests.create(
    product_id=product.id,
    name=test_name,
    type="BEHAVIOR",
    # Here we provide a path to a CSV file with behavior tests; omit it
    # and Galtea will generate the tests for you
    test_file_path="path/to/behavior_test.csv",
)

# Get your test cases
# If Galtea is generating the test for you, it may take a few moments to be ready
test_cases = galtea_client.test_cases.list(test_id=test.id)
```
Once generation completes, you’ll see the resulting test cases in your dashboard. For the CSV upload format, see Behavior Tests.
2. Run the Conversation Simulator
For each test case/session, use the simulator to run the full simulation with your agent function:
```python
# Define your agent function (see Agent Integration Options for all signatures)
def my_agent(user_message: str) -> str:
    return f"Response to: {user_message}"

# Run simulations with your agent function
for test_case in test_cases:
    session = galtea_client.sessions.create(
        version_id=version.id, test_case_id=test_case.id
    )
    result = galtea_client.simulator.simulate(
        session_id=session.id, agent=my_agent, max_turns=10
    )

    # Analyze results
    print(f"Scenario: {test_case.scenario}")
    print(f"Completed {result.total_turns} turns")
    print(f"Success: {result.finished}")
    if result.stopping_reason:
        print(f"Ended because: {result.stopping_reason}")
```
You can optionally use the @trace decorator to capture internal operations during simulation. Traces are automatically collected and saved per turn.
See the Tracing Agent Operations guide for more details on using the @trace decorator.
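Conceptually, a tracing decorator wraps a function and records each call it makes. The generic sketch below shows only the idea; it is not Galtea's @trace implementation, so refer to the Tracing Agent Operations guide for the real API:

```python
import functools
import time

# Illustrative only: a real SDK would collect and save traces per turn
TRACES: list[dict] = []


def trace(fn):
    """Record the name and duration of each call (conceptual sketch)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "operation": fn.__name__,
            "duration_s": time.perf_counter() - start,
        })
        return result
    return wrapper


@trace
def retrieve_documents(query: str) -> list[str]:
    # Stand-in for a real retrieval step inside your agent
    return [f"doc about {query}"]
```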
3. Evaluate the Session
```python
# After each simulation, you can create an evaluation
evaluations = galtea_client.evaluations.create(
    session_id=session.id,
    metrics=[{"name": "Role Adherence"}],  # Replace with your metrics
)

for evaluation in evaluations:
    print(f"Evaluation created: {evaluation.id}")
```
Advanced Usage: RAG Agents with Retrieval Context
For Retrieval-Augmented Generation (RAG) agents, you can return the context that was retrieved and used to generate the response. This context will be logged with the inference result, enabling powerful evaluations with metrics like Faithfulness and Contextual Relevancy.
```python
def my_rag_agent(input_data: galtea.AgentInput) -> galtea.AgentResponse:
    user_message = input_data.last_user_message_str()

    # Your RAG logic to retrieve context and generate a response
    retrieved_docs = vector_store.search(user_message)
    response_content = llm.generate(prompt=user_message, context=retrieved_docs)

    return galtea.AgentResponse(
        content=response_content,
        retrieval_context=retrieved_docs,
        metadata={"docs_retrieved": len(retrieved_docs)},
    )
```
The retrieval_context field is optional and can contain:
- Retrieved document snippets or full documents
- Formatted context strings
- JSON-serializable data structures
By providing retrieval context, you enable Galtea to evaluate the faithfulness of your model’s responses relative to the retrieved information, which is crucial for assessing RAG system quality.
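As a rough intuition for what a faithfulness metric measures, the toy heuristic below scores the fraction of response words that appear in the retrieved context. This is a naive sketch for illustration only; it is not the metric Galtea uses, which relies on more sophisticated evaluation:

```python
def naive_faithfulness(response: str, retrieved_docs: list[str]) -> float:
    """Fraction of response words found in the retrieved context.

    Toy heuristic for intuition only; production metrics are far more robust.
    """
    context_words = set(" ".join(retrieved_docs).lower().split())
    response_words = [w for w in response.lower().split() if w.isalpha()]
    if not response_words:
        return 0.0
    supported = sum(1 for w in response_words if w in context_words)
    return supported / len(response_words)
```

A response fully grounded in its context scores 1.0, while one that introduces unsupported words scores lower; real faithfulness metrics capture the same idea at the level of claims rather than words.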
With agent functions and the simulator, you can evaluate conversational AI in realistic, repeatable conditions and track improvements over time.