Galtea’s Conversation Simulator allows you to test your conversational AI products by simulating realistic user interactions. This guide walks you through integrating your agent and running simulations.

Agent Integration Options

The quickest way to get started is a function that receives only the latest user message as a string and returns the reply as a string.
def my_agent(user_message: str) -> str:
    # In a real scenario, call your model here
    return f"Your model output to: {user_message}"
All three signatures work with generate() and simulate(). Both sync and async functions are supported. The SDK auto-detects which signature you’re using from the type hint on the first parameter.
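
To make the sync/async point concrete, here is a minimal sketch of an async variant of the agent, plus a small helper that can call either kind. The helper call_agent is hypothetical and only for local experimentation; it is not part of the Galtea SDK and does not describe how the SDK dispatches internally.

```python
import asyncio
import inspect


def my_agent(user_message: str) -> str:
    # Synchronous agent: same shape as the example above
    return f"Your model output to: {user_message}"


async def my_async_agent(user_message: str) -> str:
    # Asynchronous agent: await your model client here
    await asyncio.sleep(0)  # placeholder for a real async model call
    return f"Your model output to: {user_message}"


def call_agent(agent, user_message: str) -> str:
    # Hypothetical helper (not part of the Galtea SDK):
    # runs coroutine functions on an event loop, calls plain functions directly
    if inspect.iscoroutinefunction(agent):
        return asyncio.run(agent(user_message))
    return agent(user_message)
```

When using the simulator you would pass my_agent or my_async_agent directly, since the text above states that both kinds are supported.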

Conversation Simulation Workflow

1. Implement Your Agent

Define an agent function with one of the supported signatures above.

2. Prepare Scenario Data

Create a CSV file with scenario data. Each row is a test case describing the user goal, persona, and input (the first user message).
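
A scenario file like this could be produced with Python's standard csv module. The sketch below is illustrative only: the column names goal, persona, and input are assumptions taken from the description above, and the two rows are made-up examples. Check the Behavior Tests format for the exact headers before uploading.

```python
import csv
import io

# Assumed column names; verify against the Behavior Tests CSV format
FIELDNAMES = ["goal", "persona", "input"]

# Made-up example rows, one per test case
scenarios = [
    {
        "goal": "Cancel an existing order",
        "persona": "Impatient customer who writes short messages",
        "input": "I need to cancel my order.",
    },
    {
        "goal": "Find out the delivery date",
        "persona": "Polite first-time buyer",
        "input": "Hello! When will my package arrive?",
    },
]


def scenarios_to_csv(rows) -> str:
    # Serialize the rows to CSV text with a header line
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save the result, open the target file with newline="" and write the returned text to it.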

3. Create a Test and Sessions

Upload your scenario CSV to create a test, then create a session for each resulting test case.

4. Run the Simulator with Your Agent

For each session, use SimulatorService.simulate() (called as galtea_client.simulator.simulate() in the examples below) to run the conversation between your agent and the synthetic user.

5. Evaluate the Results

After simulation, analyze results and optionally trigger evaluations via evaluations.create().

Step-by-Step Guide

1. Create a Test and Sessions

First, create behavior test cases with user personas and goals. You can generate these from your product description or upload a CSV:
# Assumes galtea_client, product, and test_name were created in earlier setup steps
# Create a test suite using the behavior test options
# This can be done via the Dashboard or programmatically as shown here
test = galtea_client.tests.create(
    product_id=product.id,
    name=test_name,
    type="BEHAVIOR",
    # Here we provide a path to a CSV file with behavior tests.
    # If you omit the CSV file, Galtea can generate the test cases for you.
    test_file_path="path/to/behavior_test.csv",
)

# Get your test cases
# If Galtea is generating the test for you, it might take a few moments to be ready
test_cases = galtea_client.test_cases.list(test_id=test.id)
Once generation completes, you’ll see the resulting test cases in your dashboard. For the CSV upload format, see Behavior Tests.
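
Because generated test cases may take a few moments to appear, a simple polling loop can help in scripts. The sketch below is a generic helper, not part of the Galtea SDK: wait_for_items and its parameters are hypothetical, and it assumes the listing call returns an empty result until generation has finished, which you should verify for your setup.

```python
import time
from typing import Callable, Sequence


def wait_for_items(
    fetch: Callable[[], Sequence],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> Sequence:
    # Hypothetical helper (not part of the Galtea SDK):
    # call fetch() repeatedly until it returns a non-empty result
    deadline = time.monotonic() + timeout_s
    while True:
        items = fetch()
        if items:
            return items
        if time.monotonic() >= deadline:
            raise TimeoutError("No items were returned before the timeout")
        time.sleep(interval_s)
```

Under those assumptions, it could wrap the listing call from the snippet above: test_cases = wait_for_items(lambda: galtea_client.test_cases.list(test_id=test.id)).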

2. Run the Conversation Simulator

For each test case/session, use the simulator to run the full simulation with your agent function:
# Define your agent function (see Agent Integration Options for all signatures)
def my_agent(user_message: str) -> str:
    return f"Response to: {user_message}"


# Run simulations with your agent function
for test_case in test_cases:
    session = galtea_client.sessions.create(
        version_id=version.id, test_case_id=test_case.id
    )

    result = galtea_client.simulator.simulate(
        session_id=session.id, agent=my_agent, max_turns=10
    )

    # Analyze results
    print(f"Scenario: {test_case.scenario}")
    print(f"Completed {result.total_turns} turns")
    print(f"Finished: {result.finished}")
    if result.stopping_reason:
        print(f"Ended because: {result.stopping_reason}")
You can optionally use the @trace decorator to capture internal operations during simulation. Traces are automatically collected and saved per turn. See the Tracing Agent Operations guide for more details on using the @trace decorator.
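
When running many sessions, it can be useful to aggregate the per-session fields printed in the loop above. The following is a minimal sketch under stated assumptions: SimulationResultStub is a local stand-in that carries only the three fields used in this guide (total_turns, finished, stopping_reason), summarize is a hypothetical helper, and the stopping-reason strings in the usage note are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SimulationResultStub:
    # Local stand-in with only the fields used in the loop above
    total_turns: int
    finished: bool
    stopping_reason: Optional[str] = None


def summarize(results: Iterable) -> dict:
    # Hypothetical helper: aggregate turn counts and stopping reasons
    results = list(results)
    total = len(results)
    reasons = Counter(r.stopping_reason for r in results if r.stopping_reason)
    return {
        "sessions": total,
        "finished": sum(1 for r in results if r.finished),
        "avg_turns": sum(r.total_turns for r in results) / total if total else 0.0,
        "stopping_reasons": dict(reasons),
    }
```

In the simulation loop you would append each result to a list and call summarize on it afterwards. With stubs, summarize([SimulationResultStub(4, True, "goal_reached"), SimulationResultStub(10, False, "max_turns")]) reports two sessions, one of them finished.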

3. Evaluate the Session

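    # This block continues inside the "for test_case in test_cases" loop from step 2,
    # which is why it is indented.
    # After each simulation, you can create an evaluation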
    evaluations = galtea_client.evaluations.create(
        session_id=session.id,
        metrics=[{"name": "Role Adherence"}],  # Replace with your metrics
    )
    for evaluation in evaluations:
        print(f"Evaluation created: {evaluation.id}")

Advanced Usage: RAG Agents with Retrieval Context

For Retrieval-Augmented Generation (RAG) agents, you can return the context that was retrieved and used to generate the response. This context is logged with the inference result, which makes it available to metrics such as Faithfulness and Contextual Relevancy.
import galtea


# vector_store and llm are placeholders for your own retrieval and model clients
def my_rag_agent(input_data: galtea.AgentInput) -> galtea.AgentResponse:
    user_message = input_data.last_user_message_str()

    # Your RAG logic to retrieve context and generate a response
    retrieved_docs = vector_store.search(user_message)
    response_content = llm.generate(prompt=user_message, context=retrieved_docs)

    return galtea.AgentResponse(
        content=response_content,
        retrieval_context=retrieved_docs,
        metadata={"docs_retrieved": len(retrieved_docs)},
    )
The retrieval_context field is optional and can contain:
  • Retrieved document snippets or full documents
  • Formatted context strings
  • JSON-serializable data structures
By providing retrieval context, you enable Galtea to evaluate the faithfulness of your model’s responses relative to the retrieved information, which is crucial for assessing RAG system quality.
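
As one possible way to prepare such a value, the sketch below normalizes retrieved items into a JSON-serializable list before it is passed as retrieval_context. The helper to_retrieval_context, the rank and text keys, and the max_chars limit are all hypothetical choices for illustration; the SDK does not require this shape.

```python
import json
from typing import Any


def to_retrieval_context(docs: list[Any], max_chars: int = 500) -> list[dict]:
    # Hypothetical helper: turn retrieved items into JSON-serializable dicts
    context = []
    for rank, doc in enumerate(docs):
        if isinstance(doc, str):
            text = doc
        elif isinstance(doc, dict):
            text = str(doc.get("text", ""))
        else:
            text = str(doc)
        # Truncate long documents so the logged context stays compact
        context.append({"rank": rank, "text": text[:max_chars]})
    json.dumps(context)  # raises TypeError if anything is not serializable
    return context
```

For example, to_retrieval_context(["abc", {"text": "defgh"}], max_chars=3) returns [{"rank": 0, "text": "abc"}, {"rank": 1, "text": "def"}].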
With agent functions and the simulator, you can evaluate conversational AI in realistic, repeatable conditions and track improvements over time.