Scenario-Based Tests
Evaluate multi-turn dialogue interactions using conversation simulation with synthetic users
What are Scenarios?
Scenarios in Galtea are designed to evaluate the conversational capabilities of your product through multi-turn dialogue interactions. These tests use the Conversation Simulator to create realistic conversations with synthetic users that have specific goals, personalities, and behaviors.
Unlike Quality Tests, which focus on single-turn question-answer pairs, or Red Teaming Tests, which probe for security vulnerabilities, scenario-based tests evaluate how well your AI product can:
- Maintain context across multiple conversation turns
- Guide users toward successful task completion
- Handle unexpected but realistic user inputs
- Stay in character and follow conversation guidelines
- Manage complex dialogue flows
Creating Scenarios
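You can create scenarios in Galtea through two methods: uploading a scenarios file that you prepare yourself, or having Galtea generate scenarios from your product definition (coming soon).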
Create your scenarios file
Prepare a CSV file following the scenarios structure (see format below)
Configure the test
Select “Scenarios” as the test type and “Uploaded” as the test origin
The test creation process can be done via the SDK or the Galtea dashboard
Upload your scenarios file
Select the file you created and upload it to Galtea
Define Your Product
Start by defining your product in Galtea. These detailed product properties will be used as the foundation for generating realistic scenarios.
Generate Scenarios (Coming Soon)
Galtea will automatically analyze your product definition to create tailored conversation scenarios, complete with:
- Realistic user personas based on your target audience
- Goals aligned with your product’s use cases
- Natural conversation flows that test key features
Conversation Flow Categories
Scenarios can cover various types of conversational interactions:
- Task Completion
- Information Seeking
- Customer Support
- Sales and Consultation
- Complex Multi-Step Processes
- Edge Cases and Challenges
Example Scenarios and File Format
Here are examples of scenario content and structure:
Travel Booking Scenarios
Goal | User Persona | Initial Prompt | Stopping Criterias | Max Iterations | Scenario |
---|---|---|---|---|---|
Book a one-way flight from SFO to JFK for next Tuesday | A busy professional who is direct and values efficiency | I need to book a flight | The user has confirmed the flight booking|The chatbot indicates it cannot fulfill the request | 10 | Flight booking scenario |
Find the cheapest round-trip flight to Europe in summer | A budget-conscious student who asks many questions | Hi, I’m looking for cheap flights to Europe | Flight is booked|User decides not to book|Maximum budget is exceeded | 15 | Budget travel scenario |
Change an existing flight reservation | A frustrated customer whose plans have changed | I need to change my flight immediately | Reservation is successfully modified|Customer is transferred to agent | 8 | Flight modification scenario |
Customer Support Scenarios
Goal | User Persona | Initial Prompt | Stopping Criterias | Max Iterations | Scenario |
---|---|---|---|---|---|
Get help with a defective product | An upset customer who received a broken item | My product arrived broken and I’m very disappointed | Issue is resolved|Customer requests supervisor|Refund is processed | 12 | Product defect resolution |
Understand how to use a new feature | A curious but non-technical user | I heard about a new feature but don’t know how to use it | User successfully uses the feature|User gives up|Technical support is escalated | 10 | Feature education scenario |
Cancel a subscription | A polite customer who wants to downgrade service | I’d like to cancel my subscription please | Subscription is cancelled|Alternative plan is accepted|Retention offer is declined | 8 | Subscription cancellation |
Sales and Consultation Scenarios
Goal | User Persona | Initial Prompt | Stopping Criterias | Max Iterations | Scenario |
---|---|---|---|---|---|
Find the right product for specific needs | A thorough researcher who compares many options | I’m looking for a solution but need help choosing | Product recommendation is accepted|User requests human consultation|User decides to research more | 15 | Product recommendation scenario |
Get pricing information for enterprise solution | A business decision maker focused on ROI | What are your enterprise pricing options? | Quote is requested|Meeting is scheduled|User indicates budget constraints | 10 | Enterprise sales scenario |
Compare different service tiers | An analytical customer who wants detailed comparisons | Can you help me understand the differences between your plans? | Plan is selected|User requests trial|User needs more time to decide | 12 | Service comparison scenario |
Galtea requires this structure to generate test cases from your scenarios file automatically. If your file does not follow this format, you can still create the test cases manually.
For automatic processing, the file format must be CSV.
The examples provided above are simplified demonstrations. In actual CSV files, scenarios can be much more detailed and the number of test cases (rows) can be significantly higher to provide comprehensive conversation testing coverage.
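As a rough illustration of preparing such a file, the sketch below serializes scenario rows to CSV text with Python's standard `csv` module. The column names are copied from the example tables above; whether Galtea expects exactly these headers is an assumption to check against the upload format, and `scenarios_to_csv` is a hypothetical helper, not part of the Galtea SDK.

```python
import csv
import io

# Column names copied from the example tables; the exact headers
# Galtea expects are an assumption here.
COLUMNS = [
    "Goal", "User Persona", "Initial Prompt",
    "Stopping Criterias", "Max Iterations", "Scenario",
]

def scenarios_to_csv(rows):
    """Serialize scenario dicts to CSV text, one scenario per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Join the list of stopping criteria into one '|'-delimited cell.
        joined = "|".join(row["Stopping Criterias"])
        writer.writerow({**row, "Stopping Criterias": joined})
    return buffer.getvalue()

rows = [{
    "Goal": "Book a one-way flight from SFO to JFK for next Tuesday",
    "User Persona": "A busy professional who is direct and values efficiency",
    "Initial Prompt": "I need to book a flight",
    "Stopping Criterias": [
        "The user has confirmed the flight booking",
        "The chatbot indicates it cannot fulfill the request",
    ],
    "Max Iterations": 10,
    "Scenario": "Flight booking scenario",
}]

csv_text = scenarios_to_csv(rows)
```

To save the result, write `csv_text` to a file opened with `newline=""` so the line endings produced by the `csv` module are preserved.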
Structure of Scenarios
Scenarios have a specific structure designed to enable realistic conversation simulation:
Goal: The overall objective the synthetic user is trying to achieve during the conversation. Example: “Book a one-way flight from San Francisco (SFO) to New York (JFK) for next Tuesday”
User Persona: The personality and communication style of the synthetic user that will be simulated throughout the conversation. Example: “A busy professional who is direct and values efficiency. They prefer to get things done quickly without much small talk.”
Initial Prompt: The first message the synthetic user sends to start the conversation. If not provided, the system will generate an appropriate opening based on the goal and persona. Example: “Hello, I need to book a flight.”
Stopping Criterias: A delimited string of conditions that, if met, will cause the conversation simulation to end. Use ; or | as delimiters to separate multiple criteria. Example: “The user has confirmed the flight booking|The chatbot indicates it cannot fulfill the request;User expresses satisfaction with the service”
Max Iterations: The maximum number of conversation turns before the simulation automatically ends. This prevents infinite loops and controls test duration. Example: 10
Scenario: A brief description of the scenario for documentation and organizational purposes. This helps categorize and manage different types of conversation tests. Example: “Flight booking scenario for business travelers”
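The fields above can be modeled in a few lines of Python. This is a minimal sketch for validating rows before upload, not Galtea's own parser: the `Scenario` dataclass, `split_criteria`, and `parse_row` are hypothetical names, and the dictionary keys assume the column headers shown in the example tables.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Scenario:
    goal: str
    user_persona: str
    stopping_criterias: List[str]
    max_iterations: int
    initial_prompt: Optional[str] = None  # optional: may be generated if absent
    scenario: str = ""

def split_criteria(raw: str) -> List[str]:
    """Split a delimited criteria string on ';' or '|', dropping blanks."""
    return [part.strip() for part in re.split(r"[;|]", raw) if part.strip()]

def parse_row(row: dict) -> Scenario:
    """Build a Scenario from one CSV row (keys assumed to match the tables)."""
    return Scenario(
        goal=row["Goal"].strip(),
        user_persona=row["User Persona"].strip(),
        stopping_criterias=split_criteria(row["Stopping Criterias"]),
        max_iterations=int(row["Max Iterations"]),
        initial_prompt=row.get("Initial Prompt", "").strip() or None,
        scenario=row.get("Scenario", "").strip(),
    )

example = parse_row({
    "Goal": "Book a one-way flight from SFO to JFK for next Tuesday",
    "User Persona": "A busy professional who is direct and values efficiency",
    "Initial Prompt": "",
    "Stopping Criterias": (
        "The user has confirmed the flight booking"
        "|The chatbot indicates it cannot fulfill the request"
        ";User expresses satisfaction with the service"
    ),
    "Max Iterations": "10",
    "Scenario": "Flight booking scenario",
})
```

Splitting on both delimiters in one pass means a string that mixes `;` and `|`, as in the example above, still yields three separate criteria.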
Using Scenarios with the Conversation Simulator
Scenarios are specifically designed to work with Galtea’s Conversation Simulator. When you run evaluations using scenarios, the system will:
- Initialize the conversation using the initial_prompt and user_persona
- Generate realistic user responses based on the goal and conversation history
- Continue the dialogue until one of the stopping_criterias is met or max_iterations is reached
- Evaluate the conversation using your selected metrics to assess performance
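The control flow of the first three steps can be sketched as a plain loop. This is an illustration only and does not use or reproduce the Galtea SDK: `simulate` and the three callables it takes are hypothetical stand-ins for the synthetic user, your product, and the stopping-criteria check, and counting one user message plus one reply as a single iteration is an assumption.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, message)

def simulate(
    initial_prompt: str,
    max_iterations: int,
    next_user_message: Callable[[List[Turn]], str],
    agent_reply: Callable[[List[Turn]], str],
    criterion_met: Callable[[List[Turn]], bool],
) -> Tuple[List[Turn], str]:
    """Run a toy conversation until a criterion is met or the cap is hit."""
    history: List[Turn] = []
    stop_reason = "max_iterations"
    user_message = initial_prompt
    for _ in range(max_iterations):
        history.append(("user", user_message))
        history.append(("assistant", agent_reply(history)))
        if criterion_met(history):
            stop_reason = "stopping_criteria"
            break
        user_message = next_user_message(history)
    return history, stop_reason

# Toy stand-ins: a scripted user and a keyword-driven agent.
script = iter(["SFO to JFK, next Tuesday", "Yes, book it"])

def scripted_user(history: List[Turn]) -> str:
    return next(script)

def toy_agent(history: List[Turn]) -> str:
    last_user_message = history[-1][1]
    if "book it" in last_user_message:
        return "Booking confirmed."
    return "Which route and date?"

def booking_confirmed(history: List[Turn]) -> bool:
    return "confirmed" in history[-1][1].lower()

history, reason = simulate(
    "I need to book a flight", 10, scripted_user, toy_agent, booking_confirmed
)
```

In a real run the synthetic user and the criteria check would presumably be model-driven rather than scripted; the loop only shows how the stopping criteria and the iteration cap interact.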
For detailed implementation examples, see the Conversation Simulator Tutorial.
Best Practices for Manually Creating Scenarios
Goal Definition
- Make goals specific and measurable
- Include realistic constraints (time, budget, preferences)
- Vary complexity across different scenarios
- Consider both successful and unsuccessful outcomes
User Persona Creation
- Include personality traits that affect communication style
- Specify technical knowledge level and domain expertise
- Define emotional states and motivations
- Consider cultural and contextual factors
Stopping Criteria Design
- Include both positive outcomes (goal achieved) and negative outcomes (failure, frustration)
- Account for edge cases where the conversation might stall
- Use multiple criteria connected with “|” or “;” to cover various scenarios
- Make criteria specific enough to be clearly identifiable
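Some of these guidelines can be checked mechanically before a file is uploaded. The sketch below is a hypothetical lint helper, not a Galtea feature; it flags criteria strings that contain fewer than two conditions or that repeat a condition. The two-condition threshold is an assumption derived from the advice to cover both positive and negative outcomes.

```python
import re
from typing import List

def lint_stopping_criteria(raw: str) -> List[str]:
    """Return warnings for a ';' or '|' delimited stopping-criteria string."""
    criteria = [c.strip() for c in re.split(r"[;|]", raw) if c.strip()]
    warnings = []
    if len(criteria) < 2:
        # A single criterion cannot cover both success and failure outcomes.
        warnings.append("fewer than two criteria: consider adding a failure outcome")
    if len({c.lower() for c in criteria}) < len(criteria):
        warnings.append("duplicate criteria")
    return warnings

single = lint_stopping_criteria("Flight is booked")
paired = lint_stopping_criteria("Flight is booked|User decides not to book")
```

A check like this cannot judge whether a criterion is specific enough to be identifiable; that still needs a human review.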
Scenario Coverage
- Test common user journeys and edge cases
- Include scenarios with different complexity levels
- Cover various user types and use cases
- Test both cooperative and challenging user behaviors