- Role Adherence: Measures how well the AI stays within its defined role.
- Knowledge Retention: Assesses the model’s ability to remember and use information from previous turns.
- Conversation Completeness: Evaluates whether the conversation has reached a natural and informative conclusion.
- Conversation Relevancy: Assesses whether each turn in the conversation is relevant to the ongoing topic.
The Session-Based Workflow
1. Create a Session
A Session acts as a container for all the turns in a single conversation. You create one at the beginning of an interaction.
2. Log Inference Results
Each user input and model output pair is an Inference Result. You can log these turns individually or in a single batch call after the conversation ends. Using a batch call is more efficient.
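The first two steps can be pictured with a small in-memory model. This is only a sketch of the idea, not the SDK: the `Session` and `InferenceResult` classes, their fields, and the `log_turn` and `log_batch` methods are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import uuid


@dataclass
class InferenceResult:
    """One turn: a user input paired with the model output (hypothetical shape)."""
    session_id: str
    user_input: str
    model_output: str


@dataclass
class Session:
    """Container for all turns of one conversation (hypothetical shape)."""
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    turns: List[InferenceResult] = field(default_factory=list)

    def log_turn(self, user_input: str, model_output: str) -> InferenceResult:
        """Log a single turn as soon as it happens."""
        result = InferenceResult(self.id, user_input, model_output)
        self.turns.append(result)
        return result

    def log_batch(self, pairs: List[Tuple[str, str]]) -> List[InferenceResult]:
        """Log every turn in one call after the conversation ends."""
        results = [InferenceResult(self.id, u, m) for u, m in pairs]
        self.turns.extend(results)
        return results


# Option A: log each turn individually while the conversation runs.
live = Session()
live.log_turn("Hi, I need to reset my password.", "Sure, what is your email?")
live.log_turn("ana@example.com", "A reset link is on its way.")

# Option B: collect the turns and log them together at the end.
batched = Session()
batched.log_batch([
    ("Hi, I need to reset my password.", "Sure, what is your email?"),
    ("ana@example.com", "A reset link is on its way."),
])
```

Both options leave the session holding the same turns. Against a remote service, the batch form would need one call where the per-turn form needs one call per turn.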
3. Evaluate the Session
Once the session is logged, you can create evaluation tasks for the entire conversation using the evaluation_tasks.create() method.
Choose your scenario
Use this when you have test cases. It requires test_case_id and is often combined with the Conversation Simulator to generate turns.
See the full workflow in Simulating Conversations.
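As a rough illustration of the evaluation step in the test-case scenario, the sketch below creates one evaluation task per metric for a whole session and refuses to run without a test_case_id. It is a self-contained stand-in, not the real evaluation_tasks.create() call: the `Session` and `EvaluationTask` classes and the `create_evaluation_tasks` helper are hypothetical. Attaching test_case_id to the session is also an assumption, since the text does not say where the real SDK expects it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Session:
    """Hypothetical session record; turns are (user_input, model_output) pairs."""
    id: str
    test_case_id: Optional[str] = None  # assumption: stored on the session
    turns: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class EvaluationTask:
    """Hypothetical record: one metric assessed over one whole session."""
    session_id: str
    metric: str
    turn_count: int


def create_evaluation_tasks(session: Session, metrics: List[str]) -> List[EvaluationTask]:
    """Create one task per metric, covering every logged turn of the session."""
    if session.test_case_id is None:
        raise ValueError("test_case_id is required in the test-case scenario")
    if not session.turns:
        raise ValueError("log at least one turn before evaluating")
    return [EvaluationTask(session.id, m, len(session.turns)) for m in metrics]


session = Session(id="session-1", test_case_id="test-case-1")
session.turns.append(("What plans do you offer?", "We offer Basic and Pro."))
session.turns.append(("How much is Pro?", "Pro costs 20 USD per month."))

tasks = create_evaluation_tasks(
    session,
    metrics=[
        "Role Adherence",
        "Knowledge Retention",
        "Conversation Completeness",
        "Conversation Relevancy",
    ],
)
```

Each resulting task refers to the session as a whole rather than to a single turn, which is what lets conversation-level metrics such as Knowledge Retention look across turns.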
Learn More
- Session: A full conversation between a user and an AI system.
- Inference Result: A single turn in a conversation between a user and the AI.
- Evaluation: A group of evaluable Inference Results from a particular session.
- Evaluation Task: The assessment of an evaluation using a specific metric type’s criteria.
- Conversation Simulator: Test your conversational AI by simulating realistic user interactions with a synthetic user.