- Role Adherence: Measures how well the AI stays within its defined role.
- Knowledge Retention: Assesses the model’s ability to remember and use information from previous turns.
- Conversation Completeness: Evaluates whether the conversation has reached a natural and informative conclusion.
- Conversation Relevancy: Assesses whether each turn in the conversation is relevant to the ongoing topic.
The Session-Based Workflow
Create a Session
A Session acts as a container for all the turns in a single conversation. You create one at the beginning of an interaction.
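A minimal sketch of the container idea, assuming a simple in-memory model. The `Session` and `InferenceResult` classes, the `add_turn` method, and the `session_id` field are hypothetical illustrations of the concepts on this page, not the platform's SDK. The sketch creates a session once, at the start of the interaction, and appends each turn to it.

```python
from dataclasses import dataclass, field
from typing import List
import uuid


@dataclass
class InferenceResult:
    """One turn: a user input paired with the model's output (hypothetical model)."""
    user_input: str
    model_output: str


@dataclass
class Session:
    """Container for every turn of a single conversation (hypothetical model)."""
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    turns: List[InferenceResult] = field(default_factory=list)

    def add_turn(self, user_input: str, model_output: str) -> InferenceResult:
        # Record one user/model exchange in order of occurrence.
        result = InferenceResult(user_input, model_output)
        self.turns.append(result)
        return result


# Create the session once, at the beginning of the interaction.
session = Session()
session.add_turn("What is your refund policy?", "Refunds are available within 30 days.")
session.add_turn("Does that apply to sale items?", "Yes, sale items follow the same policy.")
print(len(session.turns))  # 2
```

Keeping turns in an ordered list preserves the conversation sequence, which conversation-level metrics such as Knowledge Retention depend on.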
Log Inference Results
Each pair of a user input and a model output is an Inference Result. You can log these turns one at a time as they happen, or in a single batch call after the conversation ends. A batch call is more efficient because it sends all of the turns together in one request.
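A minimal sketch contrasting the two logging styles, assuming that "more efficient" refers to fewer network requests. `FakeLoggingClient`, `log_inference_result`, and `log_inference_results` are hypothetical stand-ins that count requests instead of sending them; they are not the platform's real API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Turn = Tuple[str, str]  # (user_input, model_output)


@dataclass
class FakeLoggingClient:
    """Hypothetical logging client that counts requests instead of sending them."""
    requests_sent: int = 0
    stored: List[Turn] = field(default_factory=list)

    def log_inference_result(self, session_id: str, turn: Turn) -> None:
        # Individual logging: one request per turn.
        self.requests_sent += 1
        self.stored.append(turn)

    def log_inference_results(self, session_id: str, turns: List[Turn]) -> None:
        # Batch logging: one request for the whole conversation.
        self.requests_sent += 1
        self.stored.extend(turns)


conversation = [
    ("Hi, I need help with my order.", "Sure, what is the order number?"),
    ("It's 12345.", "Thanks, order 12345 shipped yesterday."),
    ("When will it arrive?", "It should arrive within 3 business days."),
]

per_turn = FakeLoggingClient()
for turn in conversation:
    per_turn.log_inference_result("session-1", turn)

batched = FakeLoggingClient()
batched.log_inference_results("session-1", conversation)

print(per_turn.requests_sent, batched.requests_sent)  # 3 1
```

Both clients end up with the same stored turns; only the number of round trips differs, and that gap grows with the length of the conversation.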
Choose your scenario
- Test-based evaluation
- Past conversations (offline ingestion)
- Monitoring (production)
Test-based evaluation: Use this when you have test cases. It requires a test_case_id and is often combined with the Conversation Simulator to generate turns. See the full workflow in Simulating Conversations.
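A minimal sketch of the test_case_id requirement, assuming a hypothetical `SessionConfig` that validates itself on creation. The `Scenario` enum and `SessionConfig` class are illustrations only. The page states that test-based evaluation requires a test_case_id; treating the other scenarios as not needing one is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scenario(Enum):
    TEST_BASED = "test-based evaluation"
    OFFLINE_INGESTION = "past conversations"
    MONITORING = "monitoring"


@dataclass
class SessionConfig:
    """Hypothetical session settings; not the platform's SDK."""
    scenario: Scenario
    test_case_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Test-based evaluation needs a test case to evaluate against.
        if self.scenario is Scenario.TEST_BASED and not self.test_case_id:
            raise ValueError("test-based sessions require a test_case_id")


ok = SessionConfig(Scenario.TEST_BASED, test_case_id="tc-001")
monitoring = SessionConfig(Scenario.MONITORING)  # assumed: no test case needed

try:
    SessionConfig(Scenario.TEST_BASED)
except ValueError as err:
    print(err)  # test-based sessions require a test_case_id
```

Failing at creation time surfaces a missing test_case_id before any turns are logged, rather than when the evaluation runs.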
Learn More
- Session: A full conversation between a user and an AI system.
- Inference Result: A single turn in a conversation between a user and the AI.
- Evaluation: The assessment of a conversation or turn against a specific metric’s criteria.
- Conversation Simulator: Test your conversational AI by simulating realistic user interactions with a synthetic user.