Changelog

2025-06-23

Quality Test Generation, Metric Tags, and Product Details

Generate Quality Tests from Examples

You can now create Quality Tests directly from your own examples using the new Few Shots parameter. This makes it easier to tailor tests to your specific use cases and ensure your models are evaluated on the scenarios that matter most. Learn more about test creation.

Metric Type Tags

Metric Types now support tags for easier classification and discovery. Quickly find and organize metrics relevant to your projects. See all metric types.

Enhanced Product Details

Products now include new detail fields:

Capabilities: What your product can do.
Inabilities: Known limitations.
Security Boundaries: Define the security scope and constraints.

These additions help you document and communicate your product’s strengths and boundaries more clearly. Read about product details.

Improved Q&A Generation

Question-answer pairs are now generated with improved accuracy and clarity, thanks to better text filtering and processing.

New Guide: Setting Up Your Product Description

We’ve created a comprehensive guide to help you set up your product descriptions effectively. This guide covers best practices, examples, and tips to ensure your product is presented in the best light. Check it out here

General Improvements

We’ve made various bug fixes and UX/UI improvements across the Dashboard, SDK, and more, making your experience smoother and more reliable.

2025-06-16

Major Platform Overhaul & SDK v2

Major Platform Overhaul

We’ve been hard at work reorganizing and expanding the Galtea platform to handle more use cases and prepare for exciting future features. This release brings significant improvements to the dashboard, SDK, and test generation.

Dashboard Enhancements

Reorganized Version, Test, and Evaluation Task Views:
Detailed views have been streamlined and improved for clearer insights.
New Sessions Visualizations:
Easily organize and navigate conversations through our new Sessions feature.
Evaluations Visualization Removed:
The dashboard now focuses on Sessions and Evaluation Tasks as the primary elements.
Better Filters Across Tables:
Quickly find what you need with improved filtering capabilities on the dashboard.
General Bug Fixes & UX Improvements:
Enjoy smoother interactions, clearer tooltips, and more intuitive code snippets.

SDK v2 Released

The new Galtea SDK v2 is here! It includes breaking changes to simplify workflows and add session support. Check out the migration guide for a smooth transition.

Implicit Evaluation Creation:
No need to explicitly call galtea.evaluations.create()—it happens automatically.
Repurposed evaluation_tasks.create():
The old method is replaced by create_single_turn() for test-based evaluations, while create() now exclusively handles session-based evaluations.
New evaluation_tasks.create_single_turn() Method:
Use this for single-turn test cases. It now requires version_id instead of evaluation_id.
Simplified Version Creation:
The galtea.versions.create() method now accepts all properties directly, no need for an optional_props dictionary.
Sessions Support:
Group multiple inference results under a single session for better multi-turn tracking using galtea.sessions.create().

Improved Test Case Generation

Smarter Test Coverage:
Test cases are now distributed more intelligently across your documents for better coverage based on the number of questions you choose to generate.
Single Threat per Red Teaming Test:
Red Teaming Tests now only allow a single threat type per test, ensuring clearer results.

Enjoy the upgrade!

2025-05-26

Enhanced Test Generation & Streamlined Workflow with Code Snippets

Improved Test Generation

Our test generation capabilities have been significantly upgraded:

Versatile Red Teaming: Red Teaming tests are now more powerful, allowing you to employ multiple attack strategies to thoroughly probe your AI’s defenses.
Better Synthetic Data: We’ve made general improvements to the quality of synthetic data generation, ensuring your tests are more effective and realistic.

Code Snippets Now Available on the Dashboard

We’re making it easier than ever to integrate Galtea into your development process!

Simplified Evaluation Setup: The “Create Evaluation” form on the dashboard has been replaced with a convenient code snippet. Simply copy and paste it directly into your project to get started.
Streamlined Task Creation: Similarly, a new code snippet for “Create Evaluation Task” is now available on the dashboard, simplifying how you send evaluation data to Galtea. You can easily copy and paste this into your project.

Usability Improvements

We’ve also rolled out several usability enhancements based on your feedback:

Enhanced Readability in Tables: Table cells now correctly render line breaks, making it easier to view multi-line content and detailed information at a glance.
Controlled Test Case Generation: To ensure optimal performance and manageability, the maximum number of test cases automatically generated for a single test from a knowledge base is now capped at 1000.

Enjoy these improvements and as always, we welcome your feedback!

2025-05-19

Platform Upgrades: Easier Onboarding, Improved UI & Finer Control

Streamlined Onboarding and Quicker Starts

We’ve revamped the platform onboarding! It’s now more visually intuitive and to help new users get evaluating in no time, we now provide a default Metric and a default Test. This makes it easier than ever to get started with Galtea and run your first evaluation quickly.

Deeper Insights with Visible Conversation Turns

Understanding the full context of interactions is key. You can now view the complete conversation turns associated with your test cases directly within the dashboard. This offers richer context, aiding in more thorough analysis and debugging of your conversational AI products.

Dashboard Usability Boost

We’re continually refining the Galtea experience. This update brings several UI enhancements across the dashboard, designed to improve overall usability and make your workflow smoother and more intuitive.

Tailor Your Test Generation: Selectable Test Case Counts

Gain more control over your testing process! When generating tests, you can now specify the exact number of test cases you want Galtea to create. This allows you to fine-tune the scope and depth of your tests according to your needs.

Track Your Team’s Work: Creator Attribution Displayed

Clarity in collaboration is important. Now, the user who created a Product, Test, Version, or other key assets will be clearly displayed on their respective details pages. This helps in tracking ownership and contributions within your team.

Working with data tables in the dashboard is now more efficient:

Clear Filter Indicators: Easily see which filters are currently applied to any table.
Quick Filter Reset: A new “Clear All Filters” button allows you to reset your view with a single click.

Enjoy these improvements and as always, we welcome your feedback!

2025-05-12

New Conversation Evaluation and Extended Data Generation Capabilities

New Conversation Evaluation Metrics

You can now evaluate conversations using these new metrics:

Role Adherence - Assess how well an AI stays within its defined role
Knowledge Retention - Measure how effectively information is remembered throughout a conversation
Conversation Completeness - Evaluate whether all user queries were fully addressed
Conversation Relevancy - Determine if responses remain on-topic and purposeful

Enhanced Security Framework

We’ve significantly improved user access management by implementing an Attribute-Based Access Control (ABAC) strategy, providing more granular control over who can access what within your organization.

Extended Data Generation Capabilities

Our data generation tools have been expanded with:

Catalan Language Support - Create synthetic data in Catalan to enhance your multilingual applications
Added support for text-based files - Upload your knowledge base in virtually any text-based format including JSON, HTML, Markdown, and more

Improved Test Creation Experience

We’ve enhanced the clarity of threat selection in the Test Creation form. The selection now displays both the threat and which security frameworks that threat covers, making it easier to align your testing with specific security standards.

Reduced Clutter in Analytics Filters - Tests and Versions filtering now only display elements that have been used in an evaluation
Streamlined Task Navigation - Clicking the “input” cell in the evaluation tasks table now navigates directly to the associated Test Case

Bug Fixes & Improvements

We’ve resolved several issues to ensure a smoother experience:

Fixed a bug that could trigger an infinite loop in the Test Cases List of the dashboard
Addressed multiple small UI glitches and errors throughout the platform

Enjoy these improvements and as always, we welcome your feedback!

2025-05-05

Analytics Upgrades and Red Teaming Test Improvements

Improvements in Red Teaming Tests

New “misuse” threat implemented
Now red teaming incorporates a new threat, misuse, which are queries that not necessaryly malicious however out-of-scope for you specific product. You can now test whether your product can successfully block these queries by marking “Mitre Atlas: Ambiguous prompts” in the threat list.
Better “data leakage” and “toxicity” tests
The red teaming tests incorporate better your product meta data, to generate the most adequate test cases for “data leakage” and “toxicity”.

Analytics Page Upgrades

We’re continuing to expand the power of the Analytics page! This update introduces:

Radar View for Version Comparison
You can now visualize performance across multiple metrics for a single version using the brand-new radar view. It provides a quick way to understand strengths and weaknesses at a glance.
Smarter Metric Filters
Filters now only show metrics that have actually been used in evaluations—removing unnecessary clutter and making it easier to find relevant data.
Graph Tooltips
Hovering over truncated names now reveals full labels with tooltips, helping you understand graph contents more clearly.

SDK Safeguards

We’ve added protections to ensure your SDK integration is as smooth and reliable as possible:

Version Compatibility Checks
If the SDK version you’re using is not compatible with the current API, it will now throw a clear error to prevent unexpected behavior.
Update Notifications
When a new SDK version is available, you’ll get a console message with update information—keeping you in the loop without being intrusive.

Bug Fixes

Metric Range Calculation
Some default metrics were previously displaying inverted scoring scales (e.g., treating 0% as best and 100% as worst). This is now resolved for accurate interpretation.
Test Creation Not Possible Through .txt Knowledge Base Files
Due to a recent refactor, the creation of tests using knowledge base files with .txt extensions was not possible. This has been fixed and you can now create tests using .txt files as the knowledge base again.

2025-04-28

Monitorization and UI improvements

Monitoring Is Live!

Real-world user interactions with your products can now be fully monitored and analyzed. Using the Galtea SDK, you can trigger evaluations in a production environment and view how different versions perform with real users. Read more here.

Improved Galtea Red Teaming Tests

Our simulation-generated tests have been upgraded—delivering higher-quality outcomes. Red teaming tests can now be directed to validate even more specific aspects of various security standards, such as OWASP, MITRE ATLAS, and NIST. Specifically, we have improved jailbreak attacks, in addition to new financial attacks and toxicity prompts.

New Analytics Page

A completely redesigned analytics page is now available! It features:

Enhanced Filtering Capabilities.
Improved Data Clarity and Layout.

The new design not only raises the clarity and density of data presentation but also improves your overall user experience.

And with monitoring active, you can see production evaluation results in real time on this page!

User Experience Enhancements

We’re continuously refining the platform based on your feedback. This week’s improvements include:

Customizable Evaluation Tasks List:
You can now select which metrics you are interested in, so the evaluation tasks list only shows the ones you need.
Enhanced Evaluation List Filtering:
Easily filter evaluations by versions, evaluations, tests and test groups.
Enhanced Test List Filtering:
Easily filter tests by its group.
Smart Table Sorting:
When you apply a custom sort, the default (usually creation date) is automatically disabled.tional Filters” />

Enjoy the improvements!

Concepts

Metrics

Test Types

Generate Quality Tests from Examples

Metric Type Tags

Enhanced Product Details

Improved Q&A Generation

New Guide: Setting Up Your Product Description

General Improvements

Major Platform Overhaul

Dashboard Enhancements

SDK v2 Released

Improved Test Case Generation

Improved Test Generation

Code Snippets Now Available on the Dashboard

Usability Improvements

Streamlined Onboarding and Quicker Starts

Deeper Insights with Visible Conversation Turns

Dashboard Usability Boost

Tailor Your Test Generation: Selectable Test Case Counts

Track Your Team’s Work: Creator Attribution Displayed

Enhanced Table Functionality for Easier Data Navigation

New Conversation Evaluation Metrics

Enhanced Security Framework

Extended Data Generation Capabilities

Improved Test Creation Experience

Analytics & Navigation Enhancements

Bug Fixes & Improvements

Improvements in Red Teaming Tests

Analytics Page Upgrades

SDK Safeguards

Bug Fixes

Monitoring Is Live!

Improved Galtea Red Teaming Tests

New Analytics Page

User Experience Enhancements

Concepts

Metrics

Test Types

​Generate Quality Tests from Examples

​Metric Type Tags

​Enhanced Product Details

​Improved Q&A Generation

​New Guide: Setting Up Your Product Description

​General Improvements

​Major Platform Overhaul

​Dashboard Enhancements

​SDK v2 Released

​Improved Test Case Generation

​Improved Test Generation

​Code Snippets Now Available on the Dashboard

​Usability Improvements

​Streamlined Onboarding and Quicker Starts

​Deeper Insights with Visible Conversation Turns

​Dashboard Usability Boost

​Tailor Your Test Generation: Selectable Test Case Counts

​Track Your Team’s Work: Creator Attribution Displayed

​Enhanced Table Functionality for Easier Data Navigation

​New Conversation Evaluation Metrics

​Enhanced Security Framework

​Extended Data Generation Capabilities

​Improved Test Creation Experience

​Analytics & Navigation Enhancements

​Bug Fixes & Improvements

​Improvements in Red Teaming Tests

​Analytics Page Upgrades

​SDK Safeguards

​Bug Fixes

​Monitoring Is Live!

​Improved Galtea Red Teaming Tests

​New Analytics Page

​User Experience Enhancements

Generate Quality Tests from Examples

Metric Type Tags

Enhanced Product Details

Improved Q&A Generation

New Guide: Setting Up Your Product Description

General Improvements

Major Platform Overhaul

Dashboard Enhancements

SDK v2 Released

Improved Test Case Generation

Improved Test Generation

Code Snippets Now Available on the Dashboard

Usability Improvements

Streamlined Onboarding and Quicker Starts

Deeper Insights with Visible Conversation Turns

Dashboard Usability Boost

Tailor Your Test Generation: Selectable Test Case Counts

Track Your Team’s Work: Creator Attribution Displayed

Enhanced Table Functionality for Easier Data Navigation

New Conversation Evaluation Metrics

Enhanced Security Framework

Extended Data Generation Capabilities

Improved Test Creation Experience

Analytics & Navigation Enhancements

Bug Fixes & Improvements

Improvements in Red Teaming Tests

Analytics Page Upgrades

SDK Safeguards

Bug Fixes

Monitoring Is Live!

Improved Galtea Red Teaming Tests

New Analytics Page

User Experience Enhancements