Changelog
Product updates and improvements for Galtea
Streamlined Onboarding and Quicker Starts
We’ve revamped the platform onboarding! It’s now more visually intuitive, and to help new users start evaluating in no time, we now provide a default Metric and a default Test. This makes it easier than ever to get started with Galtea and run your first evaluation quickly.
Deeper Insights with Visible Conversation Turns
Understanding the full context of interactions is key. You can now view the complete conversation turns associated with your test cases directly within the dashboard. This offers richer context, aiding in more thorough analysis and debugging of your conversational AI products.
Dashboard Usability Boost
We’re continually refining the Galtea experience. This update brings several UI enhancements across the dashboard, designed to improve overall usability and make your workflow smoother and more intuitive.
Tailor Your Test Generation: Selectable Test Case Counts
Gain more control over your testing process! When generating tests, you can now specify the exact number of test cases you want Galtea to create. This allows you to fine-tune the scope and depth of your tests according to your needs.
Track Your Team’s Work: Creator Attribution Displayed
Clarity in collaboration is important. Now, the user who created a Product, Test, Version, or other key assets will be clearly displayed on their respective details pages. This helps in tracking ownership and contributions within your team.
Enhanced Table Functionality for Easier Data Navigation
Working with data tables in the dashboard is now more efficient:
- Clear Filter Indicators: Easily see which filters are currently applied to any table.
- Quick Filter Reset: A new “Clear All Filters” button allows you to reset your view with a single click.
Enjoy these improvements, and as always, we welcome your feedback!
New Conversation Evaluation Metrics
You can now evaluate conversations using these new metrics:
- Role Adherence - Assess how well an AI stays within its defined role
- Knowledge Retention - Measure how effectively information is remembered throughout a conversation
- Conversational Completeness - Evaluate whether all user queries were fully addressed
- Conversation Relevancy - Determine if responses remain on-topic and purposeful
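If you drive evaluations from code, the snippet below is a minimal sketch of how these metrics could be selected by name when launching a conversation evaluation. It is illustrative only: the client object, the create_evaluation method, and its parameters are hypothetical placeholders rather than the actual Galtea SDK surface, so check the SDK documentation for real usage.

```python
# Hypothetical sketch only: the client object and the create_evaluation
# method are placeholder names, not the actual Galtea SDK API.

# Metric names as they appear in the dashboard; the exact identifiers the
# SDK expects may differ.
CONVERSATION_METRICS = [
    "Role Adherence",               # does the AI stay within its defined role?
    "Knowledge Retention",          # is earlier information remembered later on?
    "Conversational Completeness",  # were all user queries fully addressed?
    "Conversation Relevancy",       # do responses stay on-topic and purposeful?
]


def evaluate_conversation(client, version_id: str, conversation_turns: list[dict]) -> dict:
    """Run the conversation-level metrics against a recorded conversation.

    `client` stands in for an SDK client instance; `create_evaluation` and its
    parameters are assumed names used purely for illustration.
    """
    return client.create_evaluation(
        version_id=version_id,
        conversation_turns=conversation_turns,  # e.g. [{"role": "user", "content": "..."}, ...]
        metrics=CONVERSATION_METRICS,
    )
```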
Enhanced Security Framework
We’ve significantly improved user access management by implementing an Attribute-Based Access Control (ABAC) strategy, providing more granular control over who can access what within your organization.
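As a quick, generic illustration of the ABAC idea (not Galtea’s actual implementation), the sketch below computes an access decision from attributes of the user and the resource rather than from a fixed role-to-permission table:

```python
from dataclasses import dataclass


@dataclass
class User:
    id: str
    organization_id: str
    team: str
    role: str  # e.g. "admin", "member", "viewer"


@dataclass
class Resource:
    kind: str  # e.g. "product", "test", "version"
    organization_id: str
    owner_team: str


def is_allowed(user: User, action: str, resource: Resource) -> bool:
    """Toy ABAC policy: the decision is derived from attributes of the user
    and the resource, not from a fixed role-to-permission table."""
    # Never allow access across organizations.
    if user.organization_id != resource.organization_id:
        return False
    # Admins may perform any action inside their own organization.
    if user.role == "admin":
        return True
    # Everyone in the organization may read.
    if action == "read":
        return True
    # Members may edit only resources owned by their own team.
    if action == "edit":
        return user.role == "member" and user.team == resource.owner_team
    return False


# Example: a member editing a resource owned by another team is denied.
alice = User(id="u1", organization_id="org1", team="nlp", role="member")
test = Resource(kind="test", organization_id="org1", owner_team="qa")
assert is_allowed(alice, "read", test) is True
assert is_allowed(alice, "edit", test) is False
```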
Extended Data Generation Capabilities
Our data generation tools have been expanded with:
- Catalan Language Support - Create synthetic data in Catalan to enhance your multilingual applications
- Text-Based File Support - Upload your knowledge base in virtually any text-based format, including JSON, HTML, Markdown, and more
Improved Test Creation Experience
We’ve enhanced the clarity of threat selection in the Test Creation form. The selection now displays both the threat and which security frameworks that threat covers, making it easier to align your testing with specific security standards.
Analytics & Navigation Enhancements
- Reduced Clutter in Analytics Filters - The Tests and Versions filters now only display elements that have been used in an evaluation
- Streamlined Task Navigation - Clicking the “input” cell in the evaluation tasks table now navigates directly to the associated Test Case
Bug Fixes & Improvements
We’ve resolved several issues to ensure a smoother experience:
- Fixed a bug that could trigger an infinite loop in the Test Cases List of the dashboard
- Addressed multiple small UI glitches and errors throughout the platform
Enjoy these improvements, and as always, we welcome your feedback!
Improvements in Red Teaming Tests
- New “misuse” threat implemented: Red teaming now incorporates a new threat, misuse, covering queries that are not necessarily malicious but are out of scope for your specific product. You can now test whether your product successfully blocks these queries by selecting “MITRE ATLAS: Ambiguous prompts” in the threat list.
- Better “data leakage” and “toxicity” tests: Red teaming tests now make better use of your product metadata to generate the most suitable test cases for “data leakage” and “toxicity”.
Analytics Page Upgrades
We’re continuing to expand the power of the Analytics page! This update introduces:
- Radar View for Version Comparison: You can now visualize performance across multiple metrics for a single version using the brand-new radar view. It provides a quick way to understand strengths and weaknesses at a glance.
- Smarter Metric Filters: Filters now only show metrics that have actually been used in evaluations, removing unnecessary clutter and making it easier to find relevant data.
- Graph Tooltips: Hovering over truncated names now reveals full labels with tooltips, helping you understand graph contents more clearly.
SDK Safeguards
We’ve added protections to ensure your SDK integration is as smooth and reliable as possible:
- Version Compatibility Checks: If the SDK version you’re using is not compatible with the current API, it will now throw a clear error to prevent unexpected behavior.
- Update Notifications: When a new SDK version is available, you’ll get a console message with update information, keeping you in the loop without being intrusive.
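For the curious, here is a minimal sketch of how client-side safeguards like these typically work; the version numbers, the api_info keys, and the error class are invented for illustration and are not the SDK’s internals:

```python
# Illustrative sketch of the safeguard pattern; names and values here are
# made up for the example. Requires the third-party `packaging` package.
from packaging.version import Version

SDK_VERSION = Version("1.4.0")


class IncompatibleSDKError(RuntimeError):
    """Raised when the installed SDK is too old for the current API."""


def check_compatibility(api_info: dict) -> None:
    """`api_info` stands in for whatever metadata the API exposes,
    e.g. {"min_sdk_version": "1.3.0", "latest_sdk_version": "1.5.2"}."""
    minimum = Version(api_info["min_sdk_version"])
    latest = Version(api_info["latest_sdk_version"])

    if SDK_VERSION < minimum:
        # Fail loudly instead of letting requests misbehave in subtle ways.
        raise IncompatibleSDKError(
            f"SDK {SDK_VERSION} is older than the minimum supported version "
            f"{minimum}; please upgrade."
        )
    if SDK_VERSION < latest:
        # Non-intrusive heads-up: a single console message, nothing more.
        print(f"A newer SDK version ({latest}) is available; you are on {SDK_VERSION}.")
```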
Bug Fixes
- Metric Range Calculation: Some default metrics were previously displaying inverted scoring scales (e.g., treating 0% as best and 100% as worst). This is now resolved for accurate interpretation.
- Test Creation Not Possible Through .txt Knowledge Base Files: Due to a recent refactor, creating tests from knowledge base files with the .txt extension was not possible. This has been fixed, and you can once again create tests using .txt files as the knowledge base.
Monitoring Is Live!
Real-world user interactions with your products can now be fully monitored and analyzed. Using the Galtea SDK, you can trigger evaluations in a production environment and view how different versions perform with real users. Read more here.
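To make the idea concrete, here is a purely hypothetical sketch of wiring a production request handler to monitoring; the client object and the evaluate_turn method are placeholder names, not the real SDK API, so refer to the documentation for actual usage.

```python
# Hypothetical sketch: the client object and the evaluate_turn method are
# placeholder names, not the real Galtea SDK API; see the monitoring
# documentation for actual usage.

def my_chatbot_reply(user_input: str) -> str:
    # Stand-in for your product's actual response generation.
    return f"Echo: {user_input}"


def handle_user_message(client, version_id: str, session_id: str, user_input: str) -> str:
    """Answer a real user and report the turn so the production version can
    be evaluated against real traffic."""
    model_output = my_chatbot_reply(user_input)

    # Send the interaction for evaluation in production (names are illustrative).
    client.evaluate_turn(
        version_id=version_id,
        session_id=session_id,
        input=user_input,
        output=model_output,
    )
    return model_output
```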
Improved Galtea Red Teaming Tests
Our simulation-generated tests have been upgraded, delivering higher-quality outcomes. Red teaming tests can now be directed to validate even more specific aspects of various security standards, such as OWASP, MITRE ATLAS, and NIST. Specifically, we have improved jailbreak attacks and added new financial attacks and toxicity prompts.
New Analytics Page
A completely redesigned analytics page is now available! It features:
- Enhanced Filtering Capabilities.
- Improved Data Clarity and Layout.
The new design not only raises the clarity and density of data presentation but also improves your overall user experience.
And with monitoring active, you can see production evaluation results in real time on this page!
User Experience Enhancements
We’re continuously refining the platform based on your feedback. This week’s improvements include:
- Customizable Evaluation Tasks List: You can now select which metrics you are interested in, so the evaluation tasks list only shows the ones you need.
- Enhanced Evaluation List Filtering: Easily filter evaluations by versions, evaluations, tests, and test groups.
- Enhanced Test List Filtering: Easily filter tests by their group.
- Smart Table Sorting: When you apply a custom sort, the default sorting (usually by creation date) is automatically disabled.
Enjoy the improvements!