2025-05-19
Platform Upgrades: Easier Onboarding, Improved UI & Finer Control

Streamlined Onboarding and Quicker Starts

We’ve revamped the platform onboarding! It’s now more visually intuitive, and to help new users start evaluating right away, we now provide a default Metric and a default Test. This makes it easier than ever to get started with Galtea and run your first evaluation quickly.

Deeper Insights with Visible Conversation Turns

Understanding the full context of interactions is key. You can now view the complete conversation turns associated with your test cases directly within the dashboard. This offers richer context, aiding in more thorough analysis and debugging of your conversational AI products.

Dashboard Usability Boost

We’re continually refining the Galtea experience. This update brings several UI enhancements across the dashboard, designed to improve overall usability and make your workflow smoother and more intuitive.

Tailor Your Test Generation: Selectable Test Case Counts

Gain more control over your testing process! When generating tests, you can now specify the exact number of test cases you want Galtea to create. This allows you to fine-tune the scope and depth of your tests according to your needs.

Track Your Team’s Work: Creator Attribution Displayed

Clarity in collaboration is important. Now, the user who created a Product, Test, Version, or other key assets will be clearly displayed on their respective details pages. This helps in tracking ownership and contributions within your team.

Enhanced Table Functionality for Easier Data Navigation

Working with data tables in the dashboard is now more efficient:

  • Clear Filter Indicators: Easily see which filters are currently applied to any table.
  • Quick Filter Reset: A new “Clear All Filters” button allows you to reset your view with a single click.

Enjoy these improvements and as always, we welcome your feedback!

2025-05-12
New Conversation Evaluation and Extended Data Generation Capabilities

New Conversation Evaluation Metrics

You can now evaluate conversations using these new metrics:

  • Role Adherence - Assess how well an AI stays within its defined role
  • Knowledge Retention - Measure how effectively information is remembered throughout a conversation
  • Conversational Completeness - Evaluate whether all user queries were fully addressed
  • Conversation Relevancy - Determine if responses remain on-topic and purposeful

Enhanced Security Framework

We’ve significantly improved user access management by implementing an Attribute-Based Access Control (ABAC) strategy, providing more granular control over who can access what within your organization.
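To illustrate the general idea behind ABAC (this is a minimal sketch of the pattern, not Galtea's internal implementation; all class and rule names here are hypothetical), access decisions are made by evaluating predicates over the attributes of the user, the resource, and the requested action:

```python
from dataclasses import dataclass

@dataclass
class User:
    id: str
    team: str
    role: str

@dataclass
class Resource:
    owner_team: str
    kind: str  # e.g. "Test", "Product"

# Each action maps to a predicate over (user, resource) attributes.
# Granularity comes from combining attributes, not from fixed role lists.
POLICIES = {
    "read":   lambda u, r: u.team == r.owner_team,
    "delete": lambda u, r: u.team == r.owner_team and u.role == "admin",
}

def is_allowed(user: User, resource: Resource, action: str) -> bool:
    """Grant access only if an attribute-based rule for the action passes."""
    rule = POLICIES.get(action)
    return bool(rule and rule(user, resource))
```

The advantage over plain role-based control is that new constraints (team ownership, resource kind, environment) slot in as extra attribute checks without redefining roles.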

Extended Data Generation Capabilities

Our data generation tools have been expanded with:

  • Catalan Language Support - Create synthetic data in Catalan to enhance your multilingual applications
  • Added support for text-based files - Upload your knowledge base in virtually any text-based format including JSON, HTML, Markdown, and more

Improved Test Creation Experience

We’ve enhanced the clarity of threat selection in the Test Creation form. The selection now displays both the threat and which security frameworks that threat covers, making it easier to align your testing with specific security standards.

Analytics & Navigation Enhancements

  • Reduced Clutter in Analytics Filters - The Tests and Versions filters now only display elements that have been used in an evaluation
  • Streamlined Task Navigation - Clicking the “input” cell in the evaluation tasks table now navigates directly to the associated Test Case

Bug Fixes & Improvements

We’ve resolved several issues to ensure a smoother experience:

  • Fixed a bug that could trigger an infinite loop in the Test Cases List of the dashboard
  • Addressed multiple small UI glitches and errors throughout the platform

Enjoy these improvements and as always, we welcome your feedback!

2025-05-05
Analytics Upgrades and Red Teaming Test Improvements

Improvements in Red Teaming Tests

  • New “misuse” threat implemented
    Red teaming now incorporates a new threat, misuse: queries that are not necessarily malicious but are out of scope for your specific product. You can now test whether your product successfully blocks these queries by selecting “Mitre Atlas: Ambiguous prompts” in the threat list.

  • Better “data leakage” and “toxicity” tests
    The red teaming tests now make better use of your product metadata to generate more suitable test cases for “data leakage” and “toxicity”.

Analytics Page Upgrades

We’re continuing to expand the power of the Analytics page! This update introduces:

  • Radar View for Version Comparison
    You can now visualize performance across multiple metrics for a single version using the brand-new radar view. It provides a quick way to understand strengths and weaknesses at a glance.

  • Smarter Metric Filters
    Filters now only show metrics that have actually been used in evaluations—removing unnecessary clutter and making it easier to find relevant data.

  • Graph Tooltips
    Hovering over truncated names now reveals full labels with tooltips, helping you understand graph contents more clearly.

SDK Safeguards

We’ve added protections to ensure your SDK integration is as smooth and reliable as possible:

  • Version Compatibility Checks
    If the SDK version you’re using is not compatible with the current API, it will now throw a clear error to prevent unexpected behavior.

  • Update Notifications
    When a new SDK version is available, you’ll get a console message with update information—keeping you in the loop without being intrusive.
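A version-compatibility gate like the one described above can be sketched as follows. This is a hypothetical illustration of the pattern, not the actual Galtea SDK code; names such as MIN_SUPPORTED_API and IncompatibleVersionError are invented for the example:

```python
class IncompatibleVersionError(RuntimeError):
    """Raised when the SDK cannot safely talk to the current API."""

MIN_SUPPORTED_API = (2, 0)  # oldest API version this SDK release understands

def parse_version(text: str) -> tuple:
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def check_compatibility(api_version: str) -> None:
    """Fail fast with a clear error instead of misbehaving silently."""
    if parse_version(api_version) < MIN_SUPPORTED_API:
        raise IncompatibleVersionError(
            f"API version {api_version} is older than the minimum supported "
            f"{'.'.join(map(str, MIN_SUPPORTED_API))}; "
            "please upgrade, or pin an older SDK release."
        )
```

Failing fast at startup like this turns a class of subtle runtime mismatches into one explicit, actionable error.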

Bug Fixes

  • Metric Range Calculation
    Some default metrics were previously displaying inverted scoring scales (e.g., treating 0% as best and 100% as worst). This is now resolved for accurate interpretation.
  • Test Creation Not Possible Through .txt Knowledge Base Files
    Due to a recent refactor, creating tests from knowledge base files with the .txt extension was not possible. This has been fixed, and you can once again create tests using .txt files as the knowledge base.

2025-04-28
Monitoring and UI Improvements

Monitoring Is Live!

Real-world user interactions with your products can now be fully monitored and analyzed. Using the Galtea SDK, you can trigger evaluations in a production environment and view how different versions perform with real users. Read more here.

Improved Galtea Red Teaming Tests

Our simulation-generated tests have been upgraded—delivering higher-quality outcomes. Red teaming tests can now be directed to validate even more specific aspects of various security standards, such as OWASP, MITRE ATLAS, and NIST. Specifically, we have improved jailbreak attacks, in addition to new financial attacks and toxicity prompts.

New Analytics Page

A completely redesigned analytics page is now available! It features:

  • Enhanced Filtering Capabilities.
  • Improved Data Clarity and Layout.

The new design not only raises the clarity and density of data presentation but also improves your overall user experience.

And with monitoring active, you can see production evaluation results in real time on this page!

User Experience Enhancements

We’re continuously refining the platform based on your feedback. This week’s improvements include:

  • Customizable Evaluation Tasks List:
    You can now select which metrics you are interested in, so the evaluation tasks list only shows the ones you need.

  • Enhanced Evaluation List Filtering:
    Easily filter evaluations by version, test, and test group.

  • Enhanced Test List Filtering:
    Easily filter tests by their group.

  • Smart Table Sorting:
    When you apply a custom sort, the default sort (usually creation date) is automatically disabled.

Enjoy the improvements!