Evaluation Parameters

How Is It Calculated?

Related Metrics

Evaluates whether the generated output is resilient to misuse and remains aligned with the product description.

Misuse Resilience

Galtea Docs

Welcome to Galtea, the platform that empowers enterprises by providing a comprehensive AI evaluation platform that improves AI reliability, reduces risks, streamlines compliance, and accelerates time to market.

Introduction

How to request access to the Galtea platform

Registration

All you need to get started with Galtea evaluations

Quickstart

Learn how to create effective product descriptions that power comprehensive AI evaluation.

Creating Product Descriptions

Learn how to create and upload custom tests using the SDK

Create a Custom Test

Learn how to run evaluation tasks for a single-turn, test-based workflow.

Run a Test-Based Evaluation

Learn how to log and evaluate user queries from your production environment.

Monitor Production Responses

Learn how to evaluate multi-turn conversations using Galtea's session-based workflow.

Evaluating Conversations

Learn how to run evaluations with your own pre-calculated scores.

Evaluate with Custom Scores

Learn how to integrate Galtea's evaluation capabilities into your GitHub Actions workflow

GitHub Actions

A functionality or service evaluated by Galtea

Product

A specific iteration of a product in Galtea

Version

A set of test cases for evaluating product performance

Test

A single challenge for evaluating product performance

Test Case

A group of inference results from a session to be used by evaluation tasks

Evaluation

A task that evaluates a group of inference results using a metric type

Evaluation Task

A group of inference results that make up a full conversation

Session

A single turn in a conversation between a user and an AI system

Inference Result

Ways to evaluate and score product performance

Metric Type

A representation of a LLM Model with cost information to calculate cost estimations

Model

Welcome to the Galtea SDK, a powerful toolkit that enables developers to integrate Galtea's AI evaluation capabilities directly into their workflows. Our SDK provides programmatic access to comprehensive testing, evaluation, and compliance features to improve AI reliability and accelerate development.

Parameter	Description
`product_description`	The overall description of what your product does
`input`	The prompt or query sent to the model (may include adversarial attempts).
`actual_output`	The actual output generated by the model.

Concepts

Metrics

Test Types

Misuse Resilience

Evaluation Parameters

How Is It Calculated?

Concepts

Metrics

Test Types

​Evaluation Parameters

​How Is It Calculated?

​Related Metrics

Evaluation Parameters

How Is It Calculated?

Related Metrics