Galtea is the evaluation platform for AI products. Test accuracy, safety, and behavior — from RAG pipelines to conversational agents to security testing.

Get Started

Integrate the SDK

Install, authenticate, and run your first evaluation in Python.

Run your first evaluation

5-minute quickstart: create a product, run tests, view results.

Understand the platform

Products, specs, tests, metrics, and evaluations — the full model.

How It Works

Galtea helps you evaluate AI products through a repeatable test-measure-iterate cycle:
1. Create Product

Create a Product to represent your AI functionality.

2. Define Specifications

Define Specifications: testable behavioral expectations for your product (capabilities, inabilities, policies).

3. Generate Metrics & Tests

Galtea generates metrics and tests from your specifications, or you create them manually.

4. Create Version

Define a new Version of your product to track changes over time.

5. Run Evaluations

Run Evaluations: evaluations.run() resolves specs, tests, and metrics automatically.

6. Analyze & Iterate

Review results in the Analytics dashboard, then iterate with new versions to track improvements.
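
To make the test-measure-iterate cycle concrete, here is a minimal sketch in plain Python. It does not use the Galtea SDK: `Case`, `exact_match`, `run_evaluation`, and the two toy versions are hypothetical names invented for this illustration, and a real evaluation would call your actual product and richer metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# All names below are hypothetical stand-ins, not part of the Galtea SDK.
@dataclass
class Case:
    prompt: str
    expected: str


def exact_match(actual: str, expected: str) -> float:
    """Toy metric: 1.0 when the answer matches exactly (ignoring case), else 0.0."""
    return 1.0 if actual.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(
    version: Callable[[str], str],
    cases: List[Case],
    metric: Callable[[str, str], float],
) -> float:
    """Send every case through one product version and return the mean score."""
    scores = [metric(version(case.prompt), case.expected) for case in cases]
    return sum(scores) / len(scores)


# Steps 1-3: a product, one specification, and test cases derived from it.
specification = "Answers capital-city questions correctly."
cases = [
    Case("Capital of France?", "Paris"),
    Case("Capital of Spain?", "Madrid"),
]


# Step 4: two versions of the same product.
def version_1(prompt: str) -> str:
    return "Paris"  # naive baseline: same answer for every prompt


CAPITALS = {"Capital of France?": "Paris", "Capital of Spain?": "Madrid"}


def version_2(prompt: str) -> str:
    return CAPITALS.get(prompt, "unknown")


# Steps 5-6: evaluate each version, then compare scores to see whether the change helped.
results: Dict[str, float] = {
    "v1": run_evaluation(version_1, cases, exact_match),
    "v2": run_evaluation(version_2, cases, exact_match),
}
print(results)
```

In this toy setup the baseline scores 0.5 and the second version scores 1.0, which is the kind of per-version comparison the cycle is meant to surface.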

Platform Access

You can interact with Galtea through multiple channels:

Web Platform

Manage your products and access insights via the dashboard.

Python SDK

Seamlessly integrate our services using the Python SDK.

GitHub Actions

Automate your workflows by integrating with GitHub Actions.

REST API

Documentation is coming soon.

Core Concepts

Galtea is built around several key concepts that work together to provide comprehensive evaluation of AI products:

Product

A functionality or service being evaluated

Specification

A testable behavioral expectation for a product

Version

A specific iteration of a product

Test

A set of test cases for evaluating product performance

Session

A full conversation between a user and an AI system.

Inference Result

A single turn in a conversation between a user and the AI.

Evaluation

The assessment of a product’s output using a specific metric’s criteria

Metric

Ways to evaluate and score product performance

Model

A way to keep track of your models’ costs
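
One way to picture how these concepts relate is as a small data model. The sketch below is an assumption made for illustration, not Galtea's actual schema or SDK types: the class and field names are hypothetical, it covers only a subset of the concepts (Test and Model are omitted), and it assumes an evaluation scores a single inference result against one metric.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical data model for illustration; not Galtea's schema.
@dataclass
class InferenceResult:
    user_input: str  # what the user said in this turn
    output: str      # what the AI answered


@dataclass
class Session:
    turns: List[InferenceResult] = field(default_factory=list)  # a full conversation


@dataclass
class Metric:
    name: str


@dataclass
class Evaluation:
    metric: Metric
    target: InferenceResult  # assumption: one evaluation scores one turn
    score: float


@dataclass
class Version:
    name: str
    sessions: List[Session] = field(default_factory=list)


@dataclass
class Product:
    name: str
    specifications: List[str] = field(default_factory=list)
    versions: List[Version] = field(default_factory=list)


# Wire the pieces together: a product has versions, a version has sessions,
# a session has turns, and an evaluation scores a turn under one metric.
turn = InferenceResult("Capital of France?", "Paris")
session = Session(turns=[turn])
product = Product(
    name="geo-bot",
    specifications=["Answers capital-city questions correctly."],
    versions=[Version(name="v1", sessions=[session])],
)
evaluation = Evaluation(metric=Metric("exact_match"), target=turn, score=1.0)
```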