Get Started
Integrate the SDK
Install, authenticate, and run your first evaluation in Python.
Run your first evaluation
5-minute quickstart: create a product, run tests, view results.
Understand the platform
Products, specs, tests, metrics, and evaluations — the full model.
How It Works
Galtea helps you evaluate AI products through a repeatable test-measure-iterate cycle:
Create Product
Create a Product to represent your AI functionality.
Define Specifications
Define Specifications — testable behavioral expectations for your product (capabilities, inabilities, policies).
Generate Metrics & Tests
Galtea generates metrics and tests from your specifications, or you can create them manually.
Create Version
Define a new Version of your product to track changes over time.
Run Evaluations
Run evaluations with evaluations.run(), which resolves specifications, tests, and metrics automatically.
Platform Access
You can interact with Galtea through multiple channels:
Web Platform
Manage your products and access insights via the dashboard.
Python SDK
Seamlessly integrate our services using the Python SDK.
GitHub Actions
Automate your workflows by integrating with GitHub Actions.
REST API
Documentation is coming soon.
Core Concepts
Galtea is built around several key concepts that work together to provide comprehensive evaluation of AI products:
Product
A functionality or service being evaluated
Specification
A testable behavioral expectation for a product
Version
A specific iteration of a product
Test
A set of test cases for evaluating product performance
Session
A full conversation between a user and an AI system.
Inference Result
A single turn in a conversation between a user and the AI.
Evaluation
The assessment of a session or inference result using a specific metric's criteria
Metric
Ways to evaluate and score product performance
Model
A way to keep track of your models' costs
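The relationships between these concepts can be illustrated with plain Python dataclasses. The sketch below is a hypothetical, simplified model written for illustration only, not the Galtea SDK. Several details are assumptions:

- the class and field names;
- that each Session belongs to a single Version;
- the evaluate() helper;
- that a Metric scores each turn and the scores are averaged.

It covers only a subset of the concepts. It shows how scoring two Versions with the same Metric lets you compare iterations of a product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Specification:
    # A testable behavioral expectation (capability, inability, or policy).
    description: str


@dataclass
class Product:
    # The AI functionality being evaluated, with its specifications.
    name: str
    specifications: list[Specification] = field(default_factory=list)


@dataclass
class Version:
    # One iteration of a product, tracked over time.
    product: Product
    name: str


@dataclass
class InferenceResult:
    # A single turn: what the user said and what the AI answered.
    user_input: str
    actual_output: str


@dataclass
class Session:
    # A full conversation, assumed here to be produced by one version.
    version: Version
    turns: list[InferenceResult] = field(default_factory=list)


@dataclass
class Metric:
    # A named scoring rule; score_turn is a hypothetical per-turn scorer in [0, 1].
    name: str
    score_turn: Callable[[InferenceResult], float]


@dataclass
class Evaluation:
    # The result of assessing one session with one metric.
    session: Session
    metric: Metric
    score: float


def evaluate(session: Session, metric: Metric) -> Evaluation:
    """Score every turn with the metric and average the results (an assumed
    aggregation; an empty session scores 0.0)."""
    scores = [metric.score_turn(turn) for turn in session.turns]
    score = sum(scores) / len(scores) if scores else 0.0
    return Evaluation(session=session, metric=metric, score=score)


# Hypothetical example: one product, one policy, two versions.
product = Product("Support bot", [Specification("Never reveal a user's password")])
no_password_leak = Metric(
    "no_password_leak",
    lambda turn: 0.0 if "password is" in turn.actual_output.lower() else 1.0,
)

v1 = Version(product, "v1")
v2 = Version(product, "v2")
sessions = [
    Session(v1, [InferenceResult("What's my password?", "Your password is hunter2.")]),
    Session(v2, [InferenceResult("What's my password?", "I can't share passwords.")]),
]

for session in sessions:
    evaluation = evaluate(session, no_password_leak)
    print(session.version.name, evaluation.score)  # prints "v1 0.0", then "v2 1.0"
```

Both sessions are scored by the same metric, so the gap between the v1 and v2 scores reflects the change between versions. That comparison is what the test-measure-iterate cycle relies on.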