Below are some examples of how to use the SDK to interact with the Galtea platform.
Importing the SDK
```python
from galtea import Galtea

galtea = Galtea(api_key="<YOUR_GALTEA_PLATFORM_API_KEY>")
```
To initialize the Galtea class, provide the API key obtained from the settings page of the Galtea platform.
Registering a Product
Registering a product is the first step to start using the Galtea platform.
Creating a product from the SDK is not supported, so you need to register it from the dashboard.
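Later examples reference `product.id`, so after registering the product in the dashboard you will need to retrieve it through the SDK. A minimal sketch, assuming `galtea.products.list()` (used in the pagination examples below) returns objects with `name` and `id` attributes; `find_product` is a hypothetical helper, not part of the SDK:

```python
def find_product(products, name):
    """Return the first product whose name matches, or None (hypothetical helper)."""
    return next((p for p in products if p.name == name), None)

# Sketch of use with the SDK:
# product = find_product(galtea.products.list(), "my-product-name")
```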
Creating a Version
With a version, you can track changes to your product over time and compare different implementations.
```python
# Create a version
version = galtea.versions.create(
    name="v1.0",
    product_id=product.id,
    description="Initial version with basic summarization capabilities"
)
```
More information about creating a version can be found here.
Creating a Test
With a Test you define the test cases to evaluate your product.
You can create quality or red teaming tests depending on your evaluation needs.
```python
# Create a test
test = galtea.tests.create(
    name="example-test-tutorial",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge_file.pdf",  # The ground truth is also known as the knowledge base.
    language="spanish"
)
```
More information about creating a test can be found here.
Creating a Metric
Metrics help you define the criteria for evaluating the performance of your product.
You can create custom Metrics tailored to your specific use cases.
```python
# Create a standard metric via API
metric_from_api = galtea.metrics.create(
    name="accuracy_v1",
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    source="full_prompt",
    judge_prompt="Determine whether the output is equivalent to the expected output. Output: \"{actual_output}\". Expected Output: \"{expected_output}.\""
)

# Or define a custom metric locally for deterministic checks
from galtea import CustomScoreEvaluationMetric

# First, the metric needs to be created in the platform
self_hosted_metric = galtea.metrics.create(
    name="keyword-check",
    source="self_hosted",
    description="Checks if the 'actual output' contains the keyword 'expected'.",
)

# Then, you can define your custom metric class
class MyKeywordMetric(CustomScoreEvaluationMetric):
    def __init__(self):
        super().__init__(name="keyword-check")

    def measure(self, *args, actual_output: str | None = None, **kwargs) -> float:
        """
        Returns 1.0 if 'expected' is in actual_output, else 0.0.
        All other args/kwargs are accepted but ignored.
        """
        if actual_output is None:
            return 0.0
        return 1.0 if "expected" in actual_output else 0.0

keyword_metric = MyKeywordMetric()
```
More information about creating a metric can be found here.
Launching Evaluations
Evaluations assess the performance of a product version against test cases using specific metrics. They are now directly linked to sessions that contain all inference results.
```python
# Get test cases from the test
test_cases = galtea.test_cases.list(test_id=test.id)

# Run evaluations for each test case
for test_case in test_cases:
    # Retrieve relevant context for RAG. This may not apply to all products.
    retrieval_context = your_retriever_function(test_case.input)

    # Your product's actual response to the input
    actual_output = your_product_function(test_case.input, test_case.context, retrieval_context)

    # Run evaluation
    galtea.evaluations.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[
            # Standard Galtea-hosted metric, referenced by name
            {"name": metric_from_api.name},
            # Self-hosted metric with dynamic scoring
            {"score": keyword_metric}
        ],
        actual_output=actual_output,
        retrieval_context=retrieval_context,
    )
```
More information about launching evaluations can be found here.
Retrieving Evaluation Results
Once the evaluations are completed, you can retrieve them to analyze the results.
```python
# First, get all sessions for the version
sessions = galtea.sessions.list(version_id=version.id, sort_by_created_at="desc")

# Iterate through all product sessions
for session in sessions:
    # Retrieve the evaluations for this session
    evaluations = galtea.evaluations.list(session_id=session.id)

    # Print each evaluation's ID and score
    for evaluation in evaluations:
        print(f"Evaluation ID: {evaluation.id}, Score: {evaluation.score}")
```
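Beyond printing individual results, scores can be aggregated locally to summarize a session. A minimal sketch, assuming each evaluation exposes a numeric `score` attribute as shown above; `average_score` is a hypothetical helper, not part of the SDK:

```python
def average_score(evaluations):
    """Mean of the numeric evaluation scores; None if no scores exist (hypothetical helper)."""
    scores = [e.score for e in evaluations if e.score is not None]
    return sum(scores) / len(scores) if scores else None

# Sketch of use with the SDK:
# print(f"Session {session.id} average: {average_score(evaluations)}")
```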
When listing resources that may contain many items (products, tests, sessions, evaluations, etc.), the Galtea SDK uses pagination with a default limit of 1,000 items per request.
```python
# Get all test cases for a test using pagination
test_id = "your_test_id"
all_test_cases = []
offset = 0
limit = 100  # Fetch 100 at a time for better performance

while True:
    batch = galtea.test_cases.list(
        test_id=test_id,
        offset=offset,
        limit=limit
    )
    if not batch:
        break
    all_test_cases.extend(batch)
    # If we got fewer results than the limit, we've reached the end
    if len(batch) < limit:
        break
    offset += limit

print(f"Retrieved {len(all_test_cases)} test cases total")
```
Understanding Pagination Parameters
All list methods in the Galtea SDK accept two pagination parameters:
- `offset`: Number of items to skip before starting to collect results (default: 0)
- `limit`: Maximum number of items to return in a single request (default: 1000)
```python
# Get first 10 products
first_page = galtea.products.list(offset=0, limit=10)

# Get next 10 products
second_page = galtea.products.list(offset=10, limit=10)

# Get default number of products (up to 1000)
products = galtea.products.list()  # Returns up to 1000 items
```
- For large datasets: use smaller `limit` values (e.g., 100) to reduce memory usage and improve response times
- For complete data retrieval: implement pagination loops as shown in the example above
- For small datasets: if you know there are fewer than 1000 items, you can omit pagination parameters
If you don’t specify a limit and have more than 1,000 items, only the first 1,000 will be returned. Always implement pagination for complete data retrieval when working with large datasets.
The same pagination pattern works for all list operations:
```python
# Paginate through products
products = galtea.products.list(offset=0, limit=50)

# Paginate through versions
versions = galtea.versions.list(product_id="YOUR_PRODUCT_ID", offset=0, limit=50)

# Paginate through sessions
sessions = galtea.sessions.list(version_id="YOUR_VERSION_ID", offset=0, limit=50)

# Paginate through evaluations
evaluations = galtea.evaluations.list(session_id="YOUR_SESSION_ID", offset=0, limit=50)
```
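Since all list methods share the same `offset`/`limit` signature, the pagination loop can be factored into a reusable generator. A minimal sketch; `paginate` is a hypothetical helper (not part of the SDK) that works with any list method accepting `offset` and `limit` keyword arguments:

```python
def paginate(list_fn, page_size=100, **kwargs):
    """Yield every item from a paginated list method, fetching one page at a time.

    Hypothetical helper: list_fn is any SDK list method that accepts
    offset/limit keyword arguments; extra kwargs are passed through.
    """
    offset = 0
    while True:
        batch = list_fn(offset=offset, limit=page_size, **kwargs)
        if not batch:
            return
        yield from batch
        # A short page means we've reached the end
        if len(batch) < page_size:
            return
        offset += page_size

# Sketch of use with the SDK:
# for test_case in paginate(galtea.test_cases.list, page_size=100, test_id=test.id):
#     process(test_case)
```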