Below are some examples of how to use the SDK to interact with the Galtea platform.
Importing the SDK
from galtea import Galtea
# Initialize client (replace with your real API key)
galtea = Galtea(api_key="YOUR_API_KEY")
To initialize the Galtea class, you need to provide the API key obtained from the settings page of the Galtea platform.
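If you prefer not to hard-code the key, here is a minimal sketch that reads it from an environment variable instead (the variable name GALTEA_API_KEY is an assumption, not an SDK convention):
import os

from galtea import Galtea

# Assumes the key was exported beforehand, e.g. `export GALTEA_API_KEY=...`
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])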
Registering a Product
Registering a product is the first step to start using the Galtea platform.
Products cannot be created from the SDK, so you need to register them from the dashboard.
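The snippets below assume two names that the dashboard step leaves in your hands: a product object and a run_identifier string used to keep resource names unique across runs. A minimal sketch of how you might define both, assuming your product was registered in the dashboard under the hypothetical name "my-product" and filtering client-side over products.list:
import uuid

# Unique suffix so resource names from repeated tutorial runs don't collide
run_identifier = uuid.uuid4().hex[:8]

# Look up the dashboard-registered product by its (hypothetical) name
product = next(p for p in galtea.products.list() if p.name == "my-product")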
Creating a Version
With a version, you can track changes to your product over time and compare different implementations.
# 1) Create a version
version = galtea.versions.create(
    name="v1.0-" + run_identifier,
    product_id=product.id,
    description="Initial version with basic summarization capabilities",
)
More information about creating a version can be found here.
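To double-check that the version was registered, you can list the product's versions (the same list method appears again in the pagination section below; the id and name attributes are assumptions here):
# Sanity check: the new version should appear in the product's version list
for v in galtea.versions.list(product_id=product.id):
    print(v.id, v.name)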
Creating a Test
With a Test, you define the test cases used to evaluate your product.
You can create quality or red teaming tests depending on your evaluation needs.
# 2) Create a test
test = galtea.tests.create(
    name="example-test-tutorial-" + run_identifier,
    type="ACCURACY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge.md",  # The ground truth is the knowledge base
    language="english",
)
More information about creating a test can be found here.
Creating a Metric
Metrics help you define the criteria for evaluating the performance of your product.
You can create custom Metrics tailored to your specific use cases.
# 3) Create a standard metric via API
metric_from_api = galtea.metrics.create(
    name="accuracy-" + run_identifier,
    test_type="ACCURACY",
    evaluator_model_name="GPT-4.1",
    source="full_prompt",
    judge_prompt=(
        "Determine whether the output is equivalent to the expected output. "
        'Output: "{actual_output}". Expected Output: "{expected_output}".'
    ),
)
# 4) Create a custom metric entry in the platform, then define local evaluator
metric_from_api_custom = galtea.metrics.create(
    name="keyword-check-" + run_identifier,
    test_type="ACCURACY",
    source="self_hosted",
    description="Checks if the 'actual output' contains the keyword 'expected'.",
)
from galtea import CustomScoreEvaluationMetric  # noqa: E402


class MyKeywordMetric(CustomScoreEvaluationMetric):
    def __init__(self) -> None:
        super().__init__(name="keyword-check-" + run_identifier)

    def measure(self, *args, actual_output: str | None = None, **kwargs) -> float:
        """
        Returns 1.0 if 'expected' is in actual_output, else 0.0.
        """
        if actual_output is None:
            return 0.0
        return 1.0 if "expected" in actual_output else 0.0
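Before wiring the metric into an evaluation, a quick local sanity check (values are illustrative only):
metric = MyKeywordMetric()
print(metric.measure(actual_output="This is the expected output"))  # 1.0
print(metric.measure(actual_output="no keyword here"))  # 0.0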
More information about creating a metric can be found here.
Launching Evaluations
Evaluations assess the performance of a product version against test cases using specific metrics. They are now directly linked to sessions that contain all inference results.
keyword_metric = MyKeywordMetric()
# 5) Retrieve test cases for the test
test_cases = galtea.test_cases.list(test_id=test.id)
# 6) Run evaluations for each test case (placeholders used for retriever & product)
for test_case in test_cases:
    # Replace the following with your retrieval and product inference functions
    retrieval_context = None  # your_retriever_function(test_case.input)
    actual_output = (
        "This is the expected output"  # your_product_function(test_case.input, ...)
    )

    galtea.evaluations.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[
            {"name": metric_from_api.name},
            {"score": keyword_metric},
        ],
        actual_output=actual_output,
        retrieval_context=retrieval_context,
    )
More information about launching evaluations can be found here.
Retrieving Evaluation Results
Once the evaluations are completed, you can retrieve them to analyze the results.
# 7) List sessions for the version and print evaluations
sessions = galtea.sessions.list(version_id=version.id, sort_by_created_at="desc")
for session in sessions:
    evaluations = galtea.evaluations.list(session_id=session.id)
    for evaluation in evaluations:
        print(evaluation)
Pagination
When listing resources that may contain many items (products, tests, sessions, evaluations, etc.), the Galtea SDK uses pagination with a default limit of 1,000 items per request.
# 8) Pagination examples
def fetch_all_test_cases(test_id: str, limit: int = 100) -> list:
    all_test_cases = []
    offset = 0
    while True:
        batch = galtea.test_cases.list(test_id=test_id, offset=offset, limit=limit)
        if not batch:
            break
        all_test_cases.extend(batch)
        if len(batch) < limit:
            break
        offset += limit
    return all_test_cases
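Calling the helper for the test created earlier:
all_cases = fetch_all_test_cases(test.id)
print(f"Fetched {len(all_cases)} test cases")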
Understanding Pagination Parameters
All list methods in the Galtea SDK accept two pagination parameters:
- offset: Number of items to skip before starting to collect results (default: 0)
- limit: Maximum number of items to return in a single request (default: 1000)
# Example: paginating the product list
first_page_products = galtea.products.list(offset=0, limit=10)
second_page_products = galtea.products.list(offset=10, limit=10)
- For large datasets: Use smaller limit values (e.g., 100) to reduce memory usage and improve response times
- For complete data retrieval: Implement pagination loops as shown in the example above
- For small datasets: If you know there are fewer than 1000 items, you can omit pagination parameters
If you don’t specify a limit and have more than 1,000 items, only the first 1,000 will be returned. Always implement pagination for complete data retrieval when working with large datasets.
The same pagination pattern works for all list operations:
# Generic pagination examples
products_page = galtea.products.list(offset=0, limit=50)
versions_page = galtea.versions.list(product_id=product.id, offset=0, limit=50)
sessions_page = galtea.sessions.list(version_id=version.id, offset=0, limit=50)
if sessions_page:
    evaluations_page = galtea.evaluations.list(
        session_id=sessions_page[0].id,
        offset=0,
        limit=50,
    )
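Since every list method accepts the same offset/limit pair, the loop from step 8 can be factored into one reusable helper. This is a sketch, not part of the SDK; list_fn stands for any Galtea list method, with its filters passed through as keyword arguments:
from typing import Any, Callable, Iterator


def iter_all(list_fn: Callable[..., list], limit: int = 100, **kwargs: Any) -> Iterator[Any]:
    """Yield every item from a paginated Galtea list method."""
    offset = 0
    while True:
        batch = list_fn(offset=offset, limit=limit, **kwargs)
        if not batch:
            return
        yield from batch
        if len(batch) < limit:
            return
        offset += limit


# Example: iterate over all versions of a product without manual paging
for v in iter_all(galtea.versions.list, product_id=product.id):
    print(v.name)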