Below are some examples of how to use the SDK to interact with the Galtea platform.

Importing the SDK

from galtea import Galtea

galtea = Galtea(api_key="<YOUR_GALTEA_PLATFORM_API_KEY>")
To initialize the Galtea class, provide the API key obtained from the Settings page of the Galtea platform.
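
If you prefer not to hard-code the key, you can read it from an environment variable instead. A minimal sketch (the variable name GALTEA_API_KEY is illustrative; check the SDK docs for any built-in environment-variable support):
import os

from galtea import Galtea

# Read the API key from the environment instead of hard-coding it.
# GALTEA_API_KEY is our own choice of variable name, not an SDK requirement.
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])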

Registering a Product

Registering a product is the first step to start using the Galtea platform. Products cannot be created from the SDK, so you need to register them from the dashboard.
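
Once the product is registered, you need its ID for the steps below. As a hedged sketch, assuming the SDK exposes a products service analogous to the other services used in this tutorial (verify the exact call in the SDK reference):
# Assumption: a products service with a list method, mirroring the other
# services shown in this tutorial; check the SDK reference for the exact call.
products = galtea.products.list()

# Pick the product you registered on the dashboard
product = products[0]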

Creating a Version

With a version, you can track changes to your product over time and compare different implementations.
# Create a version
version = galtea.versions.create(
    name="v1.0",
    product_id=product.id,
    description="Initial version with basic summarization capabilities"
)
More information about creating a version can be found here.

Creating a Test

With a Test, you define the test cases used to evaluate your product. You can create quality or red teaming tests, depending on your evaluation needs.
# Create a test
test = galtea.tests.create(
    name="example-test-tutorial",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge_file.pdf",  # The ground truth is also known as the knowledge base.
    language="spanish"
)
More information about creating a test can be found here.

Creating a Metric Type

Metric Types help you define the criteria for evaluating the performance of your product. You can create custom Metric Types tailored to your specific use cases.
# Create a standard metric type via API
metric_type_from_api = galtea.metrics.create(
    name="accuracy_v1",
    evaluator_model_name="GPT-4.1",
    criteria="Determine whether the 'actual output' is equivalent to the 'expected output'.",
    evaluation_params=["input", "expected_output", "actual_output"],
)

# Or define a custom metric locally for deterministic checks
from galtea import CustomScoreMetric

# First, the metric type needs to be registered in the platform
keyword_metric_type = galtea.metrics.create(
    name="keyword-check",
    description="Checks if the 'actual output' contains the keyword 'expected'.",
)

# Then, you can define your custom metric class
class MyKeywordMetric(CustomScoreMetric):
    def __init__(self):
        super().__init__(name="keyword-check")

    def measure(self, *args, actual_output: str | None = None, **kwargs) -> float:
        """
        Returns 1.0 if 'expected' is in actual_output, else 0.0.
        All other args/kwargs are accepted but ignored.
        """
        if actual_output is None:
            return 0.0
        return 1.0 if "expected" in actual_output else 0.0

keyword_metric = MyKeywordMetric()
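
Because the custom metric is plain Python, you can sanity-check it locally before using it in an evaluation task:
# Local checks of the metric logic defined above
assert keyword_metric.measure(actual_output="the expected answer") == 1.0
assert keyword_metric.measure(actual_output="something else") == 0.0
assert keyword_metric.measure() == 0.0  # no output scores 0.0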
More information about creating a metric can be found here.

Launching an Evaluation

Evaluations are created implicitly when you run evaluation tasks to assess the performance of a product version against test cases using specific metric types.
# Get test cases from the test
test_cases = galtea.test_cases.list(test_id=test.id)

# Run evaluation tasks for each test case
for test_case in test_cases:
    # Retrieve relevant context for RAG. This may not apply to all products.
    retrieval_context = your_retriever_function(test_case.input)
    
    # Your product's actual response to the input
    actual_output = your_product_function(test_case.input, test_case.context, retrieval_context)
    
    # Run evaluation task using both standard and custom metrics
    galtea.evaluation_tasks.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[
            metric_type_from_api.name,   # Standard metric by name
            keyword_metric               # Custom metric object
        ],
        actual_output=actual_output,
        retrieval_context=retrieval_context, 
    )
More information about launching evaluation tasks can be found here.

Retrieving Evaluation Results

Once the evaluation tasks are completed, you can retrieve them to analyze the results.
# First, get all evaluations for the product
evaluations = galtea.evaluations.list(product_id=product.id)

# Iterate through all product evaluations
for evaluation in evaluations:
    # Retrieve the evaluation tasks for this evaluation
    evaluation_tasks = galtea.evaluation_tasks.list(evaluation_id=evaluation.id)
    # Print each task's ID and score
    for task in evaluation_tasks:
        print(f"Task ID: {task.id}, Score: {task.score}")
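
Since each task exposes a numeric score, you can aggregate the results with plain Python. For example, to compare average scores across evaluations (this aggregation is our own sketch, not an SDK feature):
# Average score per evaluation, reusing only the calls and fields shown above.
# Tasks whose score is not yet available are skipped.
for evaluation in evaluations:
    tasks = galtea.evaluation_tasks.list(evaluation_id=evaluation.id)
    scores = [task.score for task in tasks if task.score is not None]
    if scores:
        print(f"Evaluation {evaluation.id}: average score {sum(scores) / len(scores):.2f}")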