Below are some examples of how to use the SDK to interact with the Galtea platform.

Importing the SDK

from galtea import Galtea

galtea = Galtea(api_key="<YOUR_GALTEA_PLATFORM_API_KEY>")
To initialize the Galtea client, provide the API key obtained from the settings page of the Galtea platform.

Registering a Product

Registering a product is the first step to using the Galtea platform. Products cannot currently be created from the SDK, so you need to register them from the dashboard. The registered product's ID is what the examples below reference as `product.id`.

Creating a Version

With a version, you can track changes to your product over time and compare different implementations.
# Create a version
version = galtea.versions.create(
    name="v1.0",
    product_id=product.id,
    description="Initial version with basic summarization capabilities"
)
More information about creating a version can be found here.

Creating a Test

With a test, you define the test cases used to evaluate your product. You can create quality or red teaming tests depending on your evaluation needs.
# Create a test
test = galtea.tests.create(
    name="example-test-tutorial",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge_file.pdf",  # The ground truth is also known as the knowledge base.
    language="spanish"
)
More information about creating a test can be found here.

Creating a Metric

Metrics define the criteria for evaluating the performance of your product. You can create custom metrics tailored to your specific use cases.
# Create a standard metric via API
metric_from_api = galtea.metrics.create(
    name="accuracy_v1",
    test_type="QUALITY",
    evaluator_model_name="GPT-4.1",
    judge_prompt="Determine whether the output is equivalent to the expected output. Output: \"{actual_output}\". Expected Output: \"{expected_output}\"."
)
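The `judge_prompt` uses named placeholders that the evaluator model receives with the real values filled in. The exact substitution mechanism is internal to the platform; as a rough intuition, it behaves like plain `str.format` (the example values below are illustrative):

```python
# Illustration only: how the judge prompt's placeholders are conceptually
# filled before being sent to the evaluator model.
judge_prompt = (
    'Determine whether the output is equivalent to the expected output. '
    'Output: "{actual_output}". Expected Output: "{expected_output}".'
)

filled = judge_prompt.format(
    actual_output="The capital of France is Paris.",
    expected_output="Paris is the capital of France.",
)
print(filled)
```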

# Or define a custom metric locally for deterministic checks
from galtea import CustomScoreEvaluationMetric

# First, the metric needs to be registered on the platform
keyword_metric_definition = galtea.metrics.create(
    name="keyword-check",
    description="Checks if the 'actual output' contains the keyword 'expected'.",
)

# Then, you can define your custom metric class
class MyKeywordMetric(CustomScoreEvaluationMetric):
    def __init__(self):
        super().__init__(name="keyword-check")
    def measure(self, *args, actual_output: str | None = None, **kwargs) -> float:
        """
        Returns 1.0 if 'expected' is in actual_output, else 0.0.
        All other args/kwargs are accepted but ignored.
        """
        if actual_output is None:
            return 0.0
        return 1.0 if "expected" in actual_output else 0.0

keyword_metric = MyKeywordMetric()
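The scoring rule inside `measure` is plain Python, so it is easy to sanity-check in isolation. Here is the same logic as a standalone function (a stand-in for the method, no SDK required):

```python
def keyword_score(actual_output=None):
    """Mirrors MyKeywordMetric.measure: 1.0 if 'expected' appears, else 0.0."""
    if actual_output is None:
        return 0.0
    return 1.0 if "expected" in actual_output else 0.0

print(keyword_score("this matches the expected output"))  # 1.0
print(keyword_score("no match here"))                     # 0.0
print(keyword_score())                                    # 0.0
```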
More information about creating a metric can be found here.

Launching Evaluations

Evaluations assess the performance of a product version against test cases using specific metrics. They are now directly linked to sessions that contain all inference results.
# Get test cases from the test
test_cases = galtea.test_cases.list(test_id=test.id)

# Run evaluations for each test case
for test_case in test_cases:
    # Retrieve relevant context for RAG. This may not apply to all products.
    retrieval_context = your_retriever_function(test_case.input)
    
    # Your product's actual response to the input
    actual_output = your_product_function(test_case.input, test_case.context, retrieval_context)
    
    # Run evaluation using both standard and custom metrics
    galtea.evaluations.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[
            metric_from_api.name,  # Standard metric, referenced by name
            keyword_metric,        # Custom metric, passed as an object
        ],
        actual_output=actual_output,
        retrieval_context=retrieval_context, 
    )
More information about launching evaluations can be found here.

Retrieving Evaluation Results

Once the evaluations are completed, you can retrieve them to analyze the results.
# First, get all sessions for the version
sessions = galtea.sessions.list(version_id=version.id, sort_by_created_at="desc")

# Iterate through all product sessions
for session in sessions:
    # Retrieve the evaluations for this session
    evaluations = galtea.evaluations.list(session_id=session.id)
    # Print each evaluation's ID and score
    for evaluation in evaluations:
        print(f"Evaluation ID: {evaluation.id}, Score: {evaluation.score}")
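From here you can aggregate results however you like, for example an average score per session. Assuming each evaluation exposes `score` as shown above, the computation is plain Python; the snippet below uses a stand-in dataclass and made-up scores rather than live SDK objects:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    """Stand-in for the SDK's evaluation object (only the fields used above)."""
    id: str
    score: float

# Illustrative results; in practice these come from galtea.evaluations.list(...)
evaluations = [
    EvalResult(id="eval-1", score=0.8),
    EvalResult(id="eval-2", score=0.6),
    EvalResult(id="eval-3", score=1.0),
]

average = sum(e.score for e in evaluations) / len(evaluations)
print(f"Session average score: {average:.2f}")  # Session average score: 0.80
```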