Below are some examples of how to use the SDK to interact with the Galtea platform.

Importing the SDK

from galtea import Galtea

galtea = Galtea(api_key="<YOUR_GALTEA_PLATFORM_API_KEY>")

To initialize the Galtea class, provide the API key obtained from the settings page of the Galtea platform.
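
Rather than hard-coding the key, you can read it from an environment variable. The variable name GALTEA_API_KEY below is just a convention used in this example, not something the SDK requires:

import os

from galtea import Galtea

# Read the API key from an environment variable (name chosen for this example)
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])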

Registering a Product

Registering a product is the first step to start using the Galtea platform. Products cannot be created from the SDK, so you need to register them from the dashboard.
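
The snippets that follow assume a product object with an id attribute. Assuming your SDK version exposes a products service analogous to the other services shown here (treat this as a sketch and check the SDK reference), you could fetch the product registered on the dashboard like this:

# Sketch: assumes a products service with a list method
products = galtea.products.list()

# Pick the product you registered on the dashboard, e.g. by name
# ("my-summarizer" is a placeholder)
product = next(p for p in products if p.name == "my-summarizer")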

Creating a Version

With a version, you can track changes to your product over time and compare different implementations.

# Create a version
version = galtea.versions.create(
    name="v1.0",
    product_id=product.id,  # the product registered on the dashboard
    description="Initial version with basic summarization capabilities"
)

More information about creating a version can be found here.

Creating a Test

With a Test, you define the test cases used to evaluate your product. You can create quality or red teaming tests, depending on your evaluation needs.

# Create a test
test = galtea.tests.create(
    name="example-test-tutorial",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge_file.pdf",  # The ground truth is also known as the knowledge base.
    language="spanish"
)

More information about creating a test can be found here.

Creating a Metric Type

Metric Types help you define the criteria for evaluating the performance of your product. You can create custom Metric Types tailored to your specific use cases.

# Create a metric type
metric_self_accuracy = galtea.metrics.create(
    name="accuracy_v1",
    criteria="Determine whether the 'actual output' is equivalent to the 'expected output'.",
    evaluation_params=["input", "expected output", "actual output"],
)

More information about creating a metric can be found here.

Launching an Evaluation

Evaluations are created implicitly when you run evaluation tasks to assess the performance of a product version against test cases using specific metric types.
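
The loop below calls two placeholder functions, your_retriever_function and your_product_function, which stand in for your own retrieval and inference logic. A minimal, purely illustrative stub (names and bodies are hypothetical) might look like:

# Hypothetical stand-ins; replace with your own retrieval and product logic.
def your_retriever_function(query: str) -> str:
    # e.g. query a vector store and concatenate the top matching chunks
    return "<retrieved context>"

def your_product_function(query: str, context: str, retrieval_context: str) -> str:
    # e.g. call your LLM with the query plus the retrieved context
    return "<your model's answer>"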

# Get test cases from the test
test_cases = galtea.test_cases.list(test_id=test.id)

# Run evaluation tasks for each test case
for test_case in test_cases:
    # Retrieve relevant context for RAG. This may not apply to all products.
    retrieval_context = your_retriever_function(test_case.input)
    
    # Your product's actual response to the input
    actual_output = your_product_function(test_case.input, test_case.context, retrieval_context)
    
    # Run evaluation task (evaluation created implicitly)
    galtea.evaluation_tasks.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[metric_self_accuracy.name],
        actual_output=actual_output,
        retrieval_context=retrieval_context, 
    )

More information about launching evaluation tasks can be found here.

Retrieving Evaluation Results

Once the evaluation tasks are completed, you can retrieve them to analyze the results.

# First, get all evaluations for the product
evaluations = galtea.evaluations.list(product_id=product.id)

# Iterate through all product evaluations
for evaluation in evaluations:
    # Retrieve the evaluation tasks for this evaluation
    evaluation_tasks = galtea.evaluation_tasks.list(evaluation_id=evaluation.id)
    # Print each evaluation task's ID and score
    for task in evaluation_tasks:
        print(f"Task ID: {task.id}, Score: {task.score}")