Below are some examples of how to use the SDK to interact with the Galtea platform.

Importing the SDK

from galtea import Galtea

galtea = Galtea(api_key="<YOUR_GALTEA_PLATFORM_API_KEY>")

To initialize the Galtea class, provide the API key obtained from the settings page of the Galtea platform.
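
If you prefer not to hard-code the key, you can load it from the environment instead. A minimal sketch, assuming the key is stored in a GALTEA_API_KEY environment variable (the variable name is illustrative, not part of the SDK):

import os

from galtea import Galtea

# GALTEA_API_KEY is a hypothetical variable name; use whatever your environment defines.
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])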

Registering a Product

Registering a product is the first step to start using the Galtea platform. Products cannot be created from the SDK, so you need to register them from the platform itself.

Creating a Version

With a version, you can track changes to your product over time and compare different implementations.

# Create a version
version = galtea.versions.create(
    name="v1.0",
    product_id=product.id,  # ID of a product previously registered on the platform
    optional_props={"description": "Initial version with basic summarization capabilities"}
)

More information about creating a version can be found here.

Creating a Test

With a Test, you define the test cases used to evaluate your product. You can create quality or red teaming tests, depending on your evaluation needs.

# Create a test
test = galtea.tests.create(
    name="example-test-tutorial",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge_file.pdf", # The ground truth is also known as the knowledge base.
    language="spanish"
)

More information about creating a test can be found here.

Creating a Metric Type

Metric Types help you define the criteria for evaluating the performance of your product. You can create custom Metric Types tailored to your specific use cases.

# Create a metric type
metric_self_accuracy = galtea.metrics.create(
    name="accuracy_v1",
    criteria="Determine whether the 'actual output' is equivalent to the 'expected output'."
    evaluation_params=["input", "expected output", "actual output"],
)

More information about creating a metric can be found here.

Launching an Evaluation

Evaluations link a specific version of a product with a test. You can then execute evaluation tasks to assess how the product version performs against the test cases, judged by the criteria of a specific Metric Type.

# Create an evaluation
evaluation = galtea.evaluations.create(
    test_id=test.id,
    version_id=version.id
)

# Get test cases from the test
test_cases = galtea.test_cases.list(test_id=test.id)

# Run evaluation tasks for each test case
for test_case in test_cases:
    # Retrieve relevant context for RAG. This may not apply to all products.
    retrieval_context = your_retriever_function(test_case.input)
    
    # Your product's actual response to the input
    actual_output = your_product_function(test_case.input, test_case.context, retrieval_context)
    
    # Run evaluation task
    galtea.evaluation_tasks.create(
        metrics=[metric_self_accuracy.name],
        evaluation_id=evaluation.id,
        test_case_id=test_case.id,
        actual_output=actual_output,
        retrieval_context=[retrieval_context], 
    )

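The loop above calls two placeholder functions, your_retriever_function and your_product_function, which stand in for your own retrieval and inference logic. A minimal sketch of what they might look like (both stubs are entirely hypothetical):

def your_retriever_function(input_text):
    # Hypothetical stub: in a real product this would query your
    # vector store or search index for passages relevant to the input.
    return f"Relevant passage for: {input_text}"

def your_product_function(input_text, context, retrieval_context):
    # Hypothetical stub: in a real product this would call your
    # application (e.g. an LLM pipeline) and return its answer.
    return f"Answer to '{input_text}' given '{retrieval_context}'"
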
More information about creating an evaluation can be found here.

More information about launching an evaluation task can be found here.

Retrieving Evaluation Results

Once the evaluation tasks are completed, you can retrieve them to analyze the results.

# Retrieve evaluation tasks
evaluation_tasks = galtea.evaluation_tasks.list(evaluation_id=evaluation.id)
for task in evaluation_tasks:
    # Fetch the full task record, which includes the score
    evaluation_task = galtea.evaluation_tasks.get(task.id)
    print(evaluation_task.score)
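
To get a quick summary, you can also aggregate the results. A minimal sketch that assumes each task exposes a numeric score attribute, as used above:

# Compute the average score across all evaluation tasks
scores = [galtea.evaluation_tasks.get(task.id).score for task in evaluation_tasks]
print(f"Average score: {sum(scores) / len(scores):.3f}")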