Below are some examples of how to use the SDK to interact with the Galtea platform.
Importing the SDK
from galtea import Galtea
galtea = Galtea(api_key="<YOUR_GALTEA_PLATFORM_API_KEY>")
To initialize the Galtea class, you need to provide the API key obtained from the settings page of the Galtea platform.
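If you prefer not to hard-code the key, you can read it from an environment variable instead; the variable name GALTEA_API_KEY below is just an illustrative choice, not something the SDK requires.
import os
from galtea import Galtea
# GALTEA_API_KEY is an example name; use whatever variable your environment defines.
galtea = Galtea(api_key=os.environ["GALTEA_API_KEY"])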
Registering a Product
Registering a product is the first step to start using the Galtea platform.
Products cannot currently be created from the SDK, so you need to register your product from the dashboard.
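The snippets below reference a product object (for example, product.id). As a sketch, assuming the SDK exposes a products service analogous to the other services used in these examples, you could fetch the product you registered in the dashboard roughly like this (check the SDK reference for the exact call):
# Assumes a products.list service exists; verify the exact method and filters in the SDK reference.
products = galtea.products.list()
# "my-summarization-product" is a placeholder for the name you chose in the dashboard.
product = next(p for p in products if p.name == "my-summarization-product")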
Creating a Version
With a version, you can track changes to your product over time and compare different implementations.
# Create a version
version = galtea.versions.create(
    name="v1.0",
    product_id=product.id,
    description="Initial version with basic summarization capabilities"
)
More information about creating a version can be found here.
Creating a Test
With a Test, you define the test cases used to evaluate your product.
You can create quality or red teaming tests depending on your evaluation needs.
# Create a test
test = galtea.tests.create(
    name="example-test-tutorial",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="path/to/knowledge_file.pdf",  # The ground truth is also known as the knowledge base.
    language="spanish"
)
More information about creating a test can be found here.
Creating a Metric Type
Metric Types help you define the criteria for evaluating the performance of your product.
You can create custom Metric Types tailored to your specific use cases.
# Create a metric type
metric_self_accuracy = galtea.metrics.create(
    name="accuracy_v1",
    criteria="Determine whether the 'actual output' is equivalent to the 'expected output'.",
    evaluation_params=["input", "expected output", "actual output"],
)
More information about creating a metric can be found here.
Launching an Evaluation
Evaluations are created implicitly when you run evaluation tasks to assess the performance of a product version against test cases using specific metric types.
# Get test cases from the test
test_cases = galtea.test_cases.list(test_id=test.id)
# Run evaluation tasks for each test case
for test_case in test_cases:
    # Retrieve relevant context for RAG. This may not apply to all products.
    retrieval_context = your_retriever_function(test_case.input)
    # Your product's actual response to the input
    actual_output = your_product_function(test_case.input, test_case.context, retrieval_context)
    # Run evaluation task (evaluation created implicitly)
    galtea.evaluation_tasks.create_single_turn(
        version_id=version.id,
        test_case_id=test_case.id,
        metrics=[metric_self_accuracy.name],
        actual_output=actual_output,
        retrieval_context=retrieval_context,
    )
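The functions your_retriever_function and your_product_function above are placeholders for your own retrieval and generation logic; they are not part of the SDK. Purely as an illustrative sketch, they could look like this:
def your_retriever_function(query):
    # Placeholder: return whatever context your RAG pipeline retrieves for the query.
    # Products without retrieval can simply return an empty string.
    return ""

def your_product_function(input_text, context, retrieval_context):
    # Placeholder: call your actual product (e.g., your LLM pipeline) and return its response.
    return f"Summary of: {input_text}"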
More information about launching evaluation tasks can be found here.
Retrieving Evaluation Results
Once the evaluation tasks are completed, you can retrieve them to analyze the results.
# First, get all evaluations for the product
evaluations = galtea.evaluations.list(product_id=product.id)
# Iterate through all product evaluations
for evaluation in evaluations:
    # Retrieve the evaluation tasks
    evaluation_tasks = galtea.evaluation_tasks.list(evaluation_id=evaluation.id)
    # Print each evaluation task's ID and score
    for task in evaluation_tasks:
        print(f"Task ID: {task.id}, Score: {task.score}")