Run Evaluations
Learn how to create and run evaluations using the SDK
Evaluations allow you to assess how well a specific version of your product performs against a set of test cases by running individual evaluation tasks.
Creating an Evaluation
An evaluation links a specific version of your product to a test, establishing the framework for running individual evaluation tasks. Creating one is the first step:
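A minimal sketch of what creating an evaluation might look like. Only the concept is taken from this page: the client setup, the evaluations.create method name, and the parameter names are assumptions, and a stand-in client is defined so the snippet runs on its own.

```python
from types import SimpleNamespace

# Stand-in client so this sketch runs without the real SDK installed.
# With the real SDK you would construct the Galtea client from your API key.
galtea = SimpleNamespace(
    evaluations=SimpleNamespace(create=lambda **kw: {"id": "eval-1", **kw})
)

# Link a specific product version to a test: this pairing is the evaluation.
# Parameter names are illustrative, not the confirmed SDK signature.
evaluation = galtea.evaluations.create(
    test_id="YOUR_TEST_ID",        # the test containing your test cases
    version_id="YOUR_VERSION_ID",  # the product version under evaluation
)
print(evaluation["id"])
```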
Running Evaluation Tasks
Once you’ve created an evaluation, you can run evaluation tasks:
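A hedged sketch of creating a single evaluation task. The `galtea.evaluation_tasks.create` method and the `metrics` parameter are named on this page; the remaining field names are assumptions, and a stand-in client makes the snippet self-contained.

```python
from types import SimpleNamespace

# Stand-in client so this sketch runs without the real SDK installed.
galtea = SimpleNamespace(
    evaluation_tasks=SimpleNamespace(create=lambda **kw: {"id": "task-1", **kw})
)

# `evaluation_tasks.create` and `metrics` come from the docs; the other
# parameter names are illustrative assumptions.
task = galtea.evaluation_tasks.create(
    evaluation_id="YOUR_EVALUATION_ID",
    input="What is the capital of France?",  # the test case input
    actual_output="Paris.",                  # your product's answer
    metrics=["factual_accuracy"],            # metric types to apply
)
print(task["id"])
```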
For efficiency, you can process multiple evaluation tasks at once by looping over your test cases and calling the galtea.evaluation_tasks.create method:
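A sketch of the batch pattern, assuming a stand-in client and a placeholder product function (only galtea.evaluation_tasks.create is confirmed by this page; the field and metric names are illustrative):

```python
from types import SimpleNamespace

# Stand-in client and product so this sketch runs without the real SDK.
galtea = SimpleNamespace(
    evaluation_tasks=SimpleNamespace(create=lambda **kw: {"id": "task", **kw})
)

def my_product(question):
    """Placeholder for the system under test."""
    return "answer to: " + question

test_cases = [
    "What is the capital of France?",
    "Summarize the refund policy.",
]

# One evaluation task per test case.
created = []
for case in test_cases:
    created.append(galtea.evaluation_tasks.create(
        evaluation_id="YOUR_EVALUATION_ID",
        input=case,
        actual_output=my_product(case),
        metrics=["factual_accuracy", "coherence"],  # illustrative metric names
    ))
print(len(created))
```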
The metrics parameter specifies which metric types to use for evaluating the task. You can use multiple metrics simultaneously to get different perspectives on performance.
The latency and usage_info parameters are optional but highly recommended: they let you track your product's latency and costs alongside the evaluation results.
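One way to populate these parameters is to time the product call yourself and attach the usage figures your model provider reports. The usage_info keys below are an assumed shape, not the confirmed schema, and a stand-in client keeps the sketch runnable:

```python
import time
from types import SimpleNamespace

# Stand-in client so this sketch runs without the real SDK.
galtea = SimpleNamespace(evaluation_tasks=SimpleNamespace(create=lambda **kw: kw))

def my_product(question):
    """Placeholder for the system under test."""
    return "Paris."

# Measure wall-clock latency around the product call.
start = time.perf_counter()
answer = my_product("What is the capital of France?")
latency_ms = (time.perf_counter() - start) * 1000.0

# Assumed cost-tracking shape; in practice, take these numbers from your
# model provider's usage report.
usage_info = {"input_tokens": 7, "output_tokens": 2, "cost_usd": 0.00012}

task = galtea.evaluation_tasks.create(
    evaluation_id="YOUR_EVALUATION_ID",
    input="What is the capital of France?",
    actual_output=answer,
    metrics=["factual_accuracy"],
    latency=latency_ms,       # named on this page; unit is an assumption
    usage_info=usage_info,    # named on this page; keys are assumptions
)
```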
Customized Test Cases for Evaluation Tasks
If you want to fully customize an evaluation task, for example because your test file does not follow Galtea's format and therefore no test cases were created from it, you can load the test cases from the file yourself and set them on the evaluation task in a similar way:
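A sketch of that workflow, assuming a test file in a custom CSV format. The expected_output parameter name is a guess at how a custom test case might be set; a stand-in client and an in-memory file keep the snippet self-contained:

```python
import csv
import io
from types import SimpleNamespace

# Stand-in client so this sketch runs without the real SDK.
galtea = SimpleNamespace(evaluation_tasks=SimpleNamespace(create=lambda **kw: kw))

def my_product(question):
    """Placeholder for the system under test."""
    return "4" if "2+2" in question else "Paris"

# A test file that does not follow Galtea's format (here, a simple CSV),
# so its test cases were never created on the platform.
raw_file = (
    "question,expected\n"
    "What is 2+2?,4\n"
    "What is the capital of France?,Paris\n"
)
rows = list(csv.DictReader(io.StringIO(raw_file)))

# Set the test-case fields directly on each evaluation task.
tasks = []
for row in rows:
    tasks.append(galtea.evaluation_tasks.create(
        evaluation_id="YOUR_EVALUATION_ID",
        input=row["question"],
        expected_output=row["expected"],   # assumed parameter name
        actual_output=my_product(row["question"]),
        metrics=["factual_accuracy"],
    ))
print(len(tasks))
```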
This method is recommended only for advanced users who need to customize their evaluation tasks; in most cases, using the test cases provided by the test is the best approach. Note that this approach limits access to the analysis tools and metrics that are available with standard test cases.
Retrieving Evaluation Results
After running evaluation tasks, you can retrieve the results:
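A hedged sketch of what retrieving results might look like. Neither the evaluation_tasks.list method nor the result fields are confirmed by this page; the stand-in client below returns canned scores purely to illustrate the shape of a results loop:

```python
from types import SimpleNamespace

# Stand-in client with canned results so this sketch runs without the real
# SDK; the `list` method name and the result fields are assumptions.
galtea = SimpleNamespace(
    evaluation_tasks=SimpleNamespace(
        list=lambda evaluation_id: [
            {"id": "task-1", "metric": "factual_accuracy", "score": 0.92},
            {"id": "task-2", "metric": "factual_accuracy", "score": 0.71},
        ]
    )
)

# Fetch the finished tasks for one evaluation and summarize their scores.
results = galtea.evaluation_tasks.list(evaluation_id="YOUR_EVALUATION_ID")
average = sum(r["score"] for r in results) / len(results)
print(len(results), "tasks")
```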