Run evaluation with custom metrics
Learn how to create and run evaluations for custom metrics using the SDK
Evaluations allow you to assess how well a specific version of your product performs against a set of test cases by running individual evaluation tasks.
Creating an Evaluation
To create an evaluation, link a specific version of your product to a test. This establishes the framework for running individual evaluation tasks.
Running Evaluations for Custom Metrics
Once you’ve created an evaluation, you can run evaluation tasks and directly assign a self-calculated score:
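A self-calculated score is simply a number your own code computes before it is handed to the SDK. The following is a minimal sketch of such a metric, assuming scores are expressed in the range 0 to 1. The `keyword_coverage` function and its inputs are hypothetical and are not part of the Galtea SDK.

```python
def keyword_coverage(actual_output: str, expected_keywords: list[str]) -> float:
    """Hypothetical custom metric: fraction of expected keywords found in the output."""
    if not expected_keywords:
        # Nothing to look for, so report the lowest score instead of dividing by zero.
        return 0.0
    text = actual_output.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


# Two of the three keywords appear in the output, so the score is 2/3.
score = keyword_coverage(
    "Your refund will arrive within 5 business days.",
    ["refund", "business days", "tracking number"],
)
print(round(score, 2))
```

Any function that returns a number works the same way; the metric logic stays entirely in your code.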
To process multiple evaluation tasks at once, call the galtea.evaluation_tasks.create method in a loop.
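The loop can be sketched as follows. To keep the example runnable without the SDK or an API key, it uses a stub in place of `galtea.evaluation_tasks`. Only `metrics` is a parameter name taken from this page; `evaluation_id`, `test_case_id`, `actual_output`, and `score` are assumed names used for illustration, so check the SDK reference for the real signature.

```python
from typing import Any, Callable


class StubEvaluationTasks:
    """Stand-in for galtea.evaluation_tasks so the sketch runs without the SDK."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def create(self, **kwargs: Any) -> dict[str, Any]:
        # The real method talks to the Galtea platform; this stub only records the call.
        self.calls.append(kwargs)
        return kwargs


def exact_match(actual: str, expected: str) -> float:
    """Hypothetical custom metric: 1.0 on an exact match, 0.0 otherwise."""
    return 1.0 if actual.strip() == expected.strip() else 0.0


def run_custom_metric(
    evaluation_tasks: Any,
    evaluation_id: str,
    test_cases: list[dict[str, str]],
    metric_name: str,
    score_fn: Callable[[str, str], float],
) -> list[dict[str, Any]]:
    """Create one evaluation task per test case, each with a self-calculated score."""
    results = []
    for case in test_cases:
        score = score_fn(case["actual_output"], case["expected"])
        results.append(
            evaluation_tasks.create(
                metrics=[metric_name],
                evaluation_id=evaluation_id,          # assumed parameter name
                test_case_id=case["id"],              # assumed parameter name
                actual_output=case["actual_output"],  # assumed parameter name
                score=score,                          # assumed parameter name
            )
        )
    return results


test_cases = [
    {"id": "tc-1", "expected": "Paris", "actual_output": "Paris"},
    {"id": "tc-2", "expected": "Rome", "actual_output": "Milan"},
]
stub = StubEvaluationTasks()
results = run_custom_metric(stub, "eval-123", test_cases, "exact-match", exact_match)
print([result["score"] for result in results])
```

With the real client, you would pass `galtea.evaluation_tasks` instead of the stub and adjust the keyword names to match the SDK.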
The metrics parameter specifies which metric types to use for evaluating the task. You can use multiple metrics simultaneously to get different perspectives on performance.
The latency and usage_info parameters are optional but highly recommended. They let you track your product's latency and cost.
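One way to obtain these values is to time the call to your product and count tokens yourself. The sketch below makes several assumptions: that latency is reported in milliseconds, and that usage_info is a dictionary with `input_tokens` and `output_tokens` keys. The `fake_product` function is a hypothetical stand-in for your own inference call, and whitespace splitting is only a rough substitute for real token counts.

```python
import time
from typing import Callable


def timed_call(fn: Callable[[str], str], prompt: str) -> tuple[str, float]:
    """Call fn(prompt) and return its output with the elapsed time in milliseconds."""
    start = time.perf_counter()
    output = fn(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return output, latency_ms


def fake_product(prompt: str) -> str:
    # Hypothetical stand-in for your product's inference call.
    return prompt.upper()


prompt = "hello"
output, latency_ms = timed_call(fake_product, prompt)

# Rough counts via whitespace splitting; a real product would report its tokenizer's counts.
usage_info = {
    "input_tokens": len(prompt.split()),    # assumed key name
    "output_tokens": len(output.split()),   # assumed key name
}
print(output, usage_info)
```

The resulting `latency_ms` and `usage_info` values are what you would pass as the latency and usage_info parameters when creating the evaluation task.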