An evaluation task in Galtea represents the assessment of an evaluation from a session using the evaluation criteria of a metric type. Multiple evaluation tasks can therefore exist for each evaluation.
Evaluation tasks don’t perform inference on the LLM product themselves. Rather, they evaluate outputs that have already been generated. You should perform inference on your product first, then trigger the evaluation task.
Evaluation tasks can only be created programmatically, using the Galtea SDK, but they can be viewed and managed on the Galtea dashboard.
The Galtea SDK allows you to create, view, and manage evaluation tasks programmatically. This is particularly useful for organizations that want to automate their evaluation process or integrate it into their CI/CD pipeline.
Cost information associated with the LLM call. Keys may include cost_per_input_token and cost_per_output_token, among others.
If cost information is properly configured in the Model selected by the Version, the system will automatically calculate the cost. Provided values will override the system’s calculation.
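The override behaviour described above can be illustrated with a minimal sketch. It does not use the Galtea SDK and is not Galtea's actual calculation: the function `estimate_cost`, the token-count arguments, and the per-key merge are assumptions made for illustration. Only the key names cost_per_input_token and cost_per_output_token come from the text above.

```python
from typing import Dict, Optional


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    model_rates: Dict[str, float],
    provided_rates: Optional[Dict[str, float]] = None,
) -> float:
    """Hypothetical cost estimate for one LLM call.

    model_rates stands in for the rates configured on the Model;
    provided_rates stands in for values passed explicitly, which
    take precedence over the configured ones (assumed per-key merge).
    """
    # Later entries win, so explicitly provided values override the model's.
    rates = {**model_rates, **(provided_rates or {})}
    return (
        input_tokens * rates["cost_per_input_token"]
        + output_tokens * rates["cost_per_output_token"]
    )


# Rates configured on the model (illustrative numbers only).
model_rates = {"cost_per_input_token": 0.001, "cost_per_output_token": 0.002}

# No explicit values: the model's configured rates are used.
default_cost = estimate_cost(1000, 500, model_rates)

# An explicit output rate overrides the configured one.
overridden_cost = estimate_cost(
    1000, 500, model_rates, {"cost_per_output_token": 0.004}
)
```

In this sketch the override is applied key by key, so supplying only one rate leaves the other at its configured value. Whether Galtea merges per key or replaces the whole cost object is not stated here.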