Monitor production responses
Learn how to create evaluation tasks from user queries in production for deeper analysis
Evaluations allow you to assess how well a specific version of your product performs against real user queries in production by running individual evaluation tasks.
Instantiate the SDK
To use the Galtea SDK, you need to initialize it with your API key. This is typically done at the beginning of your script:
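As a minimal sketch of that initialization step, the snippet below reads the API key from an environment variable and then builds the client. The variable name `GALTEA_API_KEY`, the helper names `load_api_key` and `make_client`, the `from galtea import Galtea` import path, and the `Galtea(api_key=...)` constructor call are all assumptions made for illustration; check them against the SDK documentation before relying on them.

```python
import os


def load_api_key(env_var: str = "GALTEA_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before starting the service")
    return key


def make_client(api_key: str):
    """Build the SDK client once, at startup.

    Assumption: the package exposes a `Galtea` class that accepts `api_key`.
    The import is done lazily so this module still loads where the SDK
    is not installed.
    """
    from galtea import Galtea  # assumed import path

    return Galtea(api_key=api_key)  # assumed constructor signature
```

Reading the key from the environment keeps it out of source control, and creating the client once at startup avoids re-initializing it on every request.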
Capturing Production Data for Analysis
You can create evaluation tasks directly from real user queries in your production environment.
SDK Documentation
Learn more about creating evaluation tasks from production data.
All production data is stored for deeper analysis, so you can track performance over time.
By capturing evaluation data from production, you build a valuable repository of real-world performance metrics that can be analyzed later to identify trends, issues, and opportunities for improvement.
The `metrics` parameter specifies which metric types to use for evaluating the task. You can use multiple metrics simultaneously to get different perspectives on performance.
The `latency` and `usage_info` parameters are optional but highly recommended for keeping track of the real performance data of your product.
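The sketch below shows one way to collect these values around a production call. It does not call the SDK; it only times the model call and assembles a dictionary of arguments. Only `metrics`, `latency`, and `usage_info` are named on this page. The keys `version_id`, `input`, and `actual_output`, the helper `build_production_task`, the shape of the model's return value, the metric name, and the choice of milliseconds for latency are all assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, List


def build_production_task(
    call_model: Callable[[str], Dict[str, Any]],
    user_query: str,
    version_id: str,
    metrics: List[str],
) -> Dict[str, Any]:
    """Run the model on a real user query and collect task arguments.

    `call_model` is a hypothetical callable returning a dict with a
    "text" key and an optional "usage" key.
    """
    start = time.perf_counter()
    result = call_model(user_query)
    # Assumption: latency is reported in milliseconds.
    latency_ms = (time.perf_counter() - start) * 1000.0

    task: Dict[str, Any] = {
        "version_id": version_id,          # hypothetical key
        "metrics": metrics,                # one or more metric names
        "input": user_query,               # hypothetical key
        "actual_output": result["text"],   # hypothetical key
        "latency": latency_ms,             # optional, recommended
    }
    usage = result.get("usage")
    if usage is not None:
        task["usage_info"] = usage         # optional, recommended
    return task


def fake_model(query: str) -> Dict[str, Any]:
    """Stand-in for the production model, used only in this example."""
    return {"text": "Paris", "usage": {"input_tokens": 12, "output_tokens": 3}}


task = build_production_task(
    fake_model,
    "What is the capital of France?",
    version_id="version-123",      # placeholder identifier
    metrics=["example-metric"],    # placeholder metric name
)
```

The resulting dictionary could then be passed to whichever SDK method creates an evaluation task from production data, once the argument names have been confirmed in the SDK documentation.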