Create a metric type for evaluating your products.
Returns None if an error occurs.
"GPT-35-turbo"
"GPT-4o"
"GPT-4o-mini"
"GPT-4.1"
"Gemini-2.0-flash"
"Gemini-2.5-Flash"
"Gemini-2.5-Flash-Lite"
"input": The original prompt or query sent to the model (always required).
"actual_output": The output generated by the model.
"expected_output": The ideal or reference answer for the input.
"context": Supplementary context or background information provided to the model.
"retrieval_context": Information retrieved by your RAG pipeline to support inference.
"product_description": High-level description of the product being evaluated.
"product_capabilities": Capabilities or intended functionalities of the product.
"product_inabilities": Known limitations or things the product cannot or should not do.
"product_security_boundaries": Specific boundaries or restrictions to ensure secure behavior.
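As an illustration of how the field names above might be checked before a metric is created, here is a minimal sketch. The names ALLOWED_FIELDS and validate_fields are hypothetical and not part of any documented SDK; the only rules taken from this reference are the nine field names and the statement that "input" is always required.

```python
# Hypothetical helper, not part of the documented API: checks a list of
# field names against the names listed in this reference.
ALLOWED_FIELDS = {
    "input",
    "actual_output",
    "expected_output",
    "context",
    "retrieval_context",
    "product_description",
    "product_capabilities",
    "product_inabilities",
    "product_security_boundaries",
}


def validate_fields(fields):
    """Return the fields in order without duplicates, or raise ValueError."""
    unknown = [name for name in fields if name not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    if "input" not in fields:
        # The reference states that "input" is always required.
        raise ValueError('"input" is always required')
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(fields))
```

Running a check like this on the client side surfaces a misspelled field name as an explicit exception, rather than relying on the None return value to signal that something went wrong.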