Evaluates whether your product’s responses are free from harmful, offensive, or toxic language.
non_toxic
metric, the following parameters are required:
input
: The user message, which may include adversarial or edge-case prompts.actual_output
: The LLM’s response to the input.actual_output
contains toxic, harmful, or inappropriate content.