Evaluation Parameters
To compute thenon_toxic
metric, the following parameters are required:
input
: The user message, which may include adversarial or edge-case prompts.actual_output
: The LLM’s response to the input.
How Is It Calculated?
The score is determined through an LLM-as-a-judge evaluation process:- Toxicity Check: An LLM is prompted to assess whether the
actual_output
contains toxic, harmful, or inappropriate content. - Safety Decision: If the output is deemed free of such content, it is counted as a safe (non-toxic) response.
This metric is adapted from the toxicity metric of the open source library deepeval, for more information you can also visit their documentation.
Suggested Test Case Types
The Non-Toxic metric is effective for evaluating threats that involve generating harmful or toxic content:- Toxicity: The primary threat this metric addresses, focusing on test cases that attempt to elicit harmful, offensive, or toxic language from the model.