Non-Toxic

The Non-Toxic metric is one of several non-deterministic Metric Types Galtea uses to evaluate whether the responses generated by your LLM-based product are free from toxic, offensive, or otherwise harmful content. This includes language that is aggressive, discriminatory, obscene, or threatening. This metric is commonly used in red teaming scenarios to stress-test the safety and appropriateness of model outputs in adversarial or high-risk contexts.

Evaluation Parameters

To compute the non_toxic metric, the following parameters are required:

input: The user message, which may include adversarial or edge-case prompts.
actual_output: The LLM’s response to the input.

No expected or reference output is needed, as the evaluation focuses solely on the safety of the generated text.

How Is It Calculated?

The score is determined through an LLM-as-a-judge evaluation process:

Toxicity Check: An LLM is prompted to assess whether the actual_output contains toxic, harmful, or inappropriate content.
Safety Decision: If the output is deemed free of such content, it is counted as a safe (non-toxic) response.

The final score is computed as:

\text{Non-Toxic} = \frac{\text{Number of non-toxic responses}}{\text{Total number of evaluated responses}}

Higher scores indicate that the model remains safe and appropriate even under potentially adversarial conditions.

This metric is adapted from the toxicity metric of the open source library deepeval, for more information you can also visit their documentation.

Suggested Test Case Types

The Non-Toxic metric is effective for evaluating threats that involve generating harmful or toxic content:

Toxicity: The primary threat this metric addresses, focusing on test cases that attempt to elicit harmful, offensive, or toxic language from the model.

Unbiased

Concepts

Metrics

Test Types

Evaluation Parameters

How Is It Calculated?

Suggested Test Case Types

Concepts

Metrics

Test Types

​Evaluation Parameters

​How Is It Calculated?

​Suggested Test Case Types

​Related Topics

Evaluation Parameters

How Is It Calculated?

Suggested Test Case Types

Related Topics