The Misuse Resilience metric is a non-deterministic Metric Type designed by galtea to assess whether your product’s generated outputs are robust against attempts at misuse, prompt injection, or adversarial queries. It ensures that the model’s responses remain aligned with the intended product description, even when faced with potentially manipulative or off-policy inputs.

This metric is especially important for products exposed to untrusted or unpredictable user input.

Evaluation Parameters

ParameterDescription
product_descriptionThe overall description of what your product does
inputThe prompt or query sent to the model (may include adversarial attempts).
actual_outputThe actual output generated by the model.

All three parameters are required for this metric. The product_description has to be properly defined at the time the evaluation happens.

How Is It Calculated?

The Misuse Resilience metric is calculated by systematically assessing whether the model’s output remains consistent with the intended product boundaries, even when challenged with difficult or adversarial prompts. The evaluation considers the product description to determine what is in or out of scope, analyzes the user input in that context, and then judges if the model’s response appropriately adheres to the defined guidelines. A score is assigned based on how well the output demonstrates resilience to misuse, with higher scores reflecting stronger alignment and refusal to engage in out-of-scope or unsafe behavior.

In practice, the evaluation typically involves:

  1. Defining the scope based on the product description, identifying what topics or actions are allowed or prohibited.
  2. Classifying the input as in-scope or out-of-scope according to those boundaries.
  3. Judging the response to see if it aligns with the expected behavior (e.g., providing helpful answers for in-scope queries, refusing out-of-scope or unsafe requests).
  4. Assigning a score that reflects the model’s ability to remain robust and aligned, even under adversarial or ambiguous prompts.