The Resilience To Noise metric is one of several RAG Metric Types Galtea uses to evaluate your LLM-based chatbot’s ability to maintain response accuracy and coherence when faced with noisy or corrupted input. This includes:

  • Typographical errors.
  • Optical Character Recognition (OCR) errors.
  • Automatic Speech Recognition (ASR) errors.
  • Grammatical mistakes.
  • Irrelevant or distracting content.
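
For illustration, the hypothetical user messages below sketch what each of these noise categories might look like in practice; the wording is invented for this example and is not part of the metric itself.

```python
# Illustrative (hypothetical) examples of the noise categories listed above.
noisy_inputs = {
    "typo": "Whta are teh shipping optoins for my ordr?",
    "ocr": "P1ease conf1rm my 0rder nurnber 48213.",        # character confusions typical of OCR
    "asr": "what our the shipping options four my order",   # homophone substitutions typical of ASR
    "grammar": "Me want know when order arrive?",
    "distraction": "My cat just knocked over a plant, anyway, where is my order?",
}
```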

This metric is essential for assessing how well your product performs in real-world scenarios where user input may not always be clean or well-formed.
Evaluation Parameters

To compute the resilience_to_noise metric, the following parameters are required in every turn of the conversation:

  • input: The user message in the conversation, which is assumed to contain some form of noise or irrelevant information.
  • actual_output: The chatbot’s corresponding response.

Note that this metric specifically evaluates the model’s ability to handle noisy input, so it is not meaningful to apply it to clean or noise-free data.
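The sketch below shows these two parameters for a single conversation turn. The example payload and the validation check are hypothetical and do not represent the Galtea SDK; they only illustrate which fields the metric expects per turn.

```python
# Minimal sketch (not the Galtea SDK): the two parameters the
# resilience_to_noise metric requires for every conversation turn.
turn = {
    "input": "Whta are teh shipping optoins for my ordr?",  # noisy user message
    "actual_output": "We offer standard (3-5 days) and express (1-2 days) shipping.",
}

# Check that both required parameters are present for this turn.
required = {"input", "actual_output"}
missing = required - turn.keys()
assert not missing, f"Missing parameters for resilience_to_noise: {missing}"
```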

How Is It Calculated?

The resilience_to_noise score is derived using an LLM-as-a-judge approach:

  1. Noise Robustness Analysis: An LLM is used to analyze the chatbot’s response to noisy input.
  2. Degradation Assessment: The LLM determines whether the actual_output maintains accuracy and coherence despite the presence of noise in the input.

Scores range from 0 (completely disrupted by noise) to 1 (fully robust to noise), helping you monitor and improve your model’s resilience in practical, noisy environments.
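The sketch below illustrates the general LLM-as-a-judge pattern described above. The `call_llm` helper, the prompt wording, and the clamping logic are assumptions made for illustration and do not reflect Galtea's internal implementation.

```python
# Minimal sketch of an LLM-as-a-judge scorer for resilience to noise.
# `call_llm` is a hypothetical callable wrapping whatever LLM client you use;
# it is assumed to return the judge's reply as a string.
JUDGE_PROMPT = """You are evaluating a chatbot's resilience to noisy input.

Noisy user input:
{input}

Chatbot response:
{actual_output}

Despite the noise in the input, does the response remain accurate and coherent?
Reply with a single number between 0 (completely disrupted by noise)
and 1 (fully robust to noise)."""


def resilience_to_noise(input: str, actual_output: str, call_llm) -> float:
    """Score one turn with an LLM judge and clamp the result to [0, 1]."""
    # Assumes the judge replies with a bare number such as "0.8".
    reply = call_llm(JUDGE_PROMPT.format(input=input, actual_output=actual_output))
    return min(max(float(reply.strip()), 0.0), 1.0)
```

In practice, such a scorer would be run once per turn and the resulting values tracked over time to monitor how robustness changes across model or prompt revisions.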

This metric is inspired by best practices in the open source community and is implemented natively in the Galtea platform.