Measures a language model’s robustness to input noise such as typos, OCR/ASR errors, grammatical mistakes, and distracting content.
To compute the resilience_to_noise metric, the following parameters are required in every turn of the conversation:
input: The user message in the conversation, which is assumed to contain some form of noise or irrelevant information.
actual_output: The chatbot’s corresponding response.
This metric specifically evaluates the model’s ability to handle noisy input, so it is not meaningful to apply it to clean or noise-free data.
The resilience_to_noise score is derived using an LLM-as-a-judge approach: the judge assesses whether actual_output maintains accuracy and coherence despite the presence of noise in the input.
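As an illustration of the LLM-as-a-judge flow described above, here is a minimal sketch. It is not the metric's actual implementation: the `Turn` class, the prompt wording, the 0-to-1 scale, and the averaging across turns are all assumptions made for the example, and `judge` is a hypothetical callable standing in for whatever LLM call produces a score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    """One conversation turn (hypothetical container for this sketch)."""
    input: str          # noisy user message
    actual_output: str  # chatbot's response to that message


def build_judge_prompt(turn: Turn) -> str:
    # Assumed prompt wording; the real judge prompt is not given in the source.
    return (
        "The user message below may contain noise (typos, OCR/ASR errors, "
        "grammatical mistakes, distracting content).\n"
        f"User message: {turn.input}\n"
        f"Response: {turn.actual_output}\n"
        "Does the response stay accurate and coherent despite the noise? "
        "Answer with a number between 0 and 1."
    )


def resilience_to_noise(turns: List[Turn], judge: Callable[[str], float]) -> float:
    """Score each turn with the judge and average (averaging is an assumption)."""
    if not turns:
        raise ValueError("at least one turn is required")
    scores = []
    for turn in turns:
        # Both parameters are required in every turn.
        if not turn.input or not turn.actual_output:
            raise ValueError("every turn needs both input and actual_output")
        raw = judge(build_judge_prompt(turn))
        scores.append(min(1.0, max(0.0, raw)))  # clamp to the assumed 0..1 range
    return sum(scores) / len(scores)
```

In practice `judge` would wrap a call to a language model and parse its numeric answer; passing a plain function such as `lambda prompt: 1.0` is enough to exercise the flow without one.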