- Typographical errors.
- Optical Character Recognition (OCR) errors.
- Automatic Speech Recognition (ASR) errors.
- Grammatical mistakes.
- Irrelevant or distracting content.
Evaluation Parameters
To compute the resilience_to_noise metric, the following parameters are required in every turn of the conversation:
- input: The user message in the conversation, which is assumed to contain some form of noise or irrelevant information.
- actual_output: The chatbot’s corresponding response.

This metric specifically evaluates the model’s ability to handle noisy input, so it is not meaningful to apply it to clean or noise-free data.
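
As an illustration of the two required parameters, a single turn could be represented and checked as in the sketch below. The `Turn` class, the `missing_parameters` helper, and the sample messages are hypothetical names made up for this example; they are not part of the Galtea SDK, and only the field names `input` and `actual_output` come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One conversation turn (hypothetical container, not a Galtea type)."""
    input: str          # noisy user message
    actual_output: str  # chatbot response to that message


def missing_parameters(turn: dict) -> list[str]:
    """Return the required parameter names that are absent or blank in a raw turn."""
    required = ("input", "actual_output")
    return [name for name in required if not str(turn.get(name) or "").strip()]


# Sample turn with typos and an irrelevant aside in the user message.
raw_turn = {
    "input": "wat iz teh captial of Frnace?? btw my cat is loud",
    "actual_output": "The capital of France is Paris.",
}

# Only build the turn when both required parameters are present.
if not missing_parameters(raw_turn):
    turn = Turn(**raw_turn)
```

Checking for both fields before evaluation avoids sending an incomplete turn to the judge, which could otherwise produce a score that says nothing about noise handling.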
How Is It Calculated?
The resilience_to_noise score is derived using an LLM-as-a-judge approach:
- Noise Robustness Analysis: An LLM is used to analyze the chatbot’s response to noisy input.
- Degradation Assessment: The LLM determines whether the actual_output maintains accuracy and coherence despite the presence of noise in the input.
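
The two steps above can be sketched roughly as follows. This is a minimal illustration of the LLM-as-a-judge pattern, not the platform's actual implementation: the prompt wording, the PASS/FAIL convention, the binary 1/0 score, and every function name here are assumptions, and `fake_judge` stands in for a real model call.

```python
from typing import Callable

# Hypothetical judge prompt; the real wording used by the platform is not documented here.
JUDGE_TEMPLATE = (
    "The user message below contains noise (typos, OCR/ASR errors, "
    "grammar mistakes, or irrelevant content).\n"
    "User message: {input}\n"
    "Chatbot response: {actual_output}\n"
    "Does the response remain accurate and coherent despite the noise? "
    "Answer PASS or FAIL, then give a short reason."
)


def build_judge_prompt(user_input: str, actual_output: str) -> str:
    """Fill the template with one turn's input and actual_output."""
    return JUDGE_TEMPLATE.format(input=user_input, actual_output=actual_output)


def parse_verdict(reply: str) -> int:
    """Map a judge reply starting with PASS/FAIL to 1/0 (assumed binary scale)."""
    words = reply.strip().split()
    first = words[0].upper().strip(".:,") if words else ""
    if first == "PASS":
        return 1
    if first == "FAIL":
        return 0
    raise ValueError(f"Unrecognized judge reply: {reply!r}")


def resilience_to_noise(
    user_input: str, actual_output: str, judge: Callable[[str], str]
) -> int:
    """Ask the judge about one turn and convert its reply to a score."""
    return parse_verdict(judge(build_judge_prompt(user_input, actual_output)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "PASS: the response answers the intended question."


score = resilience_to_noise(
    "wat iz teh captial of Frnace?? btw my cat is loud",
    "The capital of France is Paris.",
    fake_judge,
)
```

Passing the judge in as a callable keeps the scoring logic independent of any particular model provider, so the same function can be exercised with a stub in tests and with a real LLM client in practice.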
This metric is inspired by best practices in the open source community and is implemented natively in the Galtea platform.