Primarily used for machine translation evaluation, measuring how many n-grams (contiguous sequences of n words) in the candidate translation also appear in a set of reference translations.
For the bleu metric, the following parameters are required:
actual_output: The model’s generated text.
expected_output: The reference (or gold) text to compare against.
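
To make the n-gram overlap concrete, here is a minimal sketch of a sentence-level BLEU computation in plain Python. It is not this library's implementation. The function name `bleu_sketch`, the `max_n` argument, the lowercasing, and the whitespace tokenization are all assumptions made for illustration, and it compares against a single reference with no smoothing. The argument names mirror the two parameters described above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(actual_output, expected_output, max_n=4):
    """Illustrative sentence-level BLEU against a single reference.

    Hypothetical helper: whitespace tokenization, no smoothing.
    """
    candidate = actual_output.lower().split()
    reference = expected_output.lower().split()
    if not candidate:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, a single zero precision makes the score 0.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


score = bleu_sketch(
    actual_output="the cat sat on the mat today",
    expected_output="the cat sat on the mat",
)
```

Because this sketch does not smooth, any candidate shorter than `max_n` words, or one sharing no 4-gram with the reference, scores 0. Production implementations usually apply smoothing and a standardized tokenizer, so their scores will differ from this sketch.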