The BLEU (Bilingual Evaluation Understudy) metric is one of the Classic Metric Type metrics that Galtea exposes to objectively compare a model’s output against ground‑truth references. It is most appropriate when you expect the model to reproduce wording very close to the reference (e.g., machine translation or tightly constrained generation).

Evaluation Parameters

To compute the BLEU metric, the following parameters are required:
  • actual_output: The model’s generated text.
  • expected_output: The reference (or gold) text to compare against.

How Is It Calculated?

Conceptually, BLEU is computed as:
  1. N‑gram Precision
    Compute modified precision for n‑grams of sizes 1, 2, 3, and 4: how many of the candidate’s n‑grams also appear in the reference(s), clipped by the maximum count in the references.
  2. Geometric Mean of Precisions
    Combine the n‑gram precisions with a geometric mean to balance contributions from different n‑gram sizes.
  3. Brevity Penalty (BP)
    Penalize overly short candidates:
    $$BP = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - \frac{r}{c})} & \text{if } c \le r \end{cases}$$
    where $c$ is the candidate length and $r$ is the reference length.
  4. Final Score
    $$BLEU = BP \cdot \exp\left(\sum_{n=1}^{4} w_n \log p_n \right)$$
    where $p_n$ are the modified n‑gram precisions and $w_n$ are their weights (uniform, $w_n = \frac{1}{4}$).
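The four steps above can be sketched in plain Python. This is an illustrative sketch only, not Galtea's implementation: it assumes whitespace tokenization, no smoothing (so any zero n‑gram precision yields a score of 0), and the closest reference length for the brevity penalty. The function names, and the choice to accept a list of references, are assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Step 1: clipped n-gram precision of the candidate against the references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(c, r):
    """Step 3: 1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(actual_output, expected_outputs, max_n=4):
    """Steps 2 and 4: geometric mean of precisions (uniform weights) times BP."""
    candidate = actual_output.split()  # assumption: whitespace tokenization
    references = [ref.split() for ref in expected_outputs]
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # assumption: no smoothing, so log(0) is treated as a zero score
    c = len(candidate)
    # Assumption: use the reference length closest to c (ties go to the shorter one).
    r = min((len(ref) for ref in references), key=lambda ref_len: (abs(ref_len - c), ref_len))
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(c, r) * math.exp(log_mean)
```

Calling `bleu("the cat sat on the mat", ["the cat sat on the mat"])` gives a perfect score, while a candidate that shares no 4‑gram with the reference scores 0 under these assumptions. Production implementations typically add smoothing and a more careful tokenizer, so their scores can differ from this sketch.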
BLEU returns a value between 0 and 1 in this implementation.
  • ≥ 0.6 – Very strong overlap / near-reference quality.
  • 0.3 – 0.6 – Moderate overlap; acceptable for some machine translation tasks.
  • < 0.3 – Low overlap; likely poor or very different phrasing.
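As a small illustration, the bands above can be expressed as a helper that maps a score to its label. The function name `interpret_bleu` and the label strings are hypothetical and not part of Galtea; the thresholds are the ones listed above, with 0.6 falling in the top band and 0.3 in the middle band.

```python
def interpret_bleu(score):
    """Map a BLEU score in [0, 1] to the qualitative bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("BLEU score must be between 0 and 1")
    if score >= 0.6:
        return "very strong overlap"
    if score >= 0.3:
        return "moderate overlap"
    return "low overlap"
```

The bands are rough guides rather than hard cutoffs, so a helper like this is best used for quick triage of results, not as a pass/fail gate.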

Suggested Test Case Types

Use BLEU when you have deterministic, reference-style outputs, such as:
  • Machine Translation (primary use case).
  • Data-to-Text Generation where the wording is tightly constrained.
  • Template-like Summaries or Headline Generation where near-exact phrasing matters.
  • Paraphrase Detection (strict) when you want to reward lexical overlap (with caution).