The METEOR (Metric for Evaluation of Translation with Explicit ORdering) metric is one of the Deterministic Metrics Galtea exposes for evaluating machine translation, summarization, and paraphrasing tasks. It aims to correlate more closely with human judgments than BLEU does.
Evaluation Parameters
To compute the meteor metric, the following parameters are required:
- actual_output: The model’s generated text.
- expected_output: The reference (or gold) text to compare against.
How Is It Calculated?
METEOR improves upon BLEU by considering semantic and morphological matches:
- Alignment
  Tokens are matched between the candidate and the reference using:
  - Exact matches
  - Stems (e.g., “run” vs. “running”)
  - Synonyms (e.g., “big” vs. “large”)
- Precision & Recall
  Both are calculated from the aligned tokens.
- Fragmentation Penalty
  A penalty is applied if matches are scattered (fragmented alignment).
- Final Score
  The score is computed as:
  $$\text{METEOR} = F_{mean} \cdot (1 - \text{Penalty})$$
  where $F_{mean}$ is the harmonic mean of precision and recall, and Penalty is the fragmentation penalty from the previous step.
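The steps above can be sketched in a few lines of Python. This is a simplified illustration, not Galtea's implementation: it assumes whitespace tokenization, exact matches only (no stemming or synonym lookup), a greedy left-to-right alignment, and the fragmentation penalty 0.5 · (chunks / matches)³ from the original METEOR formulation. The function name `simple_meteor` is hypothetical.

```python
def simple_meteor(actual_output: str, expected_output: str) -> float:
    """METEOR-style score using exact token matches only (illustrative sketch)."""
    cand = actual_output.lower().split()
    ref = expected_output.lower().split()
    if not cand or not ref:
        return 0.0

    # Alignment: each candidate token takes the first unused identical
    # reference token (greedy; real METEOR also tries stems and synonyms).
    used = [False] * len(ref)
    alignment = []  # pairs of (candidate_index, reference_index)
    for i, tok in enumerate(cand):
        for j, ref_tok in enumerate(ref):
            if not used[j] and tok == ref_tok:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    # Precision and recall from the aligned tokens.
    precision = matches / len(cand)
    recall = matches / len(ref)
    f_mean = 2 * precision * recall / (precision + recall)  # harmonic mean

    # Fragmentation: count chunks, i.e. runs of matches that are adjacent
    # and in the same order in both candidate and reference.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if i1 != i0 + 1 or j1 != j0 + 1:
            chunks += 1

    # Assumed penalty form: 0.5 * (chunks / matches) ** 3.
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

Under these assumptions, an output identical to the reference scores slightly below 1, because a single chunk still incurs a small penalty, while an output containing the same words in scrambled order is penalized heavily even though precision and recall are both perfect.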
Scores range from 0 to 1 and can be interpreted roughly as follows:
- ≥ 0.6 – High-quality translation/summary with semantic fidelity.
- 0.3 – 0.6 – Moderate quality; some paraphrasing or structural divergence.
- < 0.3 – Low-quality or semantically incorrect output.
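When scores are post-processed in a test pipeline, the bands above can be encoded as a small helper. This is a minimal sketch: the function name `interpret_meteor` and the returned labels are hypothetical, and only the thresholds come from the list above.

```python
def interpret_meteor(score: float) -> str:
    """Map a METEOR score in [0, 1] to the quality bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("METEOR scores are expected to lie in [0, 1]")
    if score >= 0.6:
        return "high"      # high-quality output with semantic fidelity
    if score >= 0.3:
        return "moderate"  # some paraphrasing or structural divergence
    return "low"           # low-quality or semantically incorrect output
```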
Suggested Test Case Types
Use METEOR when evaluating:
- Machine Translation with varied phrasings.
- Abstractive Summarization where synonyms are common.
- Paraphrase Detection with semantic variation.