Evaluates automatic summarization by measuring the longest common subsequence (LCS) that preserves the word order between candidate and reference summaries.
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric is one of the Deterministic Metrics Galtea exposes to evaluate how well a generated summary captures the content of a reference summary. It is primarily used for summarization tasks and other scenarios where recall is more important than exact lexical match.
This implementation uses ROUGE-L, which focuses on the Longest Common Subsequence (LCS) between the candidate and reference:
Longest Common Subsequence
Identifies the longest sequence of words that appears in both candidate and reference (not necessarily contiguous, but in the same order).
Precision & Recall
Precision (P) = LCS length / candidate length
Recall (R) = LCS length / reference length
F1 Score
Combines precision and recall: $F_1 = 2 \cdot \frac{P \times R}{P + R}$
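The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not Galtea's actual implementation: the function names `lcs_length` and `rouge_l` are hypothetical, and lowercasing plus whitespace tokenization is an assumption (production ROUGE implementations often apply their own tokenization or stemming).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] is the LCS length between the tokens of `a` processed so far
    # and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return ROUGE-L precision, recall, and F1 for two strings."""
    # Assumption: lowercase and split on whitespace to get word tokens.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    lcs = lcs_length(cand, ref)
    precision = lcs / len(cand)   # LCS length / candidate length
    recall = lcs / len(ref)       # LCS length / reference length
    # Guard against division by zero when there is no overlap at all.
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_l("the cat sat on the mat", "the cat is on the mat")
print(round(scores["f1"], 3))  # → 0.833
```

In this example the LCS is "the cat on the mat" (5 words), and both texts have 6 words, so precision, recall, and F1 all equal 5/6.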
ROUGE-L returns a score between 0 and 1:
≥ 0.5 – Strong overlap with the reference summary.
0.3 – 0.5 – Moderate overlap; acceptable for abstractive summarization.