Factual Accuracy
Evaluates whether the generated output is factually accurate compared to the reference answer.
The Factual Accuracy metric is a non-deterministic Metric Type used by Galtea to evaluate the performance of your AI products, especially in Retrieval-Augmented Generation (RAG) and question answering systems. It measures whether the information in the model’s output is factually correct when compared to a trusted reference answer.
This metric helps ensure that your LLM-generated responses are not only relevant, but also factually accurate, reducing the risk of hallucinations or misinformation in your product’s outputs.
Evaluation Parameters
To compute the factual_accuracy metric, the following parameters are required:
- expected_output: The reference or ground truth answer that the model’s output should be compared against.
- actual_output: The response generated by your LLM application.
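As an illustration of the two inputs (the field names follow the parameter list above; the texts and the dict shape are just an example, not a specific Galtea API call):

```python
# Illustrative only: shows the two required inputs for the metric.
# How these are submitted depends on your Galtea SDK version.
evaluation_params = {
    "expected_output": "The Eiffel Tower was completed in 1889.",
    "actual_output": "The Eiffel Tower was completed in 1900.",
}
```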
How Is It Calculated?
Factual Accuracy is computed using an LLM-as-a-judge process. The LLM compares the actual_output to the expected_output and determines whether the generated response is factually correct, complete, and free of hallucinations. The process typically involves:
- Fact Extraction: The LLM identifies key facts or statements in both the expected_output and actual_output.
- Fact Comparison: Each fact in the actual_output is checked for correctness against the expected_output.
- Score Calculation: The final score reflects the proportion of facts in the actual_output that are accurate and supported by the expected_output.
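The three steps above can be sketched as follows. This is a toy stand-in, not Galtea's implementation: extract_facts and naive_judge replace the LLM calls with simple string heuristics so the scoring logic is easy to see.

```python
from typing import Callable, List


def extract_facts(text: str) -> List[str]:
    # Stand-in for LLM fact extraction: treat each sentence as one fact.
    return [s.strip() for s in text.split(".") if s.strip()]


def naive_judge(fact: str, reference: str) -> bool:
    # Stand-in for the LLM judge: count a fact as supported when nearly
    # all of its words appear in the reference. A real judge would prompt
    # an LLM with the fact and the reference instead.
    words = fact.lower().split()
    hits = sum(w in reference.lower() for w in words)
    return hits / len(words) >= 0.9


def factual_accuracy(
    actual_output: str,
    expected_output: str,
    judge: Callable[[str, str], bool] = naive_judge,
) -> float:
    # Score = proportion of facts in actual_output that the judge finds
    # supported by expected_output.
    facts = extract_facts(actual_output)
    if not facts:
        return 0.0
    supported = sum(judge(fact, expected_output) for fact in facts)
    return supported / len(facts)


expected = "The Eiffel Tower is in Paris. It was completed in 1889."
actual = "The Eiffel Tower is in Paris. It was completed in 1900."
score = factual_accuracy(actual, expected)  # one of two facts supported
```

Here the wrong completion date makes the second fact unsupported, so the score is 0.5 rather than 1.0, mirroring how the real metric penalizes hallucinated details.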
A higher score indicates that the model’s output is more factually aligned with the reference answer.