The Factual Accuracy metric is a non-deterministic Metric used by Galtea to evaluate the performance of your AI products, especially in Retrieval-Augmented Generation (RAG) and question answering systems. It measures whether the information in the model’s output is factually correct, complete, and appropriately addresses the user’s input when compared to a trusted reference answer. This metric helps ensure that your LLM-generated responses are not only relevant and factually accurate, but also comprehensive enough to adequately answer the user’s question, reducing the risk of hallucinations, misinformation, or incomplete responses in your product’s outputs.

Evaluation Parameters

To compute the factual_accuracy metric, the following parameters are required (see the example after this list):
  • input: The original user query or question that prompted the response.
  • expected_output: The reference or ground truth answer that the model’s output should be compared against.
  • actual_output: The response generated by your LLM application.
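
For reference, here is a minimal, hypothetical example of these three parameters assembled for a single evaluation. The plain dictionary layout and the sample values are illustrative only; they are not the exact structure expected by the Galtea SDK.

```python
# Hypothetical example of the three parameters needed for factual_accuracy.
# The dictionary below is illustrative; refer to the Galtea SDK documentation
# for the exact way these values are submitted.
evaluation_params = {
    # The original user query that prompted the response.
    "input": "What is the boiling point of water at sea level?",
    # The trusted reference (ground truth) answer.
    "expected_output": "Water boils at 100 °C (212 °F) at sea level.",
    # The response generated by your LLM application.
    "actual_output": "At sea level, water boils at 100 degrees Celsius.",
}
```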

How Is It Calculated?

Factual Accuracy is computed using an LLM-as-a-judge process that evaluates the response on three key dimensions: accuracy, completeness, and relevance.
  1. Accuracy: whether the information is correct and consistent with what is expected.
  2. Completeness: whether all relevant points from the input are properly addressed.
  3. Relevance: whether the content stays focused on the topic without adding unnecessary or unsupported material.
Each of these aspects is verified to ensure the output meets the required standards before a final judgment is made. The final score is then determined as follows:
  • Score 1: Response aligns with the reference answer with no errors, omissions, or unsupported additions
  • Score 0: Response contains factual errors, omits essential information needed for the input, or includes unsupported content
This binary scoring provides a clear signal of response quality, allowing you to identify and address factual accuracy issues in your AI system.
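To make the judging flow concrete, below is a minimal sketch of how an LLM-as-a-judge check along these lines could be wired up. This is not Galtea's internal implementation: the prompt wording, the factual_accuracy helper, and the call_judge_llm callable are placeholders you would replace with your own judge-model client.

```python
from typing import Callable

# Minimal sketch of an LLM-as-a-judge check for factual accuracy.
# NOT Galtea's internal implementation; the prompt text is illustrative only.
JUDGE_PROMPT = """You are a strict evaluator. Compare the ACTUAL answer to the
REFERENCE answer for the given QUESTION and judge it on three dimensions:
1. Accuracy: the information is correct and consistent with the reference.
2. Completeness: every relevant point needed to answer the question is covered.
3. Relevance: nothing off-topic or unsupported is added.

Return exactly "1" if all three dimensions are satisfied, otherwise "0".

QUESTION: {input}
REFERENCE: {expected_output}
ACTUAL: {actual_output}
"""


def factual_accuracy(
    input: str,
    expected_output: str,
    actual_output: str,
    call_judge_llm: Callable[[str], str],  # placeholder for your judge LLM client
) -> int:
    """Return 1 if the judge deems the answer accurate, complete, and relevant; else 0."""
    prompt = JUDGE_PROMPT.format(
        input=input,
        expected_output=expected_output,
        actual_output=actual_output,
    )
    verdict = call_judge_llm(prompt).strip()
    return 1 if verdict == "1" else 0
```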
This metric is inspired by best practices in the open source community and is implemented natively in the Galtea platform.

Suggested Test Case Types

The Factual Accuracy metric is effective for evaluating quality test cases in Galtea, since it assesses the model’s ability to maintain factual correctness.