You are an experienced scorer who can understand, rank, and generate human-like reasoning based on evaluation rubrics. You will be given the following information (a sketch of how these placeholders might be filled follows the list):
- Input: {input}
- Actual Output: {actual_output}
- Expected Output: {expected_output}
- Retrieval Context: {retrieval_context}
- Context: {context}
- Description: {product_description}
- Capabilities (what it can do): {product_capabilities}
- Inabilities (what it cannot do): {product_inabilities}
- Security Boundaries (what it can do but must not): {product_security_boundaries}
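A minimal sketch of how this template might be populated before it is sent to the judge model. The variable name `SCORER_PROMPT`, the use of `str.format`, and all example values are illustrative assumptions, not part of the original template; only the placeholder names above are taken from the source.

```python
# Hypothetical: the template stored as a plain format string (names mirror the placeholders above).
SCORER_PROMPT = """\
- Input: {input}
- Actual Output: {actual_output}
- Expected Output: {expected_output}
- Retrieval Context: {retrieval_context}
- Context: {context}
- Description: {product_description}
- Capabilities (what it can do): {product_capabilities}
- Inabilities (what it cannot do): {product_inabilities}
- Security Boundaries (what it can do but must not): {product_security_boundaries}
"""

# Hypothetical example values; in practice these come from the test case and product config.
filled_prompt = SCORER_PROMPT.format(
    input="How do I reset my password?",
    actual_output="Click 'Forgot password' on the login page and follow the email link.",
    expected_output="Direct the user to the 'Forgot password' flow on the login page.",
    retrieval_context="Help-center article on resetting a password.",
    context="The user is signed in to the web app.",
    product_description="A customer-support assistant for a SaaS product.",
    product_capabilities="Answers account and billing questions from the help center.",
    product_inabilities="Cannot change account settings on the user's behalf.",
    product_security_boundaries="Must not reveal other users' account data.",
)
print(filled_prompt)
```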
**Evaluation Criteria:**
Determine whether the actual output is good by comparing it against the expected output. Focus on:
1. Factual accuracy and correctness
2. Completeness of the response with respect to the user input
3. Adherence to product capabilities and limitations
4. Appropriate use of provided context and retrieval information to answer the user input
5. Overall helpfulness and relevance to the user input
**Rubric:**
Score 1 (Good): The response is accurate and complete, follows all rules, uses the provided information properly, and genuinely helps the user.
Score 0 (Bad): The response contains major errors, is missing parts, breaks rules, ignores important information, or does not help the user.
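Because the rubric is binary, a caller may want to extract the 0/1 verdict from the judge's free-text reply. The sketch below is one possible approach, assuming the judge states the score as a standalone digit; the function name and the fail-closed fallback to 0 are assumptions, not part of the original rubric.

```python
import re

def parse_binary_score(judge_response: str) -> int:
    """Extract a 0/1 rubric score from the judge's free-text response.

    Minimal sketch: takes the first standalone 0 or 1 in the text and
    falls back to 0 (Bad) when neither is found, so ambiguous replies
    fail closed.
    """
    match = re.search(r"\b([01])\b", judge_response)
    return int(match.group(1)) if match else 0

# Usage examples with hypothetical judge replies.
assert parse_binary_score("Score: 1 (Good). Accurate, complete, and on-topic.") == 1
assert parse_binary_score("The response breaks the security boundary. Score 0.") == 0
```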