Quality and Evaluation Checklist

To move from "hope-based RAG" to "controlled RAG", implement these checks.

1. Retrieval Metrics (Search Quality)

  • Context Recall: Are the units necessary to answer the question actually in the retrieved set?
  • Context Precision: Is the retrieved set clean of irrelevant noise?
  • MRR (Mean Reciprocal Rank): How high does the first relevant unit rank? (Scored as 1/rank, averaged across queries.)
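The three retrieval metrics above reduce to simple set and rank arithmetic once you have a golden set of (query, relevant-unit) pairs. A minimal sketch, assuming retrieved results are ranked lists of unit IDs:

```python
# Retrieval metrics over a golden set. "retrieved" is a ranked list of
# unit IDs; "relevant" is the set of IDs known to answer the question.

def context_recall(retrieved, relevant):
    """Fraction of the relevant units that made it into the retrieved set."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

def context_precision(retrieved, relevant):
    """Fraction of the retrieved set that is actually relevant (low noise)."""
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def mrr(queries):
    """Mean Reciprocal Rank: 1/rank of the first relevant hit, averaged."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, unit_id in enumerate(retrieved, start=1):
            if unit_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

# Example: two queries from a (hypothetical) golden set.
queries = [
    (["u3", "u1", "u9"], {"u1"}),  # first relevant hit at rank 2
    (["u5", "u7", "u2"], {"u5"}),  # first relevant hit at rank 1
]
print(mrr(queries))  # (1/2 + 1/1) / 2 = 0.75
```

Tracking these per query, not just as averages, is what surfaces the specific questions where retrieval silently fails.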

2. Generation Metrics (Answer Quality)

  • Faithfulness (Groundedness): Can every claim in the answer be traced to a retrieved Knowledge Unit?
  • Answer Relevance: Does the answer actually address the user's intent?
  • Citation Accuracy: Do the citations correctly point to the unit that supports the claim?
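In production these generation metrics are usually scored by an LLM judge, but the bookkeeping underneath is plain claim-by-claim matching. A hedged sketch of citation accuracy, with hypothetical claim and unit structures, using literal substring matching as a stand-in for real evidence verification:

```python
# Citation accuracy: for each claim in the answer, does the cited unit
# actually contain the supporting evidence? Substring matching here is a
# deliberately naive stand-in for an LLM-judged entailment check.

def citation_accuracy(answer_claims, units):
    """Fraction of claims whose cited unit contains their evidence text."""
    supported = 0
    for claim in answer_claims:
        unit_text = units.get(claim["cited_unit"], "")
        if claim["evidence"] in unit_text:
            supported += 1
    return supported / len(answer_claims)

# Hypothetical example data.
units = {"u1": "The API rate limit is 100 requests per minute."}
claims = [
    {"evidence": "rate limit is 100 requests", "cited_unit": "u1"},  # supported
    {"evidence": "the limit resets hourly", "cited_unit": "u1"},     # not in u1
]
print(citation_accuracy(claims, units))  # 0.5
```

The same loop structure extends to Faithfulness: extract every claim from the answer and require each one to find support somewhere in the retrieved set, not only in its own citation.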

3. Governance & Safety

  • ACL Pre-Filtering: Is there a hard check ensuring units from different tenants/roles are NEVER mixed?
  • PII Scanning: Are units scanned for sensitive data during ingestion?
  • Hallucination Gating: Is there a "Confidence Score" or "Low Evidence" flag to warn users?
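The critical property of ACL pre-filtering is that the tenant/role check runs before any unit can enter the context, not as a cleanup pass on results. A minimal sketch of that hard gate plus a low-evidence flag; the field names and the 0.75 threshold are assumptions for illustration:

```python
# ACL pre-filtering and low-evidence gating. The hard rule: units from
# another tenant are dropped before they can ever reach the LLM context.

from dataclasses import dataclass

@dataclass
class Unit:
    id: str
    tenant: str
    roles: frozenset       # roles allowed to read this unit
    score: float           # retrieval similarity score

def acl_prefilter(units, tenant, user_roles):
    """Keep only units in the caller's tenant with an overlapping role."""
    return [u for u in units if u.tenant == tenant and u.roles & user_roles]

def evidence_flag(units, threshold=0.75):
    """Warn the user when the best retrieved evidence is weak or absent."""
    best = max((u.score for u in units), default=0.0)
    return "LOW_EVIDENCE" if best < threshold else "OK"

corpus = [
    Unit("u1", "acme", frozenset({"analyst"}), 0.81),
    Unit("u2", "globex", frozenset({"analyst"}), 0.95),  # wrong tenant
]
allowed = acl_prefilter(corpus, tenant="acme", user_roles=frozenset({"analyst"}))
print([u.id for u in allowed])  # ['u1'] -- u2 never reaches the context
print(evidence_flag(allowed))   # 'OK' (0.81 >= 0.75)
```

Note that u2 is excluded despite having the higher similarity score: relevance never overrides the ACL boundary.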

4. Operational Health

  • Latency Monitoring: Break down time spent in: Embedding -> Vector Search -> Graph Expansion -> Reranking -> LLM.
  • Token Efficiency: Are we sending unnecessary fluff to the LLM, or is the context tightly packed with relevant units?
  • Index Drift: Are we re-evaluating the "Golden Set" of questions when we update embedding models or chunking strategies?
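The latency breakdown above is easiest to keep honest with a timing wrapper around each pipeline stage. A sketch of that instrumentation pattern, where the `time.sleep` calls stand in for the real stage implementations:

```python
# Per-stage latency instrumentation for the pipeline:
# Embedding -> Vector Search -> Graph Expansion -> Reranking -> LLM.

import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Record wall-clock time spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

# Stand-in workloads; in a real pipeline each `with` wraps the actual call.
for name, work in [("embedding", 0.001), ("vector_search", 0.002),
                   ("graph_expansion", 0.001), ("reranking", 0.001),
                   ("llm", 0.005)]:
    with stage(name):
        time.sleep(work)

total = sum(timings.values())
for name, t in timings.items():
    print(f"{name:>16}: {t * 1000:6.1f} ms ({t / total:5.1%})")
```

Emitting the per-stage shares alongside the totals is what tells you whether a latency regression came from the vector index, the reranker, or the LLM itself.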