feat: implement KM-RAG methodology artifacts and core architectural standards with supporting query and service updates
# Quality and Evaluation Checklist
To move from "hope-based RAG" to "controlled RAG", implement these checks.
## 1. Retrieval Metrics (Search Quality)
- [ ] **Context Recall**: Are the units necessary to answer the question actually in the retrieved set?
- [ ] **Context Precision**: Is the retrieved set clean of irrelevant noise?
- [ ] **MRR (Mean Reciprocal Rank)**: Does the most relevant unit appear at the top of the ranking? (A measurement sketch for these three metrics follows this list.)
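
A minimal sketch of how these three metrics can be computed against a hand-labeled golden set. The `retrieved_ids`/`relevant_ids` shapes and the sample unit IDs are illustrative assumptions, not part of the KM-RAG artifacts; adapt them to however your pipeline identifies Knowledge Units.

```python
# A golden-set entry pairs what the retriever returned (ordered) with the
# units a human judged necessary to answer the question.
def context_recall(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of the units needed for the answer that were actually retrieved."""
    if not relevant_ids:
        return 1.0
    return len(relevant_ids & set(retrieved_ids)) / len(relevant_ids)


def context_precision(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of retrieved units that are relevant -- the noise check."""
    if not retrieved_ids:
        return 0.0
    return len(relevant_ids & set(retrieved_ids)) / len(retrieved_ids)


def reciprocal_rank(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """1/rank of the first relevant unit; 0.0 if none was retrieved."""
    for rank, unit_id in enumerate(retrieved_ids, start=1):
        if unit_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# MRR is the mean reciprocal rank over the whole golden set.
golden_set = [{"retrieved": ["ku-7", "ku-2", "ku-9"], "relevant": {"ku-2", "ku-4"}}]
mrr = sum(reciprocal_rank(q["retrieved"], q["relevant"]) for q in golden_set) / len(golden_set)
```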
## 2. Generation Metrics (Answer Quality)
- [ ] **Faithfulness (Groundedness)**: Can every claim in the answer be traced to a retrieved Knowledge Unit?
- [ ] **Answer Relevance**: Does the answer actually address the user's intent?
- [ ] **Citation Accuracy**: Do the citations point to the unit that actually supports each claim? (See the grounding-check sketch after this list.)
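
Faithfulness and citation checks are usually run with an NLI model or an LLM-as-judge; the sketch below is only a crude lexical-overlap proxy, offered to make the check concrete. The function names and the `0.6` threshold are assumptions, not a prescribed implementation.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(claim: str, cited_unit_text: str, threshold: float = 0.6) -> bool:
    """Does most of the claim's vocabulary appear in the unit it cites?
    Catches gross fabrications and citations pointing at the wrong unit;
    it cannot catch subtle paraphrase errors."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return True
    overlap = len(claim_tokens & _tokens(cited_unit_text))
    return overlap / len(claim_tokens) >= threshold


def faithfulness(claims_with_citations: list[tuple[str, str]]) -> float:
    """Fraction of (claim, cited_unit_text) pairs that pass the grounding check."""
    if not claims_with_citations:
        return 1.0
    return sum(is_grounded(c, u) for c, u in claims_with_citations) / len(claims_with_citations)
```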
## 3. Governance & Safety
- [ ] **ACL Pre-Filtering**: Is there a hard check ensuring units from different tenants/roles are NEVER mixed?
- [ ] **PII Scanning**: Are units scanned for sensitive data during ingestion?
- [ ] **Hallucination Gating**: Is there a "Confidence Score" or "Low Evidence" flag to warn users? (A pre-filtering and gating sketch follows this list.)
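
One way to make the ACL check "hard" is to filter before anything is assembled into the prompt, and to gate the answer on the strength of the best evidence. The `KnowledgeUnit` shape, its field names, and the threshold below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeUnit:  # hypothetical shape -- adapt to your actual schema
    unit_id: str
    tenant_id: str
    allowed_roles: frozenset[str]
    score: float  # similarity score from vector search


LOW_EVIDENCE_THRESHOLD = 0.45  # assumed cutoff -- tune it on your golden set


def acl_prefilter(units: list[KnowledgeUnit], tenant_id: str, role: str) -> list[KnowledgeUnit]:
    """Hard check BEFORE prompt assembly: drop any unit belonging to another
    tenant or hidden from the caller's role."""
    return [u for u in units if u.tenant_id == tenant_id and role in u.allowed_roles]


def evidence_flag(units: list[KnowledgeUnit]) -> str:
    """Gate the answer: warn the user when the best remaining evidence is weak."""
    if not units or max(u.score for u in units) < LOW_EVIDENCE_THRESHOLD:
        return "LOW_EVIDENCE"
    return "OK"
```

Where the vector store supports metadata filters, push the tenant/role condition into the index query as well, so out-of-scope units are never retrieved in the first place; the post-hoc filter above then serves as a second line of defense.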
## 4. Operational Health
- [ ] **Latency Monitoring**: Break down the time spent in each stage: Embedding -> Vector Search -> Graph Expansion -> Reranking -> LLM (see the instrumentation sketch after this list).
- [ ] **Token Efficiency**: Are we sending unnecessary fluff to the LLM, or is the context tightly packed with relevant units?
- [ ] **Index Drift**: Are we re-evaluating the "Golden Set" of questions when we update embedding models or chunking strategies?
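
A small instrumentation sketch for the latency breakdown above. The pipeline functions in the usage comments (`embed`, `index.search`, `expand_graph`, `rerank`, `generate`) are placeholders for whatever your service actually calls.

```python
import time
from contextlib import contextmanager

stage_timings: dict[str, float] = {}


@contextmanager
def timed(stage: str):
    """Record wall-clock seconds per pipeline stage for a latency breakdown."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings[stage] = time.perf_counter() - start


# Usage inside a hypothetical answer pipeline:
# with timed("embedding"):       query_vec = embed(question)
# with timed("vector_search"):   hits = index.search(query_vec, k=20)
# with timed("graph_expansion"): hits = expand_graph(hits)
# with timed("reranking"):       hits = rerank(question, hits)
# with timed("llm"):             answer = generate(question, hits)
```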