feat: implement KM-RAG methodology artifacts and core architectural standards with supporting query and service updates
@@ -0,0 +1,89 @@
# Implementation Patterns for KM-RAG in .NET

This guide outlines how to implement KM-RAG patterns in C# and .NET, building on existing infrastructure such as EF Core and `Microsoft.Extensions.AI`.

## 1. Defining Knowledge Units
Represent units as strongly typed entities so that metadata and relationships are captured explicitly.

```csharp
using System.Collections.Generic;
using Pgvector; // Vector type; assumes the Pgvector NuGet package

public enum KnowledgeUnitType { Section, Table, Definition, Step, Rule }

public class KnowledgeUnit
{
    public string Id { get; set; } = string.Empty; // Stable Hash(Source, Content, Version)
    public string TenantId { get; set; } = string.Empty; // Filtered on during retrieval
    public string SourceId { get; set; } = string.Empty;
    public string Version { get; set; } = string.Empty;
    public KnowledgeUnitType Type { get; set; }
    public string Content { get; set; } = string.Empty;
    public string MetadataJson { get; set; } = "{}"; // page, section_path, etc.
    public Vector? Embedding { get; set; }

    // Graph Relationships
    public List<KnowledgeUnitLink> OutgoingLinks { get; set; } = new();
}

public class KnowledgeUnitLink
{
    public string SourceUnitId { get; set; } = string.Empty; // Unit the link starts from
    public string TargetUnitId { get; set; } = string.Empty;
    public string RelationType { get; set; } = string.Empty; // "Next", "Defines", "References", "ParentSection"
}
```
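
The `Id` comment above calls for a stable hash of source, content, and version. One way this could look is sketched below, assuming SHA-256 and .NET 5 or later; the `KnowledgeUnitIds` helper, its normalization rules, and the field layout are illustrative assumptions, not something the guide prescribes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives a deterministic ID from source, content, and version.
public static class KnowledgeUnitIds
{
    public static string Compute(string sourceId, string content, string version)
    {
        // Normalize line endings and trim so cosmetic differences do not change the ID.
        string canonical = content.Replace("\r\n", "\n").Trim();

        // Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        string payload =
            $"{sourceId.Length}:{sourceId}|{version.Length}:{version}|{canonical.Length}:{canonical}";

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the ID depends only on its inputs, re-ingesting an unchanged document yields the same IDs, while a new `version` yields new ones.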

## 2. Multi-Stage Retrieval
Replace a single `Take(Limit)` query with a multi-stage pipeline: generate candidates, expand them through the graph, then rerank.

### Step A: Hybrid Candidate Generation
Combine `pgvector` cosine similarity with full-text search if available. The query below shows the vector side only.

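```csharp
// Requires: using Microsoft.EntityFrameworkCore; using Pgvector; using Pgvector.EntityFrameworkCore;
// Assumes GenerateAsync(string) yields a single Embedding<float>;
// the exact method name depends on your Microsoft.Extensions.AI version.
var embedding = await _embeddingGenerator.GenerateAsync(queryText);
var queryVector = new Vector(embedding.Vector.ToArray()); // Pgvector.Vector

var candidates = await _dbContext.KnowledgeUnits
    .Where(u => u.TenantId == tenantId && u.Embedding != null)
    .OrderBy(u => u.Embedding!.CosineDistance(queryVector)) // smaller distance = more similar
    .Take(20) // Get more candidates for reranking
    .Select(u => new { u.Id, u.Content, u.Type })
    .ToListAsync();
```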
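
The guide does not say how the vector and full-text results should be combined. One common option is Reciprocal Rank Fusion (RRF), which needs only the rank of each ID in each list. The sketch below assumes both searches return ordered lists of unit IDs; the `RankFusion` class and the default `k = 60` are illustrative choices, not part of the original design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankFusion
{
    // RRF: score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
    // k = 60 is a commonly used default, not a value taken from this guide.
    public static List<string> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60, int limit = 20)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out double current); // 0 when absent
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
            .Select(kv => kv.Key)
            .Take(limit)
            .ToList();
    }
}
```

An ID that appears near the top of both lists outranks one that appears in only a single list, without any need to put cosine distances and text-search scores on a common scale.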

### Step B: Graph Expansion
Follow links from the candidates to related units (for example, each candidate's parent section) so the model sees the surrounding context.

```csharp
// Example: Get "Contextual Neighbors"
// IDs of the candidates returned by Step A
var candidateIds = candidates.Select(c => c.Id).ToList();

var expandedIds = await _dbContext.KnowledgeUnitLinks
    .Where(l => candidateIds.Contains(l.SourceUnitId) && l.RelationType == "ParentSection")
    .Select(l => l.TargetUnitId)
    .Distinct()
    .ToListAsync();

// Load the full units for the linked IDs
var contextUnits = await _dbContext.KnowledgeUnits
    .Where(u => expandedIds.Contains(u.Id))
    .ToListAsync();
```
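
After expansion, the original candidates and their neighbors usually have to be merged into one context list without duplicates. A minimal sketch of that step follows; the `ContextAssembler` helper and its ordering policy (candidates first, then new neighbors) are assumptions made for illustration.

```csharp
using System.Collections.Generic;

public static class ContextAssembler
{
    // Returns candidate IDs first (in ranked order), then expansion IDs not already present.
    public static List<string> MergeIds(
        IEnumerable<string> candidateIds, IEnumerable<string> expandedIds)
    {
        var seen = new HashSet<string>();
        var merged = new List<string>();

        foreach (var id in candidateIds)
        {
            if (seen.Add(id)) merged.Add(id); // Add returns false for duplicates
        }
        foreach (var id in expandedIds)
        {
            if (seen.Add(id)) merged.Add(id);
        }
        return merged;
    }
}
```

Keeping the candidates in their ranked order preserves the retrieval signal for the reranking stage.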

## 3. Reranking and Citations
Use a reranking model to score the relevance of each unit in the expanded context, and instruct the LLM to cite the units it draws on.

```csharp
// System Prompt for Grounded Generation
var systemPrompt = @"
You are a precision assistant. Answer ONLY using the provided Knowledge Units.
If the information is missing, state 'Information not found in knowledge map'.
Each answer segment MUST include a citation in format [UnitId].
";

// Response Structure (using System.Text.Json or Structured Outputs)
public class RagResponse
{
    public string Answer { get; set; } = string.Empty;
    public List<Citation> Citations { get; set; } = new();
}

// Assumed minimal shape; this guide does not define Citation.
public class Citation
{
    public string UnitId { get; set; } = string.Empty;
}
```
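
A prompt alone cannot guarantee that every cited ID is real, so it can help to check the answer after generation. The sketch below extracts `[UnitId]` tokens and reports any that were not among the units sent to the model. The `CitationValidator` name is hypothetical, and the pattern assumes unit IDs contain no whitespace or `]`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CitationValidator
{
    // Matches bracketed tokens such as [a1b2c3]; assumes IDs have no ']' or whitespace.
    private static readonly Regex CitationPattern =
        new(@"\[([^\]\s]+)\]", RegexOptions.Compiled);

    // Returns cited IDs that were NOT among the units supplied to the model.
    public static List<string> FindUnknownCitations(
        string answer, IEnumerable<string> providedUnitIds)
    {
        var known = new HashSet<string>(providedUnitIds);
        return CitationPattern.Matches(answer)
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .Where(id => !known.Contains(id))
            .ToList();
    }
}
```

A non-empty result could trigger a retry or a fallback to the "not found" message; which policy to apply is left to the caller.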

## 4. Ingestion Workflow
Instead of `string.Split`, use structural parsers:
1. **Parse**: Extract sections and tables (e.g., using `Unstructured` or custom logic).
2. **Normalize**: Assign stable IDs based on a content hash plus source metadata.
3. **Embed**: Generate vectors for the canonical text of each unit.
4. **Relate**: Build links between units (e.g., `prev` -> `curr` -> `next`).
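
The **Relate** step above can be sketched for the simplest case, sequential links between units in document order. The `UnitLink` record and `SequentialLinker` class are hypothetical stand-ins for the link entity and whatever ingestion service would own this logic.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory link shape mirroring source, target, and relation type.
public record UnitLink(string SourceUnitId, string TargetUnitId, string RelationType);

public static class SequentialLinker
{
    // Emits a "Next" link from each unit to its successor in document order.
    public static List<UnitLink> BuildNextLinks(IReadOnlyList<string> orderedUnitIds)
    {
        var links = new List<UnitLink>();
        for (int i = 0; i + 1 < orderedUnitIds.Count; i++)
        {
            links.Add(new UnitLink(orderedUnitIds[i], orderedUnitIds[i + 1], "Next"));
        }
        return links;
    }
}
```

Only forward links are stored here; the previous unit can be found by querying on `TargetUnitId`, or a reverse relation can be emitted in the same loop if lookups in both directions are frequent.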