# Implementation Patterns for KM-RAG in .NET
This guide outlines how to implement KM-RAG patterns in C# and .NET, building on existing infrastructure such as EF Core and `Microsoft.Extensions.AI`.
## 1. Defining Knowledge Units
Represent units as strongly-typed entities to capture metadata and relationships.
```csharp
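using System.Collections.Generic;
using Pgvector; // Vector type from the Pgvector NuGet package

public enum KnowledgeUnitType { Section, Table, Definition, Step, Rule }

public class KnowledgeUnit
{
    public string Id { get; set; } = string.Empty;        // Stable Hash(Source, Content, Version)
    public string TenantId { get; set; } = string.Empty;  // Used for tenant filtering during retrieval
    public string SourceId { get; set; } = string.Empty;
    public string Version { get; set; } = string.Empty;
    public KnowledgeUnitType Type { get; set; }
    public string Content { get; set; } = string.Empty;
    public string MetadataJson { get; set; } = "{}";      // page, section_path, etc.
    public Vector? Embedding { get; set; }                // Null until the unit has been embedded

    // Graph relationships
    public List<KnowledgeUnitLink> OutgoingLinks { get; set; } = new();
}

public class KnowledgeUnitLink
{
    public string SourceUnitId { get; set; } = string.Empty; // Unit the link starts from
    public string TargetUnitId { get; set; } = string.Empty;
    public string RelationType { get; set; } = string.Empty; // "Next", "Defines", "References", "ParentSection"
}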
```
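The `Id` comment above describes a stable hash over source, content, and version. One way to compute such an ID is sketched below, assuming SHA-256 and light whitespace normalization; the `KnowledgeUnitIds` class and its `Compute` method are hypothetical names, not part of any library.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class KnowledgeUnitIds
{
    // Hypothetical helper: derives a deterministic ID from (source, version, content).
    public static string Compute(string sourceId, string content, string version)
    {
        // Normalize line endings and trim, so cosmetic whitespace changes keep the same ID.
        string normalized = content.ReplaceLineEndings("\n").Trim();

        // Length-prefix the short fields so ("ab", "c") and ("a", "bc") hash differently.
        string payload = $"{sourceId.Length}:{sourceId}|{version.Length}:{version}|{normalized}";

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the ID depends only on its inputs, re-ingesting an unchanged document yields the same IDs, so links and citations that reference them stay valid.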
## 2. Multi-Stage Retrieval
Replace a single `Take(Limit)` similarity query with a multi-stage pipeline: generate candidates, expand them through the graph, then rerank.
### Step A: Hybrid Candidate Generation
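Rank units by `pgvector` cosine distance and, if full-text search is available, combine that ranking with the text-search ranking. The query below covers the vector side only.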
```csharp
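// Requires: using Microsoft.EntityFrameworkCore; using Pgvector; using Pgvector.EntityFrameworkCore;
// Embed the query. Recent Microsoft.Extensions.AI versions return an Embedding<float> here;
// older previews may need a different call, so adjust to the installed package.
var queryEmbedding = await _embeddingGenerator.GenerateAsync(queryText);
var queryVector = new Vector(queryEmbedding.Vector.ToArray());

var candidates = await _dbContext.KnowledgeUnits
    .Where(u => u.TenantId == tenantId && u.Embedding != null)
    .OrderBy(u => u.Embedding!.CosineDistance(queryVector))
    .Take(20) // Get more candidates than needed so the reranker has room to choose
    .Select(u => new { u.Id, u.Content, u.Type })
    .ToListAsync();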
```
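The text does not say how the vector and full-text rankings should be combined. One common option is Reciprocal Rank Fusion (RRF), sketched here over plain lists of unit IDs. The `HybridFusion` class is a hypothetical name, and `k = 60` is a conventional default rather than a value taken from this guide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HybridFusion
{
    // Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
    public static List<string> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60, int take = 20)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out double current);
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // deterministic tie-break
            .Select(p => p.Key)
            .Take(take)
            .ToList();
    }
}
```

RRF uses only rank positions, so cosine distances and text-search scores never have to be put on a common scale. Units that appear in both lists rise to the top.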
### Step B: Graph Expansion
Follow links from the candidate units to retrieve related units, such as parent sections, so the model sees each candidate in context.
```csharp
// IDs of the units returned by Step A
var candidateIds = candidates.Select(c => c.Id).ToList();

// Example: get "contextual neighbors" by following ParentSection links
var expandedIds = await _dbContext.KnowledgeUnitLinks
    .Where(l => candidateIds.Contains(l.SourceUnitId) && l.RelationType == "ParentSection")
    .Select(l => l.TargetUnitId)
    .Distinct()
    .ToListAsync();

var contextUnits = await _dbContext.KnowledgeUnits
    .Where(u => expandedIds.Contains(u.Id))
    .ToListAsync();
```
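The query above follows one relation type for a single hop. If deeper expansion is needed, a bounded breadth-first walk keeps the context size under control. The sketch below works on links already loaded into memory; the `Link` record, the `GraphExpansion` class, and the `maxHops` and `maxUnits` limits are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;

public record Link(string SourceUnitId, string TargetUnitId, string RelationType);

public static class GraphExpansion
{
    // Returns newly discovered unit IDs (seeds excluded), in breadth-first order.
    public static List<string> Expand(
        IEnumerable<string> seedIds,
        IEnumerable<Link> links,
        ISet<string> allowedRelations,
        int maxHops = 1,
        int maxUnits = 50)
    {
        // Index outgoing links by source, keeping only the allowed relation types.
        var bySource = links
            .Where(l => allowedRelations.Contains(l.RelationType))
            .ToLookup(l => l.SourceUnitId, l => l.TargetUnitId);

        var frontier = seedIds.Distinct().ToList();
        var visited = new HashSet<string>(frontier);
        var result = new List<string>();

        for (int hop = 0; hop < maxHops && frontier.Count > 0; hop++)
        {
            var next = new List<string>();
            foreach (var id in frontier)
            {
                foreach (var target in bySource[id])
                {
                    if (result.Count >= maxUnits) return result; // context budget reached
                    if (visited.Add(target))
                    {
                        result.Add(target);
                        next.Add(target);
                    }
                }
            }
            frontier = next;
        }

        return result;
    }
}
```

The `visited` set prevents cycles in the link graph from causing repeated work, and the two limits cap how much extra text reaches the prompt.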
## 3. Reranking and Citations
Use a reranking model to score the expanded context against the query, and instruct the LLM to cite the Knowledge Units it draws on.
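The reranking half of this step can be sketched as follows. The scoring model is passed in as a delegate, because this guide does not name a specific reranker; `Reranker`, `ScoredUnit`, `topN`, and `minScore` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public record ScoredUnit(string Id, string Content, double Score);

public static class Reranker
{
    // scoreAsync stands in for a real relevance model (cross-encoder, LLM judge, etc.).
    public static async Task<List<ScoredUnit>> RerankAsync(
        string query,
        IEnumerable<(string Id, string Content)> units,
        Func<string, string, Task<double>> scoreAsync,
        int topN = 5,
        double minScore = 0.0)
    {
        var scored = new List<ScoredUnit>();
        foreach (var (id, content) in units)
        {
            double score = await scoreAsync(query, content);

            // Drop units below the threshold so weak matches never reach the prompt.
            if (score >= minScore)
            {
                scored.Add(new ScoredUnit(id, content, score));
            }
        }

        // Highest score first; OrderByDescending is stable, so ties keep their input order.
        return scored.OrderByDescending(s => s.Score).Take(topN).ToList();
    }
}
```

Only the units that survive reranking are passed to the model as Knowledge Units, which is what makes the citation requirement below enforceable.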
```csharp
// System prompt for grounded generation
var systemPrompt = @"
You are a precision assistant. Answer ONLY using the provided Knowledge Units.
If the information is missing, state 'Information not found in knowledge map'.
Each answer segment MUST include a citation in the format [UnitId].
";

// Response structure (using System.Text.Json or Structured Outputs)
public class RagResponse
{
    public string Answer { get; set; } = string.Empty;
    public List<Citation> Citations { get; set; } = new();
}

// Assumed shape: the original does not define Citation, so this is a minimal placeholder.
public class Citation
{
    public string UnitId { get; set; } = string.Empty;
}
```
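A prompt alone cannot guarantee that every citation is real, so it may help to check the `[UnitId]` markers in the answer against the units that were actually supplied. The sketch below is one way to do that; `CitationValidator` is a hypothetical helper, and the pattern assumes that unit IDs contain neither whitespace nor `]`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CitationValidator
{
    // Matches bracketed tokens such as [a1b2c3].
    private static readonly Regex CitationPattern =
        new Regex(@"\[([^\]\s]+)\]", RegexOptions.Compiled);

    // Splits the cited IDs into those that were provided to the model and those that were not.
    public static (List<string> Valid, List<string> Unknown) Check(
        string answer, IEnumerable<string> providedUnitIds)
    {
        var provided = new HashSet<string>(providedUnitIds);

        var cited = CitationPattern.Matches(answer)
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();

        var valid = cited.Where(id => provided.Contains(id)).ToList();
        var unknown = cited.Where(id => !provided.Contains(id)).ToList();
        return (valid, unknown);
    }
}
```

A non-empty `Unknown` list suggests a hallucinated citation. The caller can then retry, strip the segment, or fall back to the "not found" response.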
## 4. Ingestion Workflow
Instead of splitting text with `string.Split`, use a structural parser:
1. **Parse**: Extract sections/tables (e.g., using `Unstructured` or custom logic).
2. **Normalize**: Assign stable IDs based on content hash + source metadata.
3. **Embed**: Generate vectors for the canonical text of each unit.
4. **Relate**: Build links (e.g., `prev` -> `curr` -> `next`).
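
The **Relate** step can be sketched as a pass over the units in document order, emitting a `"Next"` link from each unit to its successor. `UnitLink` and `Linker` are hypothetical names; in a real pipeline the links would be stored as `KnowledgeUnitLink` rows.

```csharp
using System.Collections.Generic;

public record UnitLink(string SourceUnitId, string TargetUnitId, string RelationType);

public static class Linker
{
    // Emits one "Next" link per adjacent pair, in document order.
    public static List<UnitLink> BuildSequentialLinks(IReadOnlyList<string> orderedUnitIds)
    {
        var links = new List<UnitLink>();
        for (int i = 0; i + 1 < orderedUnitIds.Count; i++)
        {
            links.Add(new UnitLink(orderedUnitIds[i], orderedUnitIds[i + 1], "Next"));
        }
        return links;
    }
}
```

Storing only the forward direction is enough for this sketch, since the previous unit can be found by querying on `TargetUnitId`.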