# Implementation Patterns for KM-RAG in .NET

This guide outlines how to implement KM-RAG patterns in C# and .NET, building on existing infrastructure such as EF Core and `Microsoft.Extensions.AI`.

## 1. Defining Knowledge Units

Represent each knowledge unit as a strongly typed entity that captures its metadata and relationships.

```csharp
using System.Collections.Generic;
using Pgvector; // Vector type from the Pgvector NuGet package

public enum KnowledgeUnitType { Section, Table, Definition, Step, Rule }

public class KnowledgeUnit
{
    public string Id { get; set; } = string.Empty;       // Stable Hash(Source, Content, Version)
    public string TenantId { get; set; } = string.Empty; // Used for tenant filtering during retrieval
    public string SourceId { get; set; } = string.Empty;
    public string Version { get; set; } = string.Empty;
    public KnowledgeUnitType Type { get; set; }
    public string Content { get; set; } = string.Empty;
    public string MetadataJson { get; set; } = "{}";     // page, section_path, etc.
    public Vector? Embedding { get; set; }

    // Graph relationships
    public List<KnowledgeUnitLink> OutgoingLinks { get; set; } = new();
}

public class KnowledgeUnitLink
{
    public string SourceUnitId { get; set; } = string.Empty; // Unit the link starts from
    public string TargetUnitId { get; set; } = string.Empty;
    public string RelationType { get; set; } = string.Empty; // "Next", "Defines", "References", "ParentSection"
}
```
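The `Id` comment above calls for a stable hash of source, content, and version. A minimal sketch of one way to compute it follows; the helper name `KnowledgeUnitIds.Compute`, the choice of SHA-256, and the whitespace normalization are assumptions for illustration, not something this guide prescribes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class KnowledgeUnitIds
{
    // Hypothetical helper: derives a deterministic ID from source, version, and content.
    public static string Compute(string sourceId, string version, string content)
    {
        // Normalize line endings and trim so cosmetic differences do not change the ID.
        var canonical = content.Replace("\r\n", "\n").Trim();

        // Length-prefix the leading fields so ("ab", "c") and ("a", "bc") cannot collide.
        var payload = $"{sourceId.Length}:{sourceId}|{version.Length}:{version}|{canonical}";

        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the ID depends only on its inputs, re-ingesting an unchanged document yields the same IDs, while bumping the version or editing the content produces new ones.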
## 2. Multi-Stage Retrieval

Replace a single `Take(Limit)` query with a multi-stage pipeline: generate candidates, expand them through the graph, then rerank.

### Step A: Hybrid Candidate Generation

Combine `pgvector` cosine-distance ordering with full-text search where available. The query below covers the vector side only.

```csharp
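// Requires: using Microsoft.EntityFrameworkCore; using Pgvector; using Pgvector.EntityFrameworkCore;
// Assumes _embeddingGenerator is an IEmbeddingGenerator<string, Embedding<float>> and a
// Microsoft.Extensions.AI version whose single-input GenerateAsync returns one embedding.
var embedding = await _embeddingGenerator.GenerateAsync(queryText);
var queryVector = new Vector(embedding.Vector.ToArray()); // Convert to a pgvector value

var candidates = await _dbContext.KnowledgeUnits
    .Where(u => u.TenantId == tenantId && u.Embedding != null)
    .OrderBy(u => u.Embedding!.CosineDistance(queryVector)) // Smaller distance = more similar
    .Take(20) // Over-fetch so the reranker has candidates to choose from
    .Select(u => new { u.Id, u.Content, u.Type })
    .ToListAsync();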
```
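The query above produces only a vector ranking. If a full-text ranking is also available, the two ID lists have to be merged somehow; this guide does not say how, so the sketch below uses reciprocal rank fusion (RRF) as one common option. The class name `RankFusion` and the default `k = 60` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankFusion
{
    // Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
    // k = 60 is a commonly used default, not a value taken from this guide.
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (var i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out var current); // current is 0 when the ID is new
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal) // deterministic tie-break
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

RRF works on ranks rather than raw scores, so cosine distances and full-text relevance scores never need to be put on a common scale.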
### Step B: Graph Expansion

Retrieve units linked to the candidates so the model sees their surrounding context.

```csharp
// IDs of the candidates returned by Step A
var candidateIds = candidates.Select(c => c.Id).ToList();

// Example: get "contextual neighbors" by following ParentSection links
var expandedIds = await _dbContext.KnowledgeUnitLinks
    .Where(l => candidateIds.Contains(l.SourceUnitId) && l.RelationType == "ParentSection")
    .Select(l => l.TargetUnitId)
    .Distinct()
    .ToListAsync();

var contextUnits = await _dbContext.KnowledgeUnits
    .Where(u => expandedIds.Contains(u.Id))
    .ToListAsync();
```
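The query above follows one relation type for a single hop. When links are already loaded in memory, the same idea extends to several hops; the following is a minimal sketch of a hop-limited breadth-first expansion. The `Link` record, the `GraphExpansion` class, and the `maxHops` parameter are hypothetical names introduced here, not part of the guide's data model.

```csharp
using System.Collections.Generic;

public record Link(string SourceUnitId, string TargetUnitId, string RelationType);

public static class GraphExpansion
{
    // Breadth-first expansion from the seed IDs, following only the allowed relation types
    // and stopping after maxHops. Returns seeds first, then new IDs in discovery order.
    public static List<string> Expand(
        IEnumerable<string> seedIds, IEnumerable<Link> links,
        ISet<string> allowedRelations, int maxHops)
    {
        // Index outgoing links by source unit for fast neighbor lookup.
        var outgoing = new Dictionary<string, List<Link>>();
        foreach (var link in links)
        {
            if (!outgoing.TryGetValue(link.SourceUnitId, out var list))
                outgoing[link.SourceUnitId] = list = new List<Link>();
            list.Add(link);
        }

        var visited = new HashSet<string>();
        var ordered = new List<string>();
        var frontier = new List<string>();
        foreach (var id in seedIds)
            if (visited.Add(id)) { ordered.Add(id); frontier.Add(id); }

        for (var hop = 0; hop < maxHops && frontier.Count > 0; hop++)
        {
            var next = new List<string>();
            foreach (var id in frontier)
            {
                if (!outgoing.TryGetValue(id, out var neighbors)) continue;
                foreach (var link in neighbors)
                {
                    if (!allowedRelations.Contains(link.RelationType)) continue;
                    // HashSet.Add returns false for already-visited IDs, which prevents cycles.
                    if (visited.Add(link.TargetUnitId))
                    {
                        ordered.Add(link.TargetUnitId);
                        next.Add(link.TargetUnitId);
                    }
                }
            }
            frontier = next;
        }
        return ordered;
    }
}
```

Capping the hop count keeps the context size bounded; each additional hop can multiply the number of units passed to the reranker.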
## 3. Reranking and Citations

Use a reranking model to score each expanded unit against the query, then prompt the LLM to cite the units it draws on.

```csharp
// System prompt for grounded generation
var systemPrompt = @"
You are a precision assistant. Answer ONLY using the provided Knowledge Units.
If the information is missing, state 'Information not found in knowledge map'.
Each answer segment MUST include a citation in format [UnitId].
";

// Response structure (using System.Text.Json or Structured Outputs)
public class RagResponse
{
    public string Answer { get; set; } = string.Empty;
    public List<Citation> Citations { get; set; } = new();
}

// Minimal citation shape (assumed): the ID of the unit that supports an answer segment
public class Citation
{
    public string UnitId { get; set; } = string.Empty;
}
```
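A prompt alone cannot guarantee that every `[UnitId]` in the answer refers to a unit that was actually supplied. The sketch below shows one possible post-generation check; the `CitationCheck` class and its regular expression are assumptions for illustration, and the pattern assumes unit IDs contain no brackets or whitespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CitationCheck
{
    // Matches bracketed tokens such as [a1b2]; assumes IDs contain no brackets or whitespace.
    private static readonly Regex CitationPattern =
        new(@"\[([^\[\]\s]+)\]", RegexOptions.Compiled);

    // Returns the distinct IDs cited in the answer, in order of first appearance.
    public static List<string> ExtractCitedIds(string answer) =>
        CitationPattern.Matches(answer)
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();

    // Returns cited IDs that were not among the units passed to the model.
    public static List<string> FindUnknownCitations(string answer, IEnumerable<string> providedUnitIds)
    {
        var known = new HashSet<string>(providedUnitIds);
        return ExtractCitedIds(answer).Where(id => !known.Contains(id)).ToList();
    }
}
```

A non-empty result from `FindUnknownCitations` is a signal to reject or regenerate the answer, since the model cited something outside the provided context.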
## 4. Ingestion Workflow

Instead of `string.Split`, use structural parsers:

1. **Parse**: Extract sections and tables (e.g., using `Unstructured` or custom logic).
2. **Normalize**: Assign stable IDs based on a content hash plus source metadata.
3. **Embed**: Generate vectors for the canonical text of each unit.
4. **Relate**: Build links (e.g., `prev` -> `curr` -> `next`).
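
The **Relate** step can be sketched for the simplest case, where units arrive in document order. The `UnitLink` record and `SequentialLinker` class are hypothetical names, and the "Previous" relation is an assumption added for the back-link; the guide itself only lists "Next".

```csharp
using System.Collections.Generic;

public record UnitLink(string SourceUnitId, string TargetUnitId, string RelationType);

public static class SequentialLinker
{
    // Emits a "Next" link from each unit to its successor and a "Previous" link back.
    // "Previous" is a hypothetical relation name used only in this sketch.
    public static List<UnitLink> Build(IReadOnlyList<string> orderedUnitIds)
    {
        var links = new List<UnitLink>();
        for (var i = 0; i + 1 < orderedUnitIds.Count; i++)
        {
            links.Add(new UnitLink(orderedUnitIds[i], orderedUnitIds[i + 1], "Next"));
            links.Add(new UnitLink(orderedUnitIds[i + 1], orderedUnitIds[i], "Previous"));
        }
        return links;
    }
}
```

Storing both directions as explicit rows lets graph expansion walk backward with the same source-keyed lookup it uses to walk forward.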