Implementation Patterns for KM-RAG in .NET

This guide outlines how to implement KM-RAG patterns in C# and .NET, building on existing infrastructure such as EF Core and Microsoft.Extensions.AI.

1. Defining Knowledge Units

Represent each knowledge unit as a strongly typed entity that captures its metadata and its relationships to other units.

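using System.Collections.Generic;
using Pgvector; // Assumed source of the Vector type (Pgvector NuGet package)

public enum KnowledgeUnitType { Section, Table, Definition, Step, Rule }

public class KnowledgeUnit
{
    public string Id { get; set; } = ""; // Stable hash of (SourceId, Content, Version)
    public string TenantId { get; set; } = ""; // Scopes retrieval queries to one tenant
    public string SourceId { get; set; } = "";
    public string Version { get; set; } = "";
    public KnowledgeUnitType Type { get; set; }
    public string Content { get; set; } = "";
    public string MetadataJson { get; set; } = "{}"; // page, section_path, etc.
    public Vector? Embedding { get; set; } // null until the unit has been embedded

    // Graph relationships
    public List<KnowledgeUnitLink> OutgoingLinks { get; set; } = new();
}

public class KnowledgeUnitLink
{
    public string SourceUnitId { get; set; } = ""; // Unit the link starts from
    public string TargetUnitId { get; set; } = ""; // Unit the link points to
    public string RelationType { get; set; } = ""; // "Next", "Defines", "References", "ParentSection"
}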
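
The Id comment calls for a stable hash over source, content, and version. A minimal sketch of one way to compute it, assuming SHA-256 and assuming that none of the fields contain the U+001F separator character; the KnowledgeUnitId class and its Compute method are hypothetical names used only for illustration:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class KnowledgeUnitId
{
    // Hypothetical helper: derives a deterministic ID from source, version, and content.
    public static string Compute(string sourceId, string content, string version)
    {
        // Normalize line endings and outer whitespace so the same text
        // hashes identically regardless of the platform it was parsed on.
        var normalized = content.Replace("\r\n", "\n").Trim();

        // The unit separator keeps ("ab", "c") distinct from ("a", "bc").
        var payload = string.Join('\u001F', sourceId, version, normalized);

        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the ID depends only on its inputs, re-ingesting an unchanged document yields the same IDs, so existing links and citations keep pointing at the right units.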

2. Multi-Stage Retrieval

Replace a single Take(limit) query with a multi-stage pipeline: hybrid candidate generation, then graph expansion, then reranking.

Step A: Hybrid Candidate Generation

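Generate candidates with pgvector cosine distance and, where the database supports it, merge them with full-text search results. The query below shows the vector side only.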

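// Assumes _embeddingGenerator is an IEmbeddingGenerator<string, Embedding<float>>
// from Microsoft.Extensions.AI; the single-input GenerateAsync overload may
// differ between package versions.
var queryEmbedding = await _embeddingGenerator.GenerateAsync(queryText);
var queryVector = new Vector(queryEmbedding.Vector); // Convert to Pgvector.Vector

var candidates = await _dbContext.KnowledgeUnits
    .Where(u => u.TenantId == tenantId && u.Embedding != null)
    .OrderBy(u => u.Embedding!.CosineDistance(queryVector)) // Smaller distance = more similar
    .Take(20) // Over-fetch so the reranker has candidates to choose from
    .Select(u => new { u.Id, u.Content, u.Type })
    .ToListAsync();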
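
To combine the vector candidates with a full-text result list, one common option is reciprocal rank fusion (RRF). This guide does not prescribe a fusion method, so treat the following as a sketch under assumptions: both searches return unit IDs ordered best-first with no duplicates inside a list, and the HybridFusion class and the k = 60 default are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HybridFusion
{
    // Hypothetical helper: merges two best-first ID lists with reciprocal rank fusion.
    public static List<string> Fuse(
        IReadOnlyList<string> vectorRanked,
        IReadOnlyList<string> textRanked,
        int limit,
        int k = 60)
    {
        var scores = new Dictionary<string, double>();

        foreach (var list in new[] { vectorRanked, textRanked })
        {
            for (var i = 0; i < list.Count; i++)
            {
                // Each list contributes 1 / (k + rank), where rank starts at 1.
                var id = list[i];
                scores[id] = scores.GetValueOrDefault(id) + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal) // Deterministic tie-break
            .Select(pair => pair.Key)
            .Take(limit)
            .ToList();
    }
}
```

RRF uses only rank positions, so it avoids having to put cosine distances and full-text scores on a common scale.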

Step B: Graph Expansion

Follow links from the candidates to pull in related units, such as a parent section, so that each hit arrives with its surrounding context.

// Example: fetch the parent sections of the candidates ("contextual neighbors")
var candidateIds = candidates.Select(c => c.Id).ToList();

var expandedIds = await _dbContext.KnowledgeUnitLinks
    .Where(l => candidateIds.Contains(l.SourceUnitId) && l.RelationType == "ParentSection")
    .Select(l => l.TargetUnitId)
    .Distinct()
    .ToListAsync();

// Load the full units for the expanded IDs
var contextUnits = await _dbContext.KnowledgeUnits
    .Where(u => expandedIds.Contains(u.Id))
    .ToListAsync();
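
The query above expands by exactly one hop. If more hops are wanted, the expansion can be bounded so the context does not grow without limit. A minimal in-memory sketch, assuming the relevant links have already been loaded as (Source, Target) pairs; GraphExpansion and maxHops are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GraphExpansion
{
    // Hypothetical helper: breadth-first expansion limited to maxHops link traversals.
    public static HashSet<string> Expand(
        IEnumerable<string> seedIds,
        IEnumerable<(string Source, string Target)> links,
        int maxHops)
    {
        // Index links by source; a missing key yields an empty sequence.
        var adjacency = links.ToLookup(l => l.Source, l => l.Target);

        var visited = new HashSet<string>(seedIds);
        var frontier = visited.ToList();

        for (var hop = 0; hop < maxHops && frontier.Count > 0; hop++)
        {
            var next = new List<string>();
            foreach (var id in frontier)
            {
                foreach (var target in adjacency[id])
                {
                    // Add returns false for units already seen, which also guards against cycles.
                    if (visited.Add(target)) next.Add(target);
                }
            }
            frontier = next;
        }

        return visited; // Seeds plus every unit reachable within maxHops
    }
}
```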

3. Reranking and Citations

Score the expanded context with a reranking model, keep the highest-scoring units, and instruct the LLM to cite the units it uses.

// System prompt for grounded generation
var systemPrompt = @"
You are a precision assistant. Answer ONLY using the provided Knowledge Units.
If the information is missing, state 'Information not found in knowledge map'.
Each answer segment MUST include a citation in the format [UnitId].
";

// Response structure (deserialize with System.Text.Json or request via structured outputs)
public class RagResponse
{
    public string Answer { get; set; } = "";
    public List<Citation> Citations { get; set; } = new();
}

// Assumed minimal shape: the ID of the knowledge unit that backs a claim
public class Citation
{
    public string UnitId { get; set; } = "";
}
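
A prompt alone cannot guarantee that cited IDs are real, so it can help to check the answer after generation. A minimal sketch, assuming citations appear as [UnitId] with no whitespace inside the brackets; CitationValidator is a hypothetical helper:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CitationValidator
{
    // Matches [UnitId] where the ID contains no brackets or whitespace.
    private static readonly Regex CitationPattern = new(@"\[([^\[\]\s]+)\]");

    // Returns the cited IDs that were not among the units supplied to the model.
    public static List<string> FindUnknownCitations(
        string answer,
        IEnumerable<string> providedUnitIds)
    {
        var known = new HashSet<string>(providedUnitIds);

        return CitationPattern.Matches(answer)
            .Select(match => match.Groups[1].Value)
            .Distinct()
            .Where(id => !known.Contains(id))
            .ToList();
    }
}
```

A non-empty result means the model cited something outside the supplied context, and the caller can then reject or regenerate the answer.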

4. Ingestion Workflow

Instead of splitting text with string.Split, use a structural parser and process each document in four steps:

  1. Parse: Extract sections and tables (e.g., using Unstructured or custom logic).
  2. Normalize: Assign stable IDs based on a content hash plus source metadata.
  3. Embed: Generate vectors for the canonical text of each unit.
  4. Relate: Build links between units (e.g., prev -> curr -> next).
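
The Relate step for sequential links can be sketched as follows, assuming the parser yields unit IDs in document order. UnitLink is a hypothetical record standing in for the KnowledgeUnitLink entity, and SequentialLinker is an illustrative name:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the KnowledgeUnitLink entity.
public record UnitLink(string SourceUnitId, string TargetUnitId, string RelationType);

public static class SequentialLinker
{
    // Builds one "Next" link from each unit to the unit that follows it.
    public static List<UnitLink> BuildNextLinks(IReadOnlyList<string> orderedUnitIds)
    {
        var links = new List<UnitLink>();

        // Stop one short of the end: the last unit has no successor.
        for (var i = 0; i + 1 < orderedUnitIds.Count; i++)
        {
            links.Add(new UnitLink(orderedUnitIds[i], orderedUnitIds[i + 1], "Next"));
        }

        return links;
    }
}
```

Storing only the forward direction keeps the link table small; the previous unit can be found by querying on TargetUnitId.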