feat(search/rag): implement NexusSearchBox, dynamic Qdrant collection auto-provisioning, batch vector ingestion, mobile Serilog logging, and resolve 401 auth handler error (#51)

Resolves #52

This Pull Request introduces the **NexusSearchBox** search feature with premium unified styling, implements a robust **dynamic Qdrant collection auto-provisioning and batch-vector ingestion pipeline**, integrates a unified **Serilog logging infrastructure** for the Blazor Hybrid environment (MAUI), and resolves the **401 Unauthorized API header propagation error** inside mobile builds.

### 🚀 Key Implementations

#### 1. Premium `NexusSearchBox` & Semantic Search UI
* **NexusSearchBox Component:** Created an elegant search-as-you-type search box with smooth key navigation, quick-clearing, and seamless dynamic styling.
* **Unified Aesthetics:** Refactored the search box isolated styling to align perfectly with the dashboard's design system using glassmorphism, `--nexus-neon` token gradients, and smooth pulse/fade animations.
* **Semantic Search Integration:** Integrated semantic search query dispatching (`SearchLibrarySemanticallyQuery`) and wired up navigation seamlessly through the updated `ReaderNavigationService`.
* **Tests Hardening:** Added/adapted query assertions in `QueryTests.cs` to guarantee safe parameterization and error boundary mapping.

#### 2. Qdrant Collection Provisioning & Vector Ingestion
* **Dynamic Auto-Provisioning:** Implemented dynamic checking and lazy-creation of the `knowledge_units` collection using 768 dimensions and Cosine distance.
* **High-Performance Ingestion:** Optimized `ProcessKnowledgeUnitsAsync` with high-performance batch embedding generation using `_embeddingGenerator` and deterministic MD5 GUIDs for stable, duplicate-free upsertion.
* **Database Cache Clear Sync:** Integrated Qdrant collection deletion in `ClearCacheAsync` to ensure absolute consistency between the PostgreSQL database cache and vector database indices.

#### 3. Cross-Platform MAUI Logging (Serilog Infrastructure)
* **Serilog Integration:** Configured cross-platform Serilog routing in `SerilogConfiguration.cs`, streaming diagnostic logs safely across native platforms and the Blazor Webview container.
* **Interop Bridge:** Built `BlazorLoggingBridge.cs` to capture web console messages and pipe them directly to the native host logger.
* **Demo Interface:** Added an interactive `SerilogDemo.razor` sandbox under Pages.

#### 4. Resolving 401 Load Errors (Authentication Handler Flow)
* **Authentication Header Handler:** Implemented the `MobileAuthenticationHeaderHandler` to correctly extract, validate, and inject bearer JWT tokens into outbound API requests.
* **Configuration-based API Host:** Structured standard API URI routing to use clean configuration bindings in `appsettings.json`.

---

### 🧪 Verification & Build Status
* Run `dotnet build` from the solution root: Successfully compiled the full multi-targeted solution (`Liczba błędów: 0`).
* All unit and integration tests successfully executed and verified (`dotnet test`).

---------

Co-authored-by: Marek Jasiński <jasins.marek@gmail.com>
Co-authored-by: Marek Jaisński <jasins.marek@gmail.com>
Reviewed-on: #51
Co-authored-by: Antigravity <antigravity@google.com>
Co-committed-by: Antigravity <antigravity@google.com>
This commit was merged in pull request #51.
This commit is contained in:
2026-05-26 12:15:28 +00:00
committed by Marek Jaisński
parent 758b152a0c
commit a0bf6c15f4
61 changed files with 5111 additions and 570 deletions
@@ -112,6 +112,7 @@ public static class DependencyInjection
services.AddScoped<IKnowledgeService, KnowledgeService>();
services.AddTransient<IEpubReader, EpubReaderService>();
services.AddTransient<IEpubMetadataExtractor, EpubMetadataExtractor>();
services.AddTransient<IEpubExtractor, EpubExtractor>();
// Fix #4: Scoped instead of Singleton — BookStorageService uses path resolution
// that is environment-specific and incompatible with Singleton lifetime in MAUI.
@@ -119,6 +120,7 @@ public static class DependencyInjection
// Fix #1: Ebook repository (scoped, matches AppDbContext lifetime)
services.AddScoped<IEbookRepository, EbookRepository>();
services.AddScoped<IQuizResultRepository, QuizResultRepository>();
// Fix #2: SignalR broadcaster (scoped, wraps IHubContext which is itself a singleton wrapper)
services.AddScoped<ISyncBroadcaster, SignalRSyncBroadcaster>();
@@ -46,6 +46,12 @@ internal sealed class EbookRepository : IEbookRepository
_context.Ebooks.Add(ebook);
}
/// <inheritdoc />
public async Task<Ebook?> FindByIdAsync(Guid id, CancellationToken cancellationToken = default)
{
return await _context.Ebooks.FindAsync(new object[] { id }, cancellationToken);
}
/// <inheritdoc />
public Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
=> _context.SaveChangesAsync(cancellationToken);
@@ -0,0 +1,37 @@
using Microsoft.EntityFrameworkCore;
using NexusReader.Application.Abstractions.Persistence;
using NexusReader.Data.Persistence;
using NexusReader.Domain.Entities;
namespace NexusReader.Infrastructure.Persistence;
/// <summary>
/// EF Core implementation of <see cref="IQuizResultRepository"/>.
/// </summary>
internal sealed class QuizResultRepository : IQuizResultRepository
{
private readonly AppDbContext _context;
public QuizResultRepository(AppDbContext context)
{
_context = context;
}
/// <inheritdoc />
public async Task<NexusUser?> FindUserByIdAsync(string userId, CancellationToken cancellationToken = default)
{
return await _context.Users.FirstOrDefaultAsync(u => u.Id == userId, cancellationToken);
}
/// <inheritdoc />
public void AddQuizResult(QuizResult quizResult)
{
_context.QuizResults.Add(quizResult);
}
/// <inheritdoc />
public Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
{
return _context.SaveChangesAsync(cancellationToken);
}
}
@@ -0,0 +1,85 @@
using System.Text.RegularExpressions;
using FluentResults;
using Microsoft.Extensions.Logging;
using NexusReader.Application.Abstractions.Services;
using VersOne.Epub;
namespace NexusReader.Infrastructure.Services;
public class EpubExtractor : IEpubExtractor
{
private readonly ILogger<EpubExtractor> _logger;
public EpubExtractor(ILogger<EpubExtractor> logger)
{
_logger = logger;
}
public async Task<Result<List<string>>> ExtractChaptersTextAsync(string relativePath, CancellationToken cancellationToken = default)
{
try
{
var fullPath = ResolvePath(relativePath);
if (string.IsNullOrEmpty(fullPath) || !File.Exists(fullPath))
{
_logger.LogError("[EpubExtractor] EPUB file not found at path: {FilePath}", relativePath);
return Result.Fail<List<string>>($"Plik EPUB nie został znaleziony na dysku: {relativePath}");
}
using var bookRef = await EpubReader.OpenBookAsync(fullPath);
var readingOrder = bookRef.GetReadingOrder();
if (readingOrder == null || !readingOrder.Any())
{
return Result.Fail<List<string>>("EPUB nie zawiera czytelnych rozdziałów.");
}
var chapters = new List<string>();
foreach (var chapterRef in readingOrder)
{
if (cancellationToken.IsCancellationRequested)
{
break;
}
var rawContent = await chapterRef.ReadContentAsTextAsync();
var cleanText = StripHtml(rawContent);
chapters.Add(cleanText);
}
return Result.Ok(chapters);
}
catch (Exception ex)
{
_logger.LogError(ex, "[EpubExtractor] Error extracting chapters from EPUB: {FilePath}", relativePath);
return Result.Fail<List<string>>(new Error("Failed to parse and extract text from EPUB").CausedBy(ex));
}
}
private static string? ResolvePath(string relativePath)
{
var normalized = relativePath.Replace('/', Path.DirectorySeparatorChar);
var currentDir = new DirectoryInfo(AppDomain.CurrentDomain.BaseDirectory);
while (currentDir != null)
{
var candidate = Path.Combine(currentDir.FullName, "wwwroot", normalized);
if (File.Exists(candidate)) return candidate;
var devCandidate = Path.Combine(currentDir.FullName, "src", "NexusReader.Web", "wwwroot", normalized);
if (File.Exists(devCandidate)) return devCandidate;
currentDir = currentDir.Parent;
}
return null;
}
private static string StripHtml(string html)
{
if (string.IsNullOrEmpty(html)) return string.Empty;
var clean = Regex.Replace(html, @"<(style|script)\b[^>]*>.*?</\1>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
clean = Regex.Replace(clean, @"<[^>]*>", " ");
clean = System.Net.WebUtility.HtmlDecode(clean);
clean = Regex.Replace(clean, @"\s+", " ").Trim();
return clean;
}
}
@@ -15,6 +15,7 @@ using Polly.Registry;
using Microsoft.Extensions.Options;
using NexusReader.Infrastructure.Configuration;
using Qdrant.Client;
using Qdrant.Client.Grpc;
using Neo4j.Driver;
namespace NexusReader.Infrastructure.Services;
@@ -32,8 +33,9 @@ public class KnowledgeService : IKnowledgeService
private readonly ILogger<KnowledgeService> _logger;
private readonly QdrantClient _qdrantClient;
private readonly IDriver _neo4jDriver;
private const string PromptVersion = "1.3";
private const string PromptVersion = "1.7";
private static readonly ConcurrentDictionary<string, Lazy<Task<Result<KnowledgePacket>>>> _activeRequests = new();
private static readonly SemaphoreSlim _collectionSemaphore = new(1, 1);
public KnowledgeService(
IChatClient chatClient,
@@ -84,11 +86,12 @@ public class KnowledgeService : IKnowledgeService
using var dbContext = await _dbContextFactory.CreateDbContextAsync(cancellationToken);
var normalizedText = text.Trim();
var hash = ContentHasher.ComputeHash(normalizedText);
var hashInput = $"{normalizedText}:{traceType}:{PromptVersion}";
var hash = ContentHasher.ComputeHash(hashInput);
// 1. Check Cache
var cached = await dbContext.SemanticKnowledgeCache
.FirstOrDefaultAsync(c => c.ContentHash == hash && c.TenantId == tenantId, cancellationToken);
.FirstOrDefaultAsync(c => c.ContentHash == hash, cancellationToken);
if (cached != null && cached.PromptVersion == PromptVersion)
{
@@ -96,7 +99,12 @@ public class KnowledgeService : IKnowledgeService
try
{
var packet = JsonSerializer.Deserialize<KnowledgePacket>(cached.JsonData, JsonOptions);
if (packet != null) return Result.Ok(packet);
if (packet != null)
{
await ProcessKnowledgeUnitsAsync(packet, tenantId, ebookId, dbContext, cancellationToken);
await dbContext.SaveChangesAsync(cancellationToken);
return Result.Ok(packet);
}
}
catch (JsonException ex)
{
@@ -105,7 +113,7 @@ public class KnowledgeService : IKnowledgeService
}
// Deduplicate concurrent active requests for the exact same hash
var requestKey = $"{tenantId}:{hash}:{traceType}";
var requestKey = $"{hash}:{traceType}";
var lazyTask = _activeRequests.GetOrAdd(requestKey, k =>
new Lazy<Task<Result<KnowledgePacket>>>(
@@ -177,7 +185,7 @@ public class KnowledgeService : IKnowledgeService
// 4. Save to Cache
var cached = await dbContext.SemanticKnowledgeCache
.FirstOrDefaultAsync(c => c.ContentHash == hash && c.TenantId == tenantId);
.FirstOrDefaultAsync(c => c.ContentHash == hash);
var cacheEntry = new SemanticKnowledgeCache
{
@@ -201,7 +209,14 @@ public class KnowledgeService : IKnowledgeService
// 5. Process structured KnowledgeUnits (Graph Expansion)
await ProcessKnowledgeUnitsAsync(knowledgePacket, tenantId, ebookId, dbContext, default);
await dbContext.SaveChangesAsync();
try
{
await dbContext.SaveChangesAsync();
}
catch (DbUpdateException ex) when (ex.InnerException is Npgsql.PostgresException pgEx && pgEx.SqlState == "23505")
{
_logger.LogWarning("[KnowledgeService] Concurrency collision on SemanticKnowledgeCache for {Hash}; another process saved it first. Swallowing.", hash);
}
return Result.Ok(knowledgePacket);
}
catch (JsonException ex)
@@ -224,6 +239,30 @@ public class KnowledgeService : IKnowledgeService
private async Task ProcessKnowledgeUnitsAsync(KnowledgePacket packet, string tenantId, Guid? ebookId, AppDbContext dbContext, CancellationToken cancellationToken)
{
if (packet.Graph != null && (packet.Units == null || !packet.Units.Any()))
{
var graphUnits = packet.Graph.Nodes.Select(node => new KnowledgeUnitDto(
node.Id,
node.Type ?? "concept",
node.Description ?? node.Label,
new Dictionary<string, object>
{
["label"] = node.Label,
["group"] = node.Group,
["summary"] = node.Summary ?? "",
["key_terms"] = node.KeyTerms ?? new List<string>()
}
)).ToList();
var graphLinks = packet.Graph.Links.Select(link => new KnowledgeLinkDto(
link.Source,
link.Target,
link.RelationType
)).ToList();
packet = packet with { Units = graphUnits, Links = graphLinks };
}
var unitIds = packet.Units.Select(u => u.Id).ToList();
var linkSourceIds = packet.Links.Select(l => l.Source).ToList();
var linkTargetIds = packet.Links.Select(l => l.Target).ToList();
@@ -285,6 +324,192 @@ public class KnowledgeService : IKnowledgeService
_logger.LogWarning("[KnowledgeService] Skipping invalid link {Source} -> {Target}: one or both units are missing.", linkDto.Source, linkDto.Target);
}
}
// Generate and upsert vectors to Qdrant in batch
var unitsToEmbed = packet.Units
.Where(u => !string.IsNullOrEmpty(u.Content))
.ToList();
if (unitsToEmbed.Any())
{
try
{
var contents = unitsToEmbed.Select(u => u.Content).ToList();
var embeddingResponse = await _retryPipeline.ExecuteAsync(async ct =>
await _embeddingGenerator.GenerateAsync(
contents,
new EmbeddingGenerationOptions { Dimensions = 768 },
cancellationToken: ct), cancellationToken);
var embeddings = embeddingResponse.ToList();
var points = new List<PointStruct>();
for (int i = 0; i < unitsToEmbed.Count; i++)
{
var unitDto = unitsToEmbed[i];
var vector = embeddings[i].Vector.ToArray();
var point = new PointStruct
{
Id = GetDeterministicGuid(unitDto.Id),
Vectors = vector,
Payload =
{
["content"] = unitDto.Content,
["type"] = unitDto.Type ?? string.Empty,
["tenantId"] = tenantId,
["ebookId"] = ebookId?.ToString() ?? string.Empty,
["metadataJson"] = JsonSerializer.Serialize(unitDto.Metadata)
}
};
points.Add(point);
}
if (points.Any())
{
await EnsureCollectionExistsAsync("knowledge_units", cancellationToken);
await _qdrantClient.UpsertAsync("knowledge_units", points, cancellationToken: cancellationToken);
_logger.LogInformation("[KnowledgeService] Successfully upserted {Count} points to Qdrant collection 'knowledge_units'.", points.Count);
}
}
catch (Exception ex)
{
_logger.LogError(ex, "[KnowledgeService] Failed to generate and upsert embeddings for knowledge units to Qdrant.");
}
}
// 6. Synchronize to Neo4j graph database
await SyncToNeo4jAsync(packet, cancellationToken);
}
private async Task SyncToNeo4jAsync(KnowledgePacket packet, CancellationToken cancellationToken)
{
if (packet.Units == null || !packet.Units.Any()) return;
try
{
await using var session = _neo4jDriver.AsyncSession();
// 1. Merge nodes in a transaction
await session.ExecuteWriteAsync(async tx =>
{
foreach (var unit in packet.Units)
{
var cypher = @"
MERGE (u:KnowledgeUnit {id: $id})
ON CREATE SET u.content = $content, u.type = $type
ON MATCH SET u.content = $content, u.type = $type";
var guidStr = GetDeterministicGuid(unit.Id).ToString();
await tx.RunAsync(cypher, new
{
id = guidStr,
content = unit.Content ?? string.Empty,
type = unit.Type ?? "concept"
});
}
});
// 2. Merge links in a transaction
if (packet.Links != null && packet.Links.Any())
{
await session.ExecuteWriteAsync(async tx =>
{
foreach (var link in packet.Links)
{
if (string.IsNullOrWhiteSpace(link.Source) || string.IsNullOrWhiteSpace(link.Target))
continue;
var relationType = string.IsNullOrWhiteSpace(link.Relation) ? "RELATED_TO" : link.Relation.Trim().ToUpperInvariant();
relationType = System.Text.RegularExpressions.Regex.Replace(relationType, @"[^A-Z0-9_]", "_");
if (string.IsNullOrEmpty(relationType) || relationType == "_")
{
relationType = "RELATED_TO";
}
var cypher = $@"
MATCH (source:KnowledgeUnit {{id: $sourceId}})
MATCH (target:KnowledgeUnit {{id: $targetId}})
MERGE (source)-[r:{relationType}]->(target)";
var sourceGuidStr = GetDeterministicGuid(link.Source).ToString();
var targetGuidStr = GetDeterministicGuid(link.Target).ToString();
await tx.RunAsync(cypher, new
{
sourceId = sourceGuidStr,
targetId = targetGuidStr
});
}
});
}
_logger.LogInformation("[KnowledgeService] Successfully synchronized {NodeCount} nodes and {LinkCount} links to Neo4j.", packet.Units.Count, packet.Links?.Count ?? 0);
}
catch (Exception ex)
{
_logger.LogError(ex, "[KnowledgeService] Failed to synchronize knowledge graph to Neo4j.");
}
}
private async Task EnsureCollectionExistsAsync(string collectionName, CancellationToken cancellationToken = default)
{
await _collectionSemaphore.WaitAsync(cancellationToken);
try
{
var exists = await _qdrantClient.CollectionExistsAsync(collectionName, cancellationToken);
if (!exists)
{
_logger.LogInformation("[KnowledgeService] Creating Qdrant collection '{CollectionName}'...", collectionName);
await _qdrantClient.CreateCollectionAsync(
collectionName: collectionName,
vectorsConfig: new VectorParams
{
Size = 768,
Distance = Distance.Cosine
},
cancellationToken: cancellationToken
);
_logger.LogInformation("[KnowledgeService] Qdrant collection '{CollectionName}' created successfully.", collectionName);
}
}
catch (Exception ex)
{
if (ex.Message.Contains("already exists", StringComparison.OrdinalIgnoreCase) ||
(ex.InnerException != null && ex.InnerException.Message.Contains("already exists", StringComparison.OrdinalIgnoreCase)))
{
_logger.LogInformation("[KnowledgeService] Qdrant collection '{CollectionName}' was already created by another thread.", collectionName);
}
else
{
_logger.LogError(ex, "[KnowledgeService] Error ensuring Qdrant collection '{CollectionName}' exists.", collectionName);
}
}
finally
{
_collectionSemaphore.Release();
}
}
private static Guid GetDeterministicGuid(string input)
{
if (Guid.TryParse(input, out var guid))
{
return guid;
}
using var md5 = System.Security.Cryptography.MD5.Create();
byte[] hash = md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
return new Guid(hash);
}
private static string GetPointIdString(PointId pointId)
{
if (pointId == null) return string.Empty;
return pointId.PointIdOptionsCase == PointId.PointIdOptionsOneofCase.Uuid
? pointId.Uuid
: pointId.Num.ToString();
}
public async Task<Result<GroundednessResult>> VerifyGroundednessAsync(string answer, string context, string tenantId, CancellationToken cancellationToken = default)
@@ -354,6 +579,7 @@ public class KnowledgeService : IKnowledgeService
List<Qdrant.Client.Grpc.ScoredPoint> searchResult;
try
{
await EnsureCollectionExistsAsync("knowledge_units", cancellationToken);
var response = await _qdrantClient.SearchAsync(
collectionName: "knowledge_units",
vector: queryVector,
@@ -363,15 +589,37 @@ public class KnowledgeService : IKnowledgeService
);
searchResult = response.ToList();
}
catch (Exception)
catch (Exception ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Qdrant search failed during GetRelevantContextAsync. Returning empty search results.");
searchResult = new List<Qdrant.Client.Grpc.ScoredPoint>();
}
var contexts = searchResult.Select(point => new RelevantContext
var contexts = searchResult.Select(point =>
{
Text = point.Payload.TryGetValue("content", out var cv) ? cv.StringValue : string.Empty,
Confidence = point.Score
var content = point.Payload.TryGetValue("content", out var cv) ? cv.StringValue : string.Empty;
var summary = string.Empty;
if (point.Payload.TryGetValue("metadataJson", out var metaVal) && !string.IsNullOrEmpty(metaVal.StringValue))
{
try
{
var meta = JsonSerializer.Deserialize<Dictionary<string, object>>(metaVal.StringValue);
if (meta != null && meta.TryGetValue("summary", out var sumObj))
{
summary = sumObj?.ToString();
}
}
catch (JsonException ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Failed to deserialize metadata JSON in RelevantContext mapping.");
}
}
var text = string.IsNullOrEmpty(summary) ? content : $"{content}: {summary}";
return new RelevantContext
{
Text = text,
Confidence = point.Score
};
}).ToList();
return Result.Ok(contexts);
@@ -417,6 +665,7 @@ public class KnowledgeService : IKnowledgeService
List<Qdrant.Client.Grpc.ScoredPoint> searchResult;
try
{
await EnsureCollectionExistsAsync("knowledge_units", cancellationToken);
var response = await _qdrantClient.SearchAsync(
collectionName: "knowledge_units",
vector: queryVector,
@@ -438,7 +687,7 @@ public class KnowledgeService : IKnowledgeService
}
// 3. Graph Expansion via Neo4j
var candidateIds = searchResult.Select(r => r.Id.ToString()).ToList();
var candidateIds = searchResult.Select(r => GetPointIdString(r.Id)).ToList();
var definitions = new Dictionary<string, List<string>>();
if (candidateIds.Any())
@@ -447,7 +696,7 @@ public class KnowledgeService : IKnowledgeService
{
await using var session = _neo4jDriver.AsyncSession();
var cypher = @"
MATCH (source:KnowledgeUnit)-[r:DEFINES]->(target:KnowledgeUnit)
MATCH (source:KnowledgeUnit)-[r]->(target:KnowledgeUnit)
WHERE source.id IN $candidateIds
RETURN source.id AS sourceId, target.content AS targetContent";
@@ -516,12 +765,15 @@ public class KnowledgeService : IKnowledgeService
{
metadata = JsonSerializer.Deserialize<Dictionary<string, object>>(metaVal.StringValue);
}
catch {}
catch (JsonException ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Failed to deserialize metadata JSON in search library mapping.");
}
}
var dto = new SemanticSearchResultDto
{
ContentHash = point.Id.ToString(),
ContentHash = GetPointIdString(point.Id),
Snippet = content,
UnitType = type,
RelevanceScore = point.Score,
@@ -529,7 +781,7 @@ public class KnowledgeService : IKnowledgeService
Metadata = metadata
};
var pointIdStr = point.Id.ToString();
var pointIdStr = GetPointIdString(point.Id);
if (definitions.TryGetValue(pointIdStr, out var pointDefs) && pointDefs.Any())
{
dto.Snippet = $"[Context: {string.Join("; ", pointDefs)}]\n{dto.Snippet}";
@@ -602,6 +854,7 @@ public class KnowledgeService : IKnowledgeService
List<Qdrant.Client.Grpc.ScoredPoint> searchResult;
try
{
await EnsureCollectionExistsAsync("knowledge_units", cancellationToken);
var response = await _qdrantClient.SearchAsync(
collectionName: "knowledge_units",
vector: queryVector,
@@ -627,11 +880,28 @@ public class KnowledgeService : IKnowledgeService
}
// 3. Graph Expansion via Neo4j
var candidateIds = searchResult.Select(r => r.Id.ToString()).ToList();
var candidateIds = searchResult.Select(r => GetPointIdString(r.Id)).ToList();
var relatedContexts = new List<string>();
// Keep map of point ID -> payload data for fast mapping later
var pointMap = searchResult.ToDictionary(r => r.Id.ToString(), r => r);
var pointMap = searchResult.ToDictionary(r => GetPointIdString(r.Id), r => r);
// Fetch knowledge units from PostgreSQL to map Guids back to rich metadata summaries
var guidMap = new Dictionary<string, KnowledgeUnit>();
try
{
using var dbContext = await _dbContextFactory.CreateDbContextAsync(cancellationToken);
var units = await dbContext.KnowledgeUnits
.Include(u => u.Ebook)
.ThenInclude(e => e.Author)
.Where(u => u.TenantId == tenantId && (ebookId == null || u.EbookId == ebookId))
.ToListAsync(cancellationToken);
guidMap = units.ToDictionary(u => GetDeterministicGuid(u.Id).ToString(), u => u);
}
catch (Exception ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Failed to load KnowledgeUnits from PostgreSQL for Guid mapping.");
}
if (candidateIds.Any())
{
@@ -641,7 +911,7 @@ public class KnowledgeService : IKnowledgeService
var cypher = @"
MATCH (source:KnowledgeUnit)
WHERE source.id IN $candidateIds
OPTIONAL MATCH (source)-[r:DEFINES|RELATED_TO]->(target:KnowledgeUnit)
OPTIONAL MATCH (source)-[r]->(target:KnowledgeUnit)
RETURN source.id AS sourceId, source.content AS sourceContent,
collect({ targetId: target.id, targetContent: target.content, relation: type(r) }) AS relations";
@@ -654,23 +924,70 @@ public class KnowledgeService : IKnowledgeService
foreach (var record in neoResult)
{
var sourceId = record["sourceId"].As<string>();
var sourceContent = record["sourceContent"].As<string>();
relatedContexts.Add($"[Source ID: {sourceId}] {sourceContent}");
var sourceText = string.Empty;
if (guidMap.TryGetValue(sourceId, out var sourceUnit))
{
var summary = string.Empty;
if (!string.IsNullOrEmpty(sourceUnit.MetadataJson))
{
try
{
var meta = JsonSerializer.Deserialize<Dictionary<string, object>>(sourceUnit.MetadataJson);
if (meta != null && meta.TryGetValue("summary", out var sumObj))
{
summary = sumObj?.ToString();
}
}
catch (JsonException jsonEx)
{
_logger.LogWarning(jsonEx, "[KnowledgeService] Failed to deserialize metadata JSON for unit {UnitId} in AskQuestionAsync source hydration.", sourceUnit.Id);
}
}
sourceText = string.IsNullOrEmpty(summary) ? sourceUnit.Content : $"{sourceUnit.Content}: {summary}";
}
else
{
sourceText = record["sourceContent"].As<string>();
}
relatedContexts.Add($"[Source ID: {sourceId}] {sourceText}");
var relations = record["relations"].As<List<object>>();
if (relations != null)
{
foreach (var relObj in relations)
{
if (relObj is Dictionary<string, object> relDict &&
relDict.TryGetValue("targetId", out var targetIdVal) && targetIdVal is string targetId &&
relDict.TryGetValue("targetContent", out var targetContentVal) && targetContentVal is string targetContent &&
relDict.TryGetValue("relation", out var relationVal) && relationVal is string relation)
if (relObj is System.Collections.IDictionary relDict)
{
if (!string.IsNullOrEmpty(targetContent))
var targetId = relDict["targetId"]?.ToString();
var targetContent = relDict["targetContent"]?.ToString();
var relation = relDict["relation"]?.ToString();
if (!string.IsNullOrEmpty(targetContent) && !string.IsNullOrEmpty(relation))
{
relatedContexts.Add($"[Related Context ({relation}) to {sourceId}] {targetContent}");
var targetText = targetContent;
if (!string.IsNullOrEmpty(targetId) && guidMap.TryGetValue(targetId, out var targetUnit))
{
var summary = string.Empty;
if (!string.IsNullOrEmpty(targetUnit.MetadataJson))
{
try
{
var meta = JsonSerializer.Deserialize<Dictionary<string, object>>(targetUnit.MetadataJson);
if (meta != null && meta.TryGetValue("summary", out var sumObj))
{
summary = sumObj?.ToString();
}
}
catch (JsonException jsonEx)
{
_logger.LogWarning(jsonEx, "[KnowledgeService] Failed to deserialize metadata JSON for unit {UnitId} in AskQuestionAsync target hydration.", targetUnit.Id);
}
}
targetText = string.IsNullOrEmpty(summary) ? targetUnit.Content : $"{targetUnit.Content}: {summary}";
}
relatedContexts.Add($"[Related Context ({relation}) to {sourceId}] {targetText}");
}
}
}
@@ -682,9 +999,35 @@ public class KnowledgeService : IKnowledgeService
_logger.LogWarning(ex, "[KnowledgeService] Neo4j graph expansion failed. Falling back to direct Qdrant points.");
foreach (var point in searchResult)
{
var sourceId = point.Id.ToString();
var content = point.Payload.TryGetValue("content", out var cv) ? cv.StringValue : string.Empty;
relatedContexts.Add($"[Source ID: {sourceId}] {content}");
var sourceId = GetPointIdString(point.Id);
var sourceText = string.Empty;
if (guidMap.TryGetValue(sourceId, out var sourceUnit))
{
var summary = string.Empty;
if (!string.IsNullOrEmpty(sourceUnit.MetadataJson))
{
try
{
var meta = JsonSerializer.Deserialize<Dictionary<string, object>>(sourceUnit.MetadataJson);
if (meta != null && meta.TryGetValue("summary", out var sumObj))
{
summary = sumObj?.ToString();
}
}
catch (JsonException jsonEx)
{
_logger.LogWarning(jsonEx, "[KnowledgeService] Failed to deserialize metadata JSON for unit {UnitId} in fallback AskQuestionAsync.", sourceUnit.Id);
}
}
sourceText = string.IsNullOrEmpty(summary) ? sourceUnit.Content : $"{sourceUnit.Content}: {summary}";
}
else
{
sourceText = point.Payload.TryGetValue("content", out var cv) ? cv.StringValue : string.Empty;
}
relatedContexts.Add($"[Source ID: {sourceId}] {sourceText}");
}
}
}
@@ -708,33 +1051,14 @@ public class KnowledgeService : IKnowledgeService
// 5. Build prompt and invoke Gemini with structured JSON formatting
var contextBlocksText = string.Join("\n\n", relatedContexts);
var systemPrompt = @"
You are an advanced, extremely precise Fact-Checking AI assistant. Your task is to answer the user's question using ONLY the provided context blocks.
Strict Grounding Rules:
1. Rely EXCLUSIVELY on the provided context. Do NOT use any pre-existing external knowledge, facts, or assumptions.
2. If the context does not contain the answer, you must state exactly: 'I cannot answer this based on the provided book context.'
3. For every statement or claim you make in your answer, you must cite the specific source IDs (e.g., source chunk ID or hash) from the context.
4. You must format your response ONLY as a JSON object matching the following structure:
{
""answer"": ""The answer text goes here, referencing [Source ID] as citations."",
""citations"": [
{
""citationId"": ""The exact source ID cited (e.g., chunk hash/ID)"",
""snippet"": ""The precise sentence or phrase from the context that supports this statement."",
""sourceBook"": ""The book title or 'Unknown'""
}
]
}
";
var systemPrompt = PromptRegistry.GroundedRAGSystemPrompt;
var userPrompt = $"Context:\n{contextBlocksText}\n\nQuestion: {question}";
var options = new ChatOptions
{
Temperature = 0.0f,
MaxOutputTokens = 1500,
ResponseFormat = ChatResponseFormat.Json
MaxOutputTokens = 1500
};
var chatResponse = await _retryPipeline.ExecuteAsync(async ct =>
@@ -746,6 +1070,20 @@ Strict Grounding Rules:
var rawJson = chatResponse.Text?.Trim() ?? string.Empty;
rawJson = rawJson.Replace("```json", "").Replace("```", "").Trim();
// Handle direct text fallback when model bypasses JSON format
if (!rawJson.StartsWith("{") &&
(rawJson.Contains("cannot answer", StringComparison.OrdinalIgnoreCase) ||
rawJson.Contains("context does not contain", StringComparison.OrdinalIgnoreCase) ||
rawJson.Contains("provided book context", StringComparison.OrdinalIgnoreCase)))
{
return Result.Ok(new GroundedResponseDto
{
Answer = "I cannot answer this based on the provided book context.",
Citations = new List<CitationDto>()
});
}
rawJson = JsonRepairHelper.Repair(rawJson);
try
@@ -756,15 +1094,42 @@ Strict Grounding Rules:
return Result.Fail("Failed to deserialize grounded RAG response.");
}
// Hydrate book titles for citations if unknown
// Hydrate book titles, author, and page number for citations if unknown
foreach (var citation in groundedResult.Citations)
{
if (pointMap.TryGetValue(citation.CitationId, out var point) &&
point.Payload.TryGetValue("ebookId", out var ev) &&
Guid.TryParse(ev.StringValue, out var ebId) &&
ebookTitles.TryGetValue(ebId, out var title))
Guid.TryParse(ev.StringValue, out var ebId))
{
citation.SourceBook = title;
if (ebookTitles.TryGetValue(ebId, out var title))
{
citation.SourceBook = title;
}
}
// Look up from guidMap to get exact page number and author
if (guidMap.TryGetValue(citation.CitationId, out var unit))
{
if (unit.Ebook?.Author != null)
{
citation.Author = unit.Ebook.Author.Name;
}
if (!string.IsNullOrEmpty(unit.MetadataJson))
{
try
{
var meta = JsonSerializer.Deserialize<Dictionary<string, object>>(unit.MetadataJson);
if (meta != null && meta.TryGetValue("page", out var pageObj) && int.TryParse(pageObj?.ToString(), out var pageVal))
{
citation.PageNumber = pageVal;
}
}
catch (JsonException ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Failed to deserialize metadata JSON for unit {UnitId} in AskQuestionAsync citation mapping.", unit.Id);
}
}
}
}
@@ -790,6 +1155,30 @@ Strict Grounding Rules:
await dbContext.SemanticKnowledgeCache.ExecuteDeleteAsync(cancellationToken);
await dbContext.KnowledgeUnits.ExecuteDeleteAsync(cancellationToken);
await dbContext.KnowledgeUnitLinks.ExecuteDeleteAsync(cancellationToken);
try
{
await _qdrantClient.DeleteCollectionAsync("knowledge_units", cancellationToken: cancellationToken);
}
catch (Exception ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Failed to drop Qdrant collection 'knowledge_units' during cache clear.");
}
try
{
await using var session = _neo4jDriver.AsyncSession();
await session.ExecuteWriteAsync(async tx =>
{
await tx.RunAsync("MATCH (n:KnowledgeUnit) DETACH DELETE n");
});
_logger.LogInformation("[KnowledgeService] Successfully wiped Neo4j 'KnowledgeUnit' nodes.");
}
catch (Exception ex)
{
_logger.LogWarning(ex, "[KnowledgeService] Failed to wipe Neo4j graph during cache clear.");
}
return Result.Ok();
}
catch (Exception ex)
@@ -4,9 +4,10 @@ public static class PromptRegistry
{
public const string KnowledgeExtractionSystemPrompt =
"You are an expert educator. Analyze the provided text to extract key concepts, generate relevant quizzes, and construct a knowledge graph. " +
"CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' instead of full sentences). " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. You MUST generate all human-readable fields ('title', 'description', 'question', 'options', 'label') in the EXACT SAME LANGUAGE as the source text. Do NOT translate them to English unless the source text is in English. " +
"CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' or its exact foreign equivalent, never full sentences). " +
"CRITICAL: Extract a MAXIMUM of 15 key concepts/plot points from the text. " +
"CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node (e.g., 'Code Example'). Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
"CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node with the label 'Code Example' translated to the detected language. Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
"CRITICAL: Return ONLY a minified JSON object. Do NOT include markdown formatting like ```json or ```. Do NOT include explanations. " +
"Schema: { " +
"\"concepts\": [ { \"title\": \"string\", \"description\": \"string\" } ], " +
@@ -15,28 +16,66 @@ public static class PromptRegistry
"}.";
public const string GraphExtractionPrompt =
"You are an expert at information architecture. Extract key concepts and paragraph mappings from the text to build a unified knowledge graph. " +
"The input text consists of several paragraphs, each starting with its unique block ID in the format '[ID: seg-X]'. " +
"Extract two types of nodes: " +
"1. Concept Nodes (group: 'concept'): Extract the main technical concepts discussed (e.g., ID: 'dependency-injection', label: 'Dependency Injection'). Max 10 concepts. Labels must be at most 3 words. " +
"2. Block Nodes (group: 'current'): For each paragraph in the input, create a node representing that paragraph where 'id' is the exact block ID (e.g., 'seg-1'), and 'label' is a brief summary of that paragraph's content (max 3 words). " +
"CRITICAL: If a paragraph is a code block, represent it as a single block node with label 'Code Example' (group: 'current'). Do NOT extract low-level code elements (like variables, classes, methods, or namespaces) as separate concept nodes. " +
"CRITICAL: Connect related concept nodes together, and connect each concept node to the block nodes ('seg-X') where it is discussed. " +
"Limit connections to a MAXIMUM of 15 most relevant links. " +
"Return ONLY minified JSON. Schema: { \"graph\": { \"nodes\": [ { \"id\": \"string\", \"label\": \"string\", \"group\": \"concept|current\" } ], \"links\": [ { \"source\": \"string\", \"target\": \"string\", \"value\": 1 } ] } }";
"You are a strict Minimalist Information Architect. Your sole job is to build a high-level, sparse linear backbone for a textbook chapter. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'label', 'summary', and 'key_terms' fields MUST be in the EXACT SAME LANGUAGE as the source text. " +
"The input text consists of sections starting with block IDs (e.g., '[ID: seg-4]'). " +
"CRITICAL TOPOLOGY RULES (ZERO TOLERANCE FOR CLUTTER): " +
"1. HARD NODE LIMIT: You are strictly forbidden from extracting more than 4 to 5 nodes IN TOTAL for the entire text. If there are more sections, select ONLY the 4-5 absolute most critical, high-level structural pillars. " +
"2. NO CONCEPT CLOUDS: Do NOT create nodes for individual technologies, files, terms, or phrases (e.g., 'Kestrel', 'appsettings.json', 'DI', 'Blazor Server' must NEVER be nodes). They must ONLY exist as text strings inside the 'key_terms' array of a major node. " +
"3. LINEAR SPINE PATTERN: Nodes must form a clear, clean path or simple tree representing the chronological reading journey (e.g., Node 1 -> Node 2 -> Node 3). Do NOT create complex web loops or interconnect every node. Limit total links in the entire JSON to maximum 4 or 5 links. " +
"4. NODE DATA STRUCTURE: " +
" - 'id': must be the exact block ID (e.g., 'seg-16'). " +
" - 'label': clear technical title (Max 3 words, e.g., 'Blazor Hosting Models'). " +
" - 'group': strictly either 'bridge' (if it compares legacy vs modern) or 'concept' (for standalone core pillars). " +
" - 'summary': exact 2-sentence distillation for the Contextual Panel. " +
" - 'key_terms': array of max 5 short strings representing the micro-concepts hidden inside this section. " +
"System keys configuration: All JSON keys ('nodes', 'links', 'id', 'label', 'group', 'summary', 'key_terms', 'source', 'target', 'type') must remain strictly in English. " +
"Return ONLY minified JSON. Schema: " +
"{ " +
" \"graph\": { " +
" \"nodes\": [ " +
" { \"id\": \"seg-X\", \"label\": \"string\", \"group\": \"concept|bridge\", \"summary\": \"string\", \"key_terms\": [ \"string\" ] } " +
" ], " +
" \"links\": [ " +
" { \"source\": \"seg-X\", \"target\": \"seg-Y\", \"type\": \"maps_to|contains\" } " +
" ] " +
" } " +
"}";
public const string SummaryAndQuizPrompt =
"You are an expert educator. Provide a concise summary of the text and generate a challenging quiz (3-5 questions). " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The generated 'summary', 'question', and 'options' MUST be in the EXACT SAME LANGUAGE as the source text. " +
"Return ONLY minified JSON. Schema: { \"summary\": \"string\", \"quizzes\": [ { \"question\": \"string\", \"options\": [ \"string\" ], \"correct_index\": 0 } ] }";
public const string KM_ExtractionPrompt =
"You are an expert at Knowledge Engineering. Segment the provided text into discrete Knowledge Units. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'content' field MUST be in the EXACT SAME LANGUAGE as the source text. " +
"Identify 'units' (sections, tables, definitions, rules) and 'links' (how they relate). " +
"CRITICAL: Units must be granular. " +
"CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit. Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
"CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit (translate the name to the detected language). Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
"CRITICAL SYSTEM VALUES: The fields 'type' (strictly: 'Section', 'Table', 'Definition', or 'Rule') and 'relation' (strictly: 'Next', 'Defines', 'Contains', or 'References') are system keys and MUST remain in English as specified. " +
"Schema: { " +
"\"units\": [ { \"id\": \"string\", \"type\": \"Section|Table|Definition|Rule\", \"content\": \"string\", \"metadata\": { \"page\": 0 } } ], " +
"\"links\": [ { \"source\": \"string\", \"target\": \"string\", \"relation\": \"Next|Defines|Contains|References\" } ] " +
"}.";
public const string GroundedRAGSystemPrompt = """
You are an advanced, extremely precise Fact-Checking AI assistant. Your task is to answer the user's question using ONLY the provided context blocks.
Strict Grounding Rules:
1. Rely EXCLUSIVELY on the provided context. Do NOT use any pre-existing external knowledge, facts, or assumptions.
2. If the context does not contain the answer, you must set the "answer" property in the JSON object exactly to: 'I cannot answer this based on the provided book context.' and the "citations" array must be empty.
3. For every statement or claim you make in your answer, you must cite the specific source IDs (e.g., source chunk ID or hash) from the context.
4. You must format your response ONLY as a JSON object matching the following structure:
{
"answer": "The answer text goes here, referencing [Source ID] as citations.",
"citations": [
{
"citationId": "The exact source ID cited (e.g., chunk hash/ID)",
"snippet": "The precise sentence or phrase from the context that supports this statement.",
"sourceBook": "The book title or 'Unknown'"
}
]
}
""";
}