feat(search/rag): implement NexusSearchBox, dynamic Qdrant collection auto-provisioning, batch vector ingestion, mobile Serilog logging, and resolve 401 auth handler error (#51)

Resolves #52

This Pull Request introduces the **NexusSearchBox** search feature with premium unified styling, implements a robust **dynamic Qdrant collection auto-provisioning and batch-vector ingestion pipeline**, integrates a unified **Serilog logging infrastructure** for the Blazor Hybrid environment (MAUI), and resolves the **401 Unauthorized API header propagation error** inside mobile builds.

### 🚀 Key Implementations

#### 1. Premium `NexusSearchBox` & Semantic Search UI
* **NexusSearchBox Component:** Created an elegant search-as-you-type search box with smooth key navigation, quick-clearing, and seamless dynamic styling.
* **Unified Aesthetics:** Refactored the search box isolated styling to align perfectly with the dashboard's design system using glassmorphism, `--nexus-neon` token gradients, and smooth pulse/fade animations.
* **Semantic Search Integration:** Integrated semantic search query dispatching (`SearchLibrarySemanticallyQuery`) and wired up navigation seamlessly through the updated `ReaderNavigationService`.
* **Tests Hardening:** Added/adapted query assertions in `QueryTests.cs` to guarantee safe parameterization and error boundary mapping.

#### 2. Qdrant Collection Provisioning & Vector Ingestion
* **Dynamic Auto-Provisioning:** Implemented dynamic checking and lazy-creation of the `knowledge_units` collection using 768 dimensions and Cosine distance.
* **High-Performance Ingestion:** Optimized `ProcessKnowledgeUnitsAsync` with high-performance batch embedding generation using `_embeddingGenerator` and deterministic MD5 GUIDs for stable, duplicate-free upsertion.
* **Database Cache Clear Sync:** Integrated Qdrant collection deletion in `ClearCacheAsync` to ensure absolute consistency between the PostgreSQL database cache and vector database indices.

#### 3. Cross-Platform MAUI Logging (Serilog Infrastructure)
* **Serilog Integration:** Configured cross-platform Serilog routing in `SerilogConfiguration.cs`, streaming diagnostic logs safely across native platforms and the Blazor Webview container.
* **Interop Bridge:** Built `BlazorLoggingBridge.cs` to capture web console messages and pipe them directly to the native host logger.
* **Demo Interface:** Added an interactive `SerilogDemo.razor` sandbox under Pages.

#### 4. Resolving 401 Load Errors (Authentication Handler Flow)
* **Authentication Header Handler:** Implemented the `MobileAuthenticationHeaderHandler` to correctly extract, validate, and inject bearer JWT tokens into outbound API requests.
* **Configuration-based API Host:** Structured standard API URI routing to use clean configuration bindings in `appsettings.json`.

---

### 🧪 Verification & Build Status
* Run `dotnet build` from the solution root: Successfully compiled the full multi-targeted solution (`Liczba błędów: 0`).
* All unit and integration tests successfully executed and verified (`dotnet test`).

---------

Co-authored-by: Marek Jasiński <jasins.marek@gmail.com>
Co-authored-by: Marek Jaisński <jasins.marek@gmail.com>
Reviewed-on: #51
Co-authored-by: Antigravity <antigravity@google.com>
Co-committed-by: Antigravity <antigravity@google.com>
This commit was merged in pull request #51.
This commit is contained in:
2026-05-26 12:15:28 +00:00
committed by Marek Jaisński
parent 758b152a0c
commit a0bf6c15f4
61 changed files with 5111 additions and 570 deletions
@@ -4,9 +4,10 @@ public static class PromptRegistry
{
public const string KnowledgeExtractionSystemPrompt =
"You are an expert educator. Analyze the provided text to extract key concepts, generate relevant quizzes, and construct a knowledge graph. " +
"CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' instead of full sentences). " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. You MUST generate all human-readable fields ('title', 'description', 'question', 'options', 'label') in the EXACT SAME LANGUAGE as the source text. Do NOT translate them to English unless the source text is in English. " +
"CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' or its exact foreign equivalent, never full sentences). " +
"CRITICAL: Extract a MAXIMUM of 15 key concepts/plot points from the text. " +
"CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node (e.g., 'Code Example'). Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
"CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node with the label 'Code Example' translated to the detected language. Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
"CRITICAL: Return ONLY a minified JSON object. Do NOT include markdown formatting like ```json or ```. Do NOT include explanations. " +
"Schema: { " +
"\"concepts\": [ { \"title\": \"string\", \"description\": \"string\" } ], " +
@@ -15,28 +16,66 @@ public static class PromptRegistry
"}.";
public const string GraphExtractionPrompt =
"You are an expert at information architecture. Extract key concepts and paragraph mappings from the text to build a unified knowledge graph. " +
"The input text consists of several paragraphs, each starting with its unique block ID in the format '[ID: seg-X]'. " +
"Extract two types of nodes: " +
"1. Concept Nodes (group: 'concept'): Extract the main technical concepts discussed (e.g., ID: 'dependency-injection', label: 'Dependency Injection'). Max 10 concepts. Labels must be at most 3 words. " +
"2. Block Nodes (group: 'current'): For each paragraph in the input, create a node representing that paragraph where 'id' is the exact block ID (e.g., 'seg-1'), and 'label' is a brief summary of that paragraph's content (max 3 words). " +
"CRITICAL: If a paragraph is a code block, represent it as a single block node with label 'Code Example' (group: 'current'). Do NOT extract low-level code elements (like variables, classes, methods, or namespaces) as separate concept nodes. " +
"CRITICAL: Connect related concept nodes together, and connect each concept node to the block nodes ('seg-X') where it is discussed. " +
"Limit connections to a MAXIMUM of 15 most relevant links. " +
"Return ONLY minified JSON. Schema: { \"graph\": { \"nodes\": [ { \"id\": \"string\", \"label\": \"string\", \"group\": \"concept|current\" } ], \"links\": [ { \"source\": \"string\", \"target\": \"string\", \"value\": 1 } ] } }";
"You are a strict Minimalist Information Architect. Your sole job is to build a high-level, sparse linear backbone for a textbook chapter. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'label', 'summary', and 'key_terms' fields MUST be in the EXACT SAME LANGUAGE as the source text. " +
"The input text consists of sections starting with block IDs (e.g., '[ID: seg-4]'). " +
"CRITICAL TOPOLOGY RULES (ZERO TOLERANCE FOR CLUTTER): " +
"1. HARD NODE LIMIT: You are strictly forbidden from extracting more than 4 to 5 nodes IN TOTAL for the entire text. If there are more sections, select ONLY the 4-5 absolute most critical, high-level structural pillars. " +
"2. NO CONCEPT CLOUDS: Do NOT create nodes for individual technologies, files, terms, or phrases (e.g., 'Kestrel', 'appsettings.json', 'DI', 'Blazor Server' must NEVER be nodes). They must ONLY exist as text strings inside the 'key_terms' array of a major node. " +
"3. LINEAR SPINE PATTERN: Nodes must form a clear, clean path or simple tree representing the chronological reading journey (e.g., Node 1 -> Node 2 -> Node 3). Do NOT create complex web loops or interconnect every node. Limit total links in the entire JSON to maximum 4 or 5 links. " +
"4. NODE DATA STRUCTURE: " +
" - 'id': must be the exact block ID (e.g., 'seg-16'). " +
" - 'label': clear technical title (Max 3 words, e.g., 'Blazor Hosting Models'). " +
" - 'group': strictly either 'bridge' (if it compares legacy vs modern) or 'concept' (for standalone core pillars). " +
" - 'summary': exact 2-sentence distillation for the Contextual Panel. " +
" - 'key_terms': array of max 5 short strings representing the micro-concepts hidden inside this section. " +
"System keys configuration: All JSON keys ('nodes', 'links', 'id', 'label', 'group', 'summary', 'key_terms', 'source', 'target', 'type') must remain strictly in English. " +
"Return ONLY minified JSON. Schema: " +
"{ " +
" \"graph\": { " +
" \"nodes\": [ " +
" { \"id\": \"seg-X\", \"label\": \"string\", \"group\": \"concept|bridge\", \"summary\": \"string\", \"key_terms\": [ \"string\" ] } " +
" ], " +
" \"links\": [ " +
" { \"source\": \"seg-X\", \"target\": \"seg-Y\", \"type\": \"maps_to|contains\" } " +
" ] " +
" } " +
"}";
public const string SummaryAndQuizPrompt =
"You are an expert educator. Provide a concise summary of the text and generate a challenging quiz (3-5 questions). " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The generated 'summary', 'question', and 'options' MUST be in the EXACT SAME LANGUAGE as the source text. " +
"Return ONLY minified JSON. Schema: { \"summary\": \"string\", \"quizzes\": [ { \"question\": \"string\", \"options\": [ \"string\" ], \"correct_index\": 0 } ] }";
public const string KM_ExtractionPrompt =
"You are an expert at Knowledge Engineering. Segment the provided text into discrete Knowledge Units. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'content' field MUST be in the EXACT SAME LANGUAGE as the source text. " +
"Identify 'units' (sections, tables, definitions, rules) and 'links' (how they relate). " +
"CRITICAL: Units must be granular. " +
"CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit. Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
"CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit (translate the name to the detected language). Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
"CRITICAL SYSTEM VALUES: The fields 'type' (strictly: 'Section', 'Table', 'Definition', or 'Rule') and 'relation' (strictly: 'Next', 'Defines', 'Contains', or 'References') are system keys and MUST remain in English as specified. " +
"Schema: { " +
"\"units\": [ { \"id\": \"string\", \"type\": \"Section|Table|Definition|Rule\", \"content\": \"string\", \"metadata\": { \"page\": 0 } } ], " +
"\"links\": [ { \"source\": \"string\", \"target\": \"string\", \"relation\": \"Next|Defines|Contains|References\" } ] " +
"}.";
public const string GroundedRAGSystemPrompt = """
You are an advanced, extremely precise Fact-Checking AI assistant. Your task is to answer the user's question using ONLY the provided context blocks.
Strict Grounding Rules:
1. Rely EXCLUSIVELY on the provided context. Do NOT use any pre-existing external knowledge, facts, or assumptions.
2. If the context does not contain the answer, you must set the "answer" property in the JSON object exactly to: 'I cannot answer this based on the provided book context.' and the "citations" array must be empty.
3. For every statement or claim you make in your answer, you must cite the specific source IDs (e.g., source chunk ID or hash) from the context.
4. You must format your response ONLY as a JSON object matching the following structure:
{
"answer": "The answer text goes here, referencing [Source ID] as citations.",
"citations": [
{
"citationId": "The exact source ID cited (e.g., chunk hash/ID)",
"snippet": "The precise sentence or phrase from the context that supports this statement.",
"sourceBook": "The book title or 'Unknown'"
}
]
}
""";
}