feat(search/rag): implement NexusSearchBox, dynamic Qdrant collection auto-provisioning, batch vector ingestion, mobile Serilog logging, and resolve 401 auth handler error (#51)

Resolves #52 This Pull Request introduces the **NexusSearchBox** search feature with premium unified styling, implements a robust **dynamic Qdrant collection auto-provisioning and batch-vector ingestion pipeline**, integrates a unified **Serilog logging infrastructure** for the Blazor Hybrid environment (MAUI), and resolves the **401 Unauthorized API header propagation error** inside mobile builds. ### 🚀 Key Implementations #### 1. Premium `NexusSearchBox` & Semantic Search UI * **NexusSearchBox Component:** Created an elegant search-as-you-type search box with smooth key navigation, quick-clearing, and seamless dynamic styling. * **Unified Aesthetics:** Refactored the search box isolated styling to align perfectly with the dashboard's design system using glassmorphism, `--nexus-neon` token gradients, and smooth pulse/fade animations. * **Semantic Search Integration:** Integrated semantic search query dispatching (`SearchLibrarySemanticallyQuery`) and wired up navigation seamlessly through the updated `ReaderNavigationService`. * **Tests Hardening:** Added/adapted query assertions in `QueryTests.cs` to guarantee safe parameterization and error boundary mapping. #### 2. Qdrant Collection Provisioning & Vector Ingestion * **Dynamic Auto-Provisioning:** Implemented dynamic checking and lazy-creation of the `knowledge_units` collection using 768 dimensions and Cosine distance. * **High-Performance Ingestion:** Optimized `ProcessKnowledgeUnitsAsync` with high-performance batch embedding generation using `_embeddingGenerator` and deterministic MD5 GUIDs for stable, duplicate-free upsertion. * **Database Cache Clear Sync:** Integrated Qdrant collection deletion in `ClearCacheAsync` to ensure absolute consistency between the PostgreSQL database cache and vector database indices. #### 3. Cross-Platform MAUI Logging (Serilog Infrastructure) * **Serilog Integration:** Configured cross-platform Serilog routing in `SerilogConfiguration.cs`, streaming diagnostic logs safely across native platforms and the Blazor Webview container. * **Interop Bridge:** Built `BlazorLoggingBridge.cs` to capture web console messages and pipe them directly to the native host logger. * **Demo Interface:** Added an interactive `SerilogDemo.razor` sandbox under Pages. #### 4. Resolving 401 Load Errors (Authentication Handler Flow) * **Authentication Header Handler:** Implemented the `MobileAuthenticationHeaderHandler` to correctly extract, validate, and inject bearer JWT tokens into outbound API requests. * **Configuration-based API Host:** Structured standard API URI routing to use clean configuration bindings in `appsettings.json`. --- ### 🧪 Verification & Build Status * Run `dotnet build` from the solution root: Successfully compiled the full multi-targeted solution (`Liczba błędów: 0`). * All unit and integration tests successfully executed and verified (`dotnet test`). --------- Co-authored-by: Marek Jasiński <jasins.marek@gmail.com> Co-authored-by: Marek Jaisński <jasins.marek@gmail.com> Reviewed-on: #51 Co-authored-by: Antigravity <antigravity@google.com> Co-committed-by: Antigravity <antigravity@google.com>
2026-05-26 12:15:28 +00:00
parent 758b152a0c
commit a0bf6c15f4
61 changed files with 5111 additions and 570 deletions
@@ -4,9 +4,10 @@ public static class PromptRegistry
 {
    public const string KnowledgeExtractionSystemPrompt = 
        "You are an expert educator. Analyze the provided text to extract key concepts, generate relevant quizzes, and construct a knowledge graph. " +
-        "CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' instead of full sentences). " +
+        "**LANGUAGE CRITICAL**: Detect the language of the provided text. You MUST generate all human-readable fields ('title', 'description', 'question', 'options', 'label') in the EXACT SAME LANGUAGE as the source text. Do NOT translate them to English unless the source text is in English. " +
+        "CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' or its exact foreign equivalent, never full sentences). " +
        "CRITICAL: Extract a MAXIMUM of 15 key concepts/plot points from the text. " +
-        "CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node (e.g., 'Code Example'). Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
+        "CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node with the label 'Code Example' translated to the detected language. Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
        "CRITICAL: Return ONLY a minified JSON object. Do NOT include markdown formatting like ```json or ```. Do NOT include explanations. " +
        "Schema: { " +
        "\"concepts\": [ { \"title\": \"string\", \"description\": \"string\" } ], " +
@@ -15,28 +16,66 @@ public static class PromptRegistry
        "}.";

    public const string GraphExtractionPrompt = 
-        "You are an expert at information architecture. Extract key concepts and paragraph mappings from the text to build a unified knowledge graph. " +
-        "The input text consists of several paragraphs, each starting with its unique block ID in the format '[ID: seg-X]'. " +
-        "Extract two types of nodes: " +
-        "1. Concept Nodes (group: 'concept'): Extract the main technical concepts discussed (e.g., ID: 'dependency-injection', label: 'Dependency Injection'). Max 10 concepts. Labels must be at most 3 words. " +
-        "2. Block Nodes (group: 'current'): For each paragraph in the input, create a node representing that paragraph where 'id' is the exact block ID (e.g., 'seg-1'), and 'label' is a brief summary of that paragraph's content (max 3 words). " +
-        "CRITICAL: If a paragraph is a code block, represent it as a single block node with label 'Code Example' (group: 'current'). Do NOT extract low-level code elements (like variables, classes, methods, or namespaces) as separate concept nodes. " +
-        "CRITICAL: Connect related concept nodes together, and connect each concept node to the block nodes ('seg-X') where it is discussed. " +
-        "Limit connections to a MAXIMUM of 15 most relevant links. " +
-        "Return ONLY minified JSON. Schema: { \"graph\": { \"nodes\": [ { \"id\": \"string\", \"label\": \"string\", \"group\": \"concept|current\" } ], \"links\": [ { \"source\": \"string\", \"target\": \"string\", \"value\": 1 } ] } }";
-
+        "You are a strict Minimalist Information Architect. Your sole job is to build a high-level, sparse linear backbone for a textbook chapter. " +
+        "**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'label', 'summary', and 'key_terms' fields MUST be in the EXACT SAME LANGUAGE as the source text. " +
+        "The input text consists of sections starting with block IDs (e.g., '[ID: seg-4]'). " +
+        "CRITICAL TOPOLOGY RULES (ZERO TOLERANCE FOR CLUTTER): " +
+        "1. HARD NODE LIMIT: You are strictly forbidden from extracting more than 4 to 5 nodes IN TOTAL for the entire text. If there are more sections, select ONLY the 4-5 absolute most critical, high-level structural pillars. " +
+        "2. NO CONCEPT CLOUDS: Do NOT create nodes for individual technologies, files, terms, or phrases (e.g., 'Kestrel', 'appsettings.json', 'DI', 'Blazor Server' must NEVER be nodes). They must ONLY exist as text strings inside the 'key_terms' array of a major node. " +
+        "3. LINEAR SPINE PATTERN: Nodes must form a clear, clean path or simple tree representing the chronological reading journey (e.g., Node 1 -> Node 2 -> Node 3). Do NOT create complex web loops or interconnect every node. Limit total links in the entire JSON to maximum 4 or 5 links. " +
+        "4. NODE DATA STRUCTURE: " +
+        "   - 'id': must be the exact block ID (e.g., 'seg-16'). " +
+        "   - 'label': clear technical title (Max 3 words, e.g., 'Blazor Hosting Models'). " +
+        "   - 'group': strictly either 'bridge' (if it compares legacy vs modern) or 'concept' (for standalone core pillars). " +
+        "   - 'summary': exact 2-sentence distillation for the Contextual Panel. " +
+        "   - 'key_terms': array of max 5 short strings representing the micro-concepts hidden inside this section. " +
+        "System keys configuration: All JSON keys ('nodes', 'links', 'id', 'label', 'group', 'summary', 'key_terms', 'source', 'target', 'type') must remain strictly in English. " +
+        "Return ONLY minified JSON. Schema: " +
+        "{ " +
+        "  \"graph\": { " +
+        "    \"nodes\": [ " +
+        "      { \"id\": \"seg-X\", \"label\": \"string\", \"group\": \"concept|bridge\", \"summary\": \"string\", \"key_terms\": [ \"string\" ] } " +
+        "    ], " +
+        "    \"links\": [ " +
+        "      { \"source\": \"seg-X\", \"target\": \"seg-Y\", \"type\": \"maps_to|contains\" } " +
+        "    ] " +
+        "  } " +
+        "}";

    public const string SummaryAndQuizPrompt = 
        "You are an expert educator. Provide a concise summary of the text and generate a challenging quiz (3-5 questions). " +
+        "**LANGUAGE CRITICAL**: Detect the language of the provided text. The generated 'summary', 'question', and 'options' MUST be in the EXACT SAME LANGUAGE as the source text. " +
        "Return ONLY minified JSON. Schema: { \"summary\": \"string\", \"quizzes\": [ { \"question\": \"string\", \"options\": [ \"string\" ], \"correct_index\": 0 } ] }";

    public const string KM_ExtractionPrompt = 
        "You are an expert at Knowledge Engineering. Segment the provided text into discrete Knowledge Units. " +
+        "**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'content' field MUST be in the EXACT SAME LANGUAGE as the source text. " +
        "Identify 'units' (sections, tables, definitions, rules) and 'links' (how they relate). " +
        "CRITICAL: Units must be granular. " +
-        "CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit. Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
+        "CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit (translate the name to the detected language). Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
+        "CRITICAL SYSTEM VALUES: The fields 'type' (strictly: 'Section', 'Table', 'Definition', or 'Rule') and 'relation' (strictly: 'Next', 'Defines', 'Contains', or 'References') are system keys and MUST remain in English as specified. " +
        "Schema: { " +
        "\"units\": [ { \"id\": \"string\", \"type\": \"Section|Table|Definition|Rule\", \"content\": \"string\", \"metadata\": { \"page\": 0 } } ], " +
        "\"links\": [ { \"source\": \"string\", \"target\": \"string\", \"relation\": \"Next|Defines|Contains|References\" } ] " +
        "}.";
+
+    public const string GroundedRAGSystemPrompt = """
+        You are an advanced, extremely precise Fact-Checking AI assistant. Your task is to answer the user's question using ONLY the provided context blocks.
+
+        Strict Grounding Rules:
+        1. Rely EXCLUSIVELY on the provided context. Do NOT use any pre-existing external knowledge, facts, or assumptions.
+        2. If the context does not contain the answer, you must set the "answer" property in the JSON object exactly to: 'I cannot answer this based on the provided book context.' and the "citations" array must be empty.
+        3. For every statement or claim you make in your answer, you must cite the specific source IDs (e.g., source chunk ID or hash) from the context.
+        4. You must format your response ONLY as a JSON object matching the following structure:
+        {
+          "answer": "The answer text goes here, referencing [Source ID] as citations.",
+          "citations": [
+            {
+              "citationId": "The exact source ID cited (e.g., chunk hash/ID)",
+              "snippet": "The precise sentence or phrase from the context that supports this statement.",
+              "sourceBook": "The book title or 'Unknown'"
+            }
+          ]
+        }
+        """;
 }