Files
Nexus.Reader/src/NexusReader.Infrastructure/Services/PromptRegistry.cs
T
Antigravity a0bf6c15f4 feat(search/rag): implement NexusSearchBox, dynamic Qdrant collection auto-provisioning, batch vector ingestion, mobile Serilog logging, and resolve 401 auth handler error (#51)
Resolves #52

This Pull Request introduces the **NexusSearchBox** search feature with premium unified styling, implements a robust **dynamic Qdrant collection auto-provisioning and batch-vector ingestion pipeline**, integrates a unified **Serilog logging infrastructure** for the Blazor Hybrid environment (MAUI), and resolves the **401 Unauthorized API header propagation error** inside mobile builds.

### 🚀 Key Implementations

#### 1. Premium `NexusSearchBox` & Semantic Search UI
* **NexusSearchBox Component:** Created an elegant search-as-you-type search box with smooth key navigation, quick-clearing, and seamless dynamic styling.
* **Unified Aesthetics:** Refactored the search box isolated styling to align perfectly with the dashboard's design system using glassmorphism, `--nexus-neon` token gradients, and smooth pulse/fade animations.
* **Semantic Search Integration:** Integrated semantic search query dispatching (`SearchLibrarySemanticallyQuery`) and wired up navigation seamlessly through the updated `ReaderNavigationService`.
* **Tests Hardening:** Added/adapted query assertions in `QueryTests.cs` to guarantee safe parameterization and error boundary mapping.

#### 2. Qdrant Collection Provisioning & Vector Ingestion
* **Dynamic Auto-Provisioning:** Implemented dynamic checking and lazy-creation of the `knowledge_units` collection using 768 dimensions and Cosine distance.
* **High-Performance Ingestion:** Optimized `ProcessKnowledgeUnitsAsync` with high-performance batch embedding generation using `_embeddingGenerator` and deterministic MD5 GUIDs for stable, duplicate-free upsertion.
* **Database Cache Clear Sync:** Integrated Qdrant collection deletion in `ClearCacheAsync` to ensure absolute consistency between the PostgreSQL database cache and vector database indices.

#### 3. Cross-Platform MAUI Logging (Serilog Infrastructure)
* **Serilog Integration:** Configured cross-platform Serilog routing in `SerilogConfiguration.cs`, streaming diagnostic logs safely across native platforms and the Blazor Webview container.
* **Interop Bridge:** Built `BlazorLoggingBridge.cs` to capture web console messages and pipe them directly to the native host logger.
* **Demo Interface:** Added an interactive `SerilogDemo.razor` sandbox under Pages.

#### 4. Resolving 401 Load Errors (Authentication Handler Flow)
* **Authentication Header Handler:** Implemented the `MobileAuthenticationHeaderHandler` to correctly extract, validate, and inject bearer JWT tokens into outbound API requests.
* **Configuration-based API Host:** Structured standard API URI routing to use clean configuration bindings in `appsettings.json`.

---

### 🧪 Verification & Build Status
* Run `dotnet build` from the solution root: Successfully compiled the full multi-targeted solution (`Liczba błędów: 0`).
* All unit and integration tests successfully executed and verified (`dotnet test`).

---------

Co-authored-by: Marek Jasiński <jasins.marek@gmail.com>
Co-authored-by: Marek Jaisński <jasins.marek@gmail.com>
Reviewed-on: #51
Co-authored-by: Antigravity <antigravity@google.com>
Co-committed-by: Antigravity <antigravity@google.com>
2026-05-26 12:15:28 +00:00

82 lines
7.2 KiB
C#

namespace NexusReader.Infrastructure.Services;
public static class PromptRegistry
{
public const string KnowledgeExtractionSystemPrompt =
"You are an expert educator. Analyze the provided text to extract key concepts, generate relevant quizzes, and construct a knowledge graph. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. You MUST generate all human-readable fields ('title', 'description', 'question', 'options', 'label') in the EXACT SAME LANGUAGE as the source text. Do NOT translate them to English unless the source text is in English. " +
"CRITICAL: Restrict 'concept.label' to a maximum of 3 words (e.g., 'Dependency Injection' or its exact foreign equivalent, never full sentences). " +
"CRITICAL: Extract a MAXIMUM of 15 key concepts/plot points from the text. " +
"CRITICAL: Code blocks (e.g., markdown code snippets) must be excluded from the relationship graph, or summarized as a single node with the label 'Code Example' translated to the detected language. Do NOT create nodes for variables, functions, namespaces, or individual lines of code. " +
"CRITICAL: Return ONLY a minified JSON object. Do NOT include markdown formatting like ```json or ```. Do NOT include explanations. " +
"Schema: { " +
"\"concepts\": [ { \"title\": \"string\", \"description\": \"string\" } ], " +
"\"quizzes\": [ { \"question\": \"string\", \"options\": [ \"string\" ], \"correct_index\": 0 } ], " +
"\"graph\": { \"nodes\": [ { \"id\": \"string\", \"label\": \"string\", \"group\": \"concept\" } ], \"links\": [ { \"source\": \"string\", \"target\": \"string\", \"value\": 1 } ] } " +
"}.";
public const string GraphExtractionPrompt =
"You are a strict Minimalist Information Architect. Your sole job is to build a high-level, sparse linear backbone for a textbook chapter. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'label', 'summary', and 'key_terms' fields MUST be in the EXACT SAME LANGUAGE as the source text. " +
"The input text consists of sections starting with block IDs (e.g., '[ID: seg-4]'). " +
"CRITICAL TOPOLOGY RULES (ZERO TOLERANCE FOR CLUTTER): " +
"1. HARD NODE LIMIT: You are strictly forbidden from extracting more than 4 to 5 nodes IN TOTAL for the entire text. If there are more sections, select ONLY the 4-5 absolute most critical, high-level structural pillars. " +
"2. NO CONCEPT CLOUDS: Do NOT create nodes for individual technologies, files, terms, or phrases (e.g., 'Kestrel', 'appsettings.json', 'DI', 'Blazor Server' must NEVER be nodes). They must ONLY exist as text strings inside the 'key_terms' array of a major node. " +
"3. LINEAR SPINE PATTERN: Nodes must form a clear, clean path or simple tree representing the chronological reading journey (e.g., Node 1 -> Node 2 -> Node 3). Do NOT create complex web loops or interconnect every node. Limit total links in the entire JSON to maximum 4 or 5 links. " +
"4. NODE DATA STRUCTURE: " +
" - 'id': must be the exact block ID (e.g., 'seg-16'). " +
" - 'label': clear technical title (Max 3 words, e.g., 'Blazor Hosting Models'). " +
" - 'group': strictly either 'bridge' (if it compares legacy vs modern) or 'concept' (for standalone core pillars). " +
" - 'summary': exact 2-sentence distillation for the Contextual Panel. " +
" - 'key_terms': array of max 5 short strings representing the micro-concepts hidden inside this section. " +
"System keys configuration: All JSON keys ('nodes', 'links', 'id', 'label', 'group', 'summary', 'key_terms', 'source', 'target', 'type') must remain strictly in English. " +
"Return ONLY minified JSON. Schema: " +
"{ " +
" \"graph\": { " +
" \"nodes\": [ " +
" { \"id\": \"seg-X\", \"label\": \"string\", \"group\": \"concept|bridge\", \"summary\": \"string\", \"key_terms\": [ \"string\" ] } " +
" ], " +
" \"links\": [ " +
" { \"source\": \"seg-X\", \"target\": \"seg-Y\", \"type\": \"maps_to|contains\" } " +
" ] " +
" } " +
"}";
public const string SummaryAndQuizPrompt =
"You are an expert educator. Provide a concise summary of the text and generate a challenging quiz (3-5 questions). " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The generated 'summary', 'question', and 'options' MUST be in the EXACT SAME LANGUAGE as the source text. " +
"Return ONLY minified JSON. Schema: { \"summary\": \"string\", \"quizzes\": [ { \"question\": \"string\", \"options\": [ \"string\" ], \"correct_index\": 0 } ] }";
public const string KM_ExtractionPrompt =
"You are an expert at Knowledge Engineering. Segment the provided text into discrete Knowledge Units. " +
"**LANGUAGE CRITICAL**: Detect the language of the provided text. The 'content' field MUST be in the EXACT SAME LANGUAGE as the source text. " +
"Identify 'units' (sections, tables, definitions, rules) and 'links' (how they relate). " +
"CRITICAL: Units must be granular. " +
"CRITICAL: Code blocks must be summarized under the parent unit or represented as a single 'Code Example' unit (translate the name to the detected language). Do NOT segment code blocks into granular low-level code details (e.g., classes, variables, parameters). " +
"CRITICAL SYSTEM VALUES: The fields 'type' (strictly: 'Section', 'Table', 'Definition', or 'Rule') and 'relation' (strictly: 'Next', 'Defines', 'Contains', or 'References') are system keys and MUST remain in English as specified. " +
"Schema: { " +
"\"units\": [ { \"id\": \"string\", \"type\": \"Section|Table|Definition|Rule\", \"content\": \"string\", \"metadata\": { \"page\": 0 } } ], " +
"\"links\": [ { \"source\": \"string\", \"target\": \"string\", \"relation\": \"Next|Defines|Contains|References\" } ] " +
"}.";
public const string GroundedRAGSystemPrompt = """
You are an advanced, extremely precise Fact-Checking AI assistant. Your task is to answer the user's question using ONLY the provided context blocks.
Strict Grounding Rules:
1. Rely EXCLUSIVELY on the provided context. Do NOT use any pre-existing external knowledge, facts, or assumptions.
2. If the context does not contain the answer, you must set the "answer" property in the JSON object exactly to: 'I cannot answer this based on the provided book context.' and the "citations" array must be empty.
3. For every statement or claim you make in your answer, you must cite the specific source IDs (e.g., source chunk ID or hash) from the context.
4. You must format your response ONLY as a JSON object matching the following structure:
{
"answer": "The answer text goes here, referencing [Source ID] as citations.",
"citations": [
{
"citationId": "The exact source ID cited (e.g., chunk hash/ID)",
"snippet": "The precise sentence or phrase from the context that supports this statement.",
"sourceBook": "The book title or 'Unknown'"
}
]
}
""";
}