[Migration] Polyglot Ingestion Pipeline (Neo4j, Qdrant, Hangfire) #47

Closed
opened 2026-05-20 17:56:02 +00:00 by Antigravity · 0 comments
Collaborator

This issue tracks the completed migration of the ebook ingestion pipeline from PostgreSQL with pgvector, where embeddings were stored in the relational database itself, to a polyglot persistence architecture that uses separate stores for vector and graph data.

Objective

  • Decouple vector and relational database storage to optimize read/write scaling.
  • Enable high-dimensional vector search using Qdrant.
  • Persist structural relationships between chapters, concepts, and definitions in Neo4j.
  • Move the ingestion process into background jobs run by Hangfire servers.

Work Done

  1. Infrastructure Integration: Introduced Qdrant.Client and Neo4j.Driver inside the application layers.
  2. Concurrent Background Job: Implemented a Hangfire EbookIngestionJob that uses Polly retries with exponential backoff to handle HTTP 429 (rate-limit) responses and runs three ingestion pathways concurrently with Task.WhenAll.
  3. Wasm & Test Alignment: Implemented semantic search inside WasmKnowledgeService and aligned the Application-level unit tests.
  4. Backward Compatibility: Kept the Pgvector.EntityFrameworkCore package reference in the Data project so that the existing migration history still compiles.
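
The retry-and-fan-out pattern described in item 2 can be sketched independently of .NET. The Python below is only an illustration, not the project's code: asyncio.gather plays the role of Task.WhenAll, a hand-written backoff loop plays the role of the Polly policy, and every name in it (RateLimitedError, with_backoff, the three pathway labels) is a hypothetical stand-in, since the issue does not name the individual pathways or show the retry settings.

```python
import asyncio


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from an upstream service."""


async def with_backoff(operation, max_attempts=5, base_delay=0.01):
    """Run `operation`, retrying on RateLimitedError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except RateLimitedError:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the failure to the caller
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_pathway(name, failures_before_success):
    """Build a fake ingestion pathway that is rate-limited a fixed number of times."""
    calls = {"count": 0}

    async def ingest():
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise RateLimitedError(name)
        return f"{name}: ok after {calls['count']} attempt(s)"

    return ingest


async def run_ingestion():
    # Three assumed pathways; the labels are illustrative only.
    pathways = [
        make_flaky_pathway("relational", 0),
        make_flaky_pathway("vector", 2),
        make_flaky_pathway("graph", 1),
    ]
    # gather awaits all three concurrently and returns results in input order.
    return await asyncio.gather(*(with_backoff(p) for p in pathways))


if __name__ == "__main__":
    for line in asyncio.run(run_ingestion()):
        print(line)
```

Each pathway retries on its own schedule, so a rate-limited store does not hold up the others until the final join. If any pathway exhausts its retries, the exception propagates out of the join, which in a Hangfire setting would typically mark the job as failed.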

Verification Status

  • Clean compilation across all 10 projects.
  • All unit tests pass.

Reference: mjasin/Nexus.Reader#47