Bug: Images within EPUB books not displaying in the reader canvas #64

Closed
opened 2026-06-01 11:35:01 +00:00 by Antigravity · 0 comments
Collaborator

Bug Description

Images inside EPUB books are currently not rendering within the book reader page.

Technical Analysis

  1. Sanitization & Extraction:

    • EpubReaderService.ExtractParagraphs uses a regex that only captures specific block-level tags (p, h[1-6], ul, ol, blockquote, pre, hr). An <img> at root level, outside all of these containers, is never captured; an <img> nested inside one is only carried along as part of its container's markup.
    • EpubReaderService.SanitizeParagraph has a strict whitelist regex: clean = Regex.Replace(clean, @"<(?!/?(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b)[^>]+>", "", RegexOptions.IgnoreCase);. This explicitly strips out any tag not in the whitelist, which deletes <img> tags entirely.
    • SanitizeParagraph then runs clean = Regex.Replace(clean, @"<(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b[^>]*>", "<$1>", RegexOptions.IgnoreCase); which strips every attribute (src, alt, and so on) from the tags that remain, so adding img to the whitelist alone would still lose its src.
  2. Image Serving Endpoint:

    • Even if <img> tags with src attribute are preserved, the browser cannot resolve relative EPUB zip paths (e.g. ../images/pic1.png). We need a server endpoint (e.g. /api/epub/{ebookId}/resource?path={path}) that can read the requested resource file dynamically from the EPUB archive.
  3. URL Rewriting:

    • In EpubReaderService, when parsing the HTML content of a chapter, we must rewrite the src attribute of <img> tags from their relative paths inside the EPUB to our web-accessible resource endpoint.
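
The stripping described in item 1 can be reproduced in isolation. The sketch below applies the two Regex.Replace calls quoted above, in the same order, to a sample paragraph. The class and method names (SanitizeDemo, CurrentSanitize) and the sample markup are made up for illustration, and the real SanitizeParagraph may do more than these two calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only. It contains just the two
// Regex.Replace calls quoted in the issue, not the real SanitizeParagraph.
public static class SanitizeDemo
{
    public static string CurrentSanitize(string html)
    {
        string clean = html;

        // Pass 1: delete every tag whose name is not in the whitelist.
        // "img" is not listed, so the whole <img ...> tag is removed here.
        clean = Regex.Replace(clean,
            @"<(?!/?(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b)[^>]+>",
            "", RegexOptions.IgnoreCase);

        // Pass 2: rewrite each remaining opening tag as a bare "<name>",
        // which drops all of its attributes.
        clean = Regex.Replace(clean,
            @"<(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b[^>]*>",
            "<$1>", RegexOptions.IgnoreCase);

        return clean;
    }
}
```

Calling SanitizeDemo.CurrentSanitize on `<p class="fig"><img src="../images/pic1.png" alt="Figure"/>Caption</p>` returns `<p>Caption</p>`: the image is gone after the first pass, and the second pass removes the class attribute from the paragraph.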

Proposed Solution

  1. Add GetEpubResourceAsync to IEpubReader and EpubReaderService to retrieve binary resource files (images) from an EPUB.
  2. Register a new route /api/epub/{ebookId:guid}/resource in Program.cs that returns the image bytes with the correct MIME type.
  3. Update EpubReaderService.ExtractParagraphs to also match <img> elements that appear at root level, outside the block containers it already captures (or otherwise loosen the regex so that such images are not skipped).
  4. Update EpubReaderService.SanitizeParagraph to preserve <img> tags along with their src attributes, and rewrite them to reference /api/epub/{ebookId}/resource?path={resolvedPath}.
  5. Verify changes by building the project.
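
Step 4 could be sketched roughly as follows. This is only an illustration under assumptions, not the project's implementation: the names ImgRewriteSketch, ResolveZipPath and Sanitize, and the chapterPath parameter (the chapter file's path inside the EPUB archive), are hypothetical. The URL shape is the one proposed above. The sketch stays regex-based to match the existing code; a real HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: keep <img>, drop all of its attributes except src,
// and point src at the proposed resource endpoint.
public static class ImgRewriteSketch
{
    private const string Allowed =
        "b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr";

    // Resolve a relative href against the chapter's location in the archive.
    // ".." segments never climb above the archive root.
    public static string ResolveZipPath(string chapterPath, string href)
    {
        var segments = new List<string>(chapterPath.Split('/'));
        segments.RemoveAt(segments.Count - 1); // drop the chapter file name
        foreach (string part in href.Split('/'))
        {
            if (part == "" || part == ".") continue;
            if (part == "..")
            {
                if (segments.Count > 0) segments.RemoveAt(segments.Count - 1);
            }
            else
            {
                segments.Add(part);
            }
        }
        return string.Join("/", segments);
    }

    public static string Sanitize(string html, Guid ebookId, string chapterPath)
    {
        // Pass 1: same whitelist as before, but opening <img> tags survive.
        string clean = Regex.Replace(html,
            @"<(?!/?(?:" + Allowed + @")\b|img\b)[^>]+>", "",
            RegexOptions.IgnoreCase);

        // Pass 2: strip attributes from the original whitelist only.
        clean = Regex.Replace(clean,
            @"<(" + Allowed + @")\b[^>]*>", "<$1>", RegexOptions.IgnoreCase);

        // Pass 3: rebuild each <img> with nothing but a rewritten src.
        return Regex.Replace(clean, @"<img\b[^>]*>", m =>
        {
            Match src = Regex.Match(m.Value,
                @"\ssrc\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
            if (!src.Success) return "";

            string raw = Uri.UnescapeDataString(src.Groups[1].Value);
            // Drop anything that is not a relative in-archive path
            // (absolute URLs, data: URIs, root-relative paths).
            if (raw.Contains(":") || raw.StartsWith("/", StringComparison.Ordinal))
                return "";

            string resolved = ResolveZipPath(chapterPath, raw);
            return "<img src=\"/api/epub/" + ebookId + "/resource?path="
                + Uri.EscapeDataString(resolved) + "\">";
        }, RegexOptions.IgnoreCase);
    }
}
```

With chapterPath set to OEBPS/text/ch1.xhtml, a src of ../images/pic1.png resolves to OEBPS/images/pic1.png and is emitted as path=OEBPS%2Fimages%2Fpic1.png. Because the resolved path comes from book content, the endpoint should still treat it as untrusted input and serve only entries that actually exist in that EPUB archive.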
Reference: mjasin/Nexus.Reader#64