Bug: Images within EPUB books not displaying in the reader canvas #64


2026-06-01T11:35:01Z

Antigravity commented

2026-06-01 11:35:01 +00:00

Bug Description

Images inside EPUB books are currently not rendering within the book reader page.

Technical Analysis

1. Sanitization & Extraction:
   - EpubReaderService.ExtractParagraphs uses a regex that only captures specific tags (p, h[1-6], ul, ol, blockquote, pre, hr). An <img> tag at the root level, or nested in any element outside these containers, is never captured.
   - EpubReaderService.SanitizeParagraph applies a strict whitelist regex: clean = Regex.Replace(clean, @"<(?!/?(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b)[^>]+>", "", RegexOptions.IgnoreCase); This strips every tag that is not on the whitelist, so <img> tags are deleted entirely.
   - SanitizeParagraph then runs clean = Regex.Replace(clean, @"<(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b[^>]*>", "<$1>", RegexOptions.IgnoreCase); which removes all attributes (src, alt, etc.) from the tags that remain.
2. Image Serving Endpoint:
   - Even if <img> tags and their src attributes are preserved, the browser cannot resolve paths that are relative to files inside the EPUB zip (e.g. ../images/pic1.png). We need a server endpoint (e.g. /api/epub/{ebookId}/resource?path={path}) that reads the requested resource from the EPUB archive on demand.
3. URL Rewriting:
   - In EpubReaderService, when parsing the HTML content of a chapter, we must rewrite the src attribute of each <img> tag from its relative path inside the EPUB to our web-accessible resource endpoint.
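
The stripping described under Sanitization & Extraction can be reproduced in isolation. The sketch below applies the two patterns quoted in this issue, in the same order, to a made-up paragraph. The class name SanitizeDemo and the sample HTML are assumptions for illustration only; the real SanitizeParagraph may do more than these two calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SanitizeDemo
{
    // Applies only the two Regex.Replace calls quoted in the issue text.
    public static string Sanitize(string html)
    {
        string clean = html;

        // Step 1: drop every tag that is not on the whitelist (this removes <img>).
        clean = Regex.Replace(
            clean,
            @"<(?!/?(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b)[^>]+>",
            "",
            RegexOptions.IgnoreCase);

        // Step 2: reduce each remaining opening tag to its bare name (this removes attributes).
        clean = Regex.Replace(
            clean,
            @"<(b|i|strong|em|h[1-6]|p|ul|ol|li|blockquote|pre|code|br|hr)\b[^>]*>",
            "<$1>",
            RegexOptions.IgnoreCase);

        return clean;
    }

    public static void Main()
    {
        // Hypothetical chapter fragment with an image nested in a paragraph.
        string input = "<p class=\"fig\"><img src=\"../images/pic1.png\" alt=\"Map\"/>Caption</p>";

        // Prints: <p>Caption</p>
        Console.WriteLine(Sanitize(input));
    }
}
```

The image is lost in step 1, before step 2 ever sees it, so adding img to the second pattern alone would not be enough. Both patterns need to let it through, and src needs an exception to the attribute stripping.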

Proposed Solution

1. Add GetEpubResourceAsync to IEpubReader and EpubReaderService to retrieve binary resource files (images) from an EPUB.
2. Register a new route /api/epub/{ebookId:guid}/resource in Program.cs that returns the image bytes with the correct MIME type.
3. Update EpubReaderService.ExtractParagraphs to match <img> elements if they appear at root-level (or ensure the regex is flexible enough).
4. Update EpubReaderService.SanitizeParagraph to preserve <img> tags along with their src attributes, and rewrite them to reference /api/epub/{ebookId}/resource?path={resolvedPath}.
5. Verify changes by building the project.
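
As a rough sketch of the rewriting step, the code below resolves an image's relative src against the chapter file's location in the archive and points the tag at the proposed endpoint. The class ImgSrcRewriter, its method names, the chapterPath parameter, and the placeholder host epub.invalid are all assumptions, not code from the project. It emits only the src attribute and does not special-case absolute or data: URLs, which a real implementation would need to decide on.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // Resolves src relative to the chapter's own path inside the EPUB archive.
    // A throwaway base URI is used purely to get standard "../" resolution.
    public static string ResolveArchivePath(string chapterPath, string src)
    {
        var baseUri = new Uri("http://epub.invalid/" + chapterPath);
        var resolved = new Uri(baseUri, src);
        return Uri.UnescapeDataString(resolved.AbsolutePath.TrimStart('/'));
    }

    // Replaces each <img ... src="..."> with an <img> that targets the resource endpoint.
    public static string RewriteImgSources(string html, Guid ebookId, string chapterPath)
    {
        return Regex.Replace(
            html,
            @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""'][^>]*>",
            m =>
            {
                string path = ResolveArchivePath(chapterPath, m.Groups[1].Value);
                string url = $"/api/epub/{ebookId}/resource?path={Uri.EscapeDataString(path)}";
                return $"<img src=\"{url}\">";
            },
            RegexOptions.IgnoreCase);
    }
}
```

For example, with a hypothetical chapter at OEBPS/text/ch1.xhtml, a src of ../images/pic1.png resolves to OEBPS/images/pic1.png, and the rewritten tag carries path=OEBPS%2Fimages%2Fpic1.png. Resolving against the chapter path matters because the same relative src means different archive entries in chapters stored in different folders.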


mjasin referenced this issue from a commit

2026-06-01 13:09:35 +00:00

fix: preserve and render EPUB images via dynamic server endpoint (fixes #64)

Antigravity referenced a pull request that will close this issue

2026-06-01 13:09:38 +00:00

fix: preserve and render EPUB images via dynamic server endpoint #65

mjasin closed this issue

2026-06-01 16:04:57 +00:00

mjasin referenced this issue from a commit

2026-06-01 16:04:59 +00:00

fix: preserve and render EPUB images via dynamic server endpoint (#65)
