Newsreader: auto-fetch full article text for truncated RSS entries

t-643 · WorkTask · Omni/Newsreader/Ingest.hs
Created 1 month ago · Updated 1 month ago

Description

Many RSS feeds syndicate only a snippet and require a visit to the website for the full article. The newsreader should detect truncated feed entries (a short <description> plus a <link>) and automatically fetch the full article content using a Readability-style extractor (e.g. Mozilla Readability.js or the FiveFilters Full-Text RSS approach). This makes the reader self-contained and avoids the ad-impression dark pattern.

Details:
1) Detect truncation with a heuristic (description length below a threshold, or presence of 'read more' patterns).
2) Fetch the linked page.
3) Run it through a Readability/content extractor to strip navigation and ads.
4) Store the full text alongside the feed entry.
5) Rate-limit fetches; respect robots.txt.
6) Fall back gracefully to the snippet if the fetch fails.
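
The detection and fallback steps (1 and 6) could be sketched roughly as below. This is only an illustration, not the code that landed in Omni/Newsreader/Ingest.hs: the `Entry` type, the 500-character threshold, the marker list, and the function names are all assumptions made for the example, and the fetch/extract steps are represented only by an `Either` result.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd, isInfixOf)

-- | Hypothetical minimal view of a feed entry; the real type used by the
-- ingest code is not shown in this task.
data Entry = Entry
  { entryDescription :: String        -- text of the <description> element
  , entryLink        :: Maybe String  -- value of the <link> element, if any
  }

-- | Assumed threshold in characters; the task does not specify a value.
minFullLength :: Int
minFullLength = 500

-- | Assumed 'read more' markers, written in lower case.
readMoreMarkers :: [String]
readMoreMarkers = ["read more", "continue reading", "[...]"]

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Step 1: an entry looks truncated when it has a non-blank link to follow
-- and its description is either short or contains a 'read more' marker.
isTruncated :: Entry -> Bool
isTruncated (Entry desc link) = hasLink && (tooShort || hasMarker)
  where
    hasLink   = maybe False (not . all isSpace) link
    tooShort  = length (trim desc) < minFullLength
    lowerDesc = map toLower desc
    hasMarker = any (`isInfixOf` lowerDesc) readMoreMarkers

-- | Step 6: prefer the extracted full text; fall back to the snippet when
-- the fetch failed (Left) or the extractor returned only whitespace.
chooseBody :: Entry -> Either String String -> String
chooseBody _ (Right fullText)
  | not (all isSpace fullText) = fullText
chooseBody entry _ = entryDescription entry
```

Keeping both functions pure means the heuristic can be tuned and tested without network access; the rate-limited fetch and robots.txt check would sit in the IO layer that produces the `Either` value.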

Git Commits

5646f168 Newsreader: fetch full text for truncated feed entries
Coder Agent · 8 weeks ago · 2 files

Timeline (9)

🔄 [system] Open → InProgress · 1 month ago
💬 [system] · 1 month ago

Pipeline: dev completed (run=dev-t-643-1771527132, cost=0.0c)

🔄 [system] InProgress → Verified · 1 month ago
💬 [system] · 1 month ago

Pipeline: build verification passed

🔄 [system] Verified → Done · 1 month ago
💬 [system] · 1 month ago

Pipeline: integrated into live at 5646f1682fa061b9c64479b8799c803a12ea545c