{"slug": "show-hn-papernews-self-hosted-daily-newspaper-pdf-for-your-remarkable", "title": "Show HN: Papernews – self-hosted daily newspaper PDF for your reMarkable", "summary": "A developer released Papernews, a self-hosted tool that aggregates RSS feeds and Hacker News content into a single daily PDF designed for e-ink readers like the reMarkable. The script uses Anthropic's Claude to rewrite full article text into a consistent LaTeX typeset format with quiet typography and no ads or visual noise. The project requires Docker and an Anthropic API key, generating offline-readable PDFs that users can push to their devices manually or via cron jobs.", "body_md": "Every news site looks different. Hacker News, MacRumors, Quanta, my favourite ML blog, my favourite math blog — each one its own layout, fonts, colors, ads. To read anything I had to wade through somebody's design choices first and focus past the visual noise.\n\nI much prefer reading the way a LaTeX paper or an old magazine looks: quiet typography, generous margins, no color, nothing competing for attention.\n\n**papernews** is the fix. A script pulls all those feeds, has Claude clean\nup, translate to English, and rewrite the article bodies — the **full\ntext**, not just summaries — and renders the result into one consistently\ntypeset LaTeX PDF. Every article is *in* the PDF; you read entirely\noffline, no clicking through, no opening tabs.\n\nA side benefit I didn't expect to like but very much do: one place to read the day's news instead of five tabs being refreshed all day. One or two issues per day, no more.\n\nDesigned for an e-ink reader like the reMarkable, but it works just as well in any browser's PDF viewer.\n\n**👉 See** `sample-2026-06-04.pdf` for a real day's output.\n\nHobby project; works. Things will move. 
Expect rough edges.\n\nYou need: a machine that can run Docker (your laptop, a NAS, a $5/mo VPS,\nanything), an [Anthropic API key](https://console.anthropic.com/settings/keys),\nand ~2 GB of disk for the image.\n\n```\n# 1) Pull\ngit clone https://github.com/marcj/papernews\ncd papernews\n\n# 2) Configure your key\ncp .env.example .env\n$EDITOR .env             # paste ANTHROPIC_API_KEY=sk-ant-...\n\n# 3) Pick your sources\n$EDITOR sources.toml     # add/remove RSS/HN entries, set per-source limits\n\n# 4) (Optional) Tweak the look\n$EDITOR papernews/template.tex.j2\n\n# 5) Build + run\ndocker compose up --build -d\n\n# Open http://localhost:8000\n# First PDF builds on demand and is cached. Background ingest runs every 4h.\n```\n\nEverything you'd normally want to change is in **two files**:\n\n- `sources.toml`: which feeds, how many items per feed, in what order. Two source kinds today: `kind = \"hn\"` (Hacker News, top-by-points via the Algolia API) and `kind = \"rss\"` (any Atom/RSS feed via feedparser).\n- `papernews/template.tex.j2`: the LaTeX template. Page size, fonts, colors, layout, what goes on the cover, everything. Edit, restart the container, refresh `/digest.pdf`.\n\nOptional but useful:\n\n- `papernews/summarize.py` + `papernews/rewrite.py`: the Claude system prompts. Change `_MODEL` to `claude-sonnet-4-6` for fancier rewrites at ~10× the cost; adjust `_SYSTEM` to change the editorial voice (e.g. disable the auto-translate-to-English rule).\n- `papernews/wiki.py`: what goes into the World news block and the Quote-of-the-day source.\n\nA few different ways, no special script needed:\n\n- **Manual**: open `http://your-machine:8000/digest.pdf` in a browser on your phone/laptop and upload it to your reMarkable from there (drag-and-drop on `my.remarkable.com`, or the reMarkable mobile app, or the USB Web Interface at `http://10.11.99.1` while connected by USB).\n- **`rmapi`**: a third-party CLI that pushes files to your reMarkable cloud account. Pair once, then:\n\n```\ncurl -s http://your-machine:8000/digest.pdf -o today.pdf\nrmapi put today.pdf /Papernews\n```\n\nStick that two-liner in cron on the host and the device picks it up on next sync automatically.\n\n- **[Remailable](https://github.com/remailable/remailable)**: a third-party email-to-reMarkable bridge ([remailable.getneutrality.org](https://remailable.getneutrality.org)). You email the PDF as an attachment to your assigned address and it appears on the device. Useful if your papernews host can `mail`/`mutt` but can't reach the reMarkable directly. (reMarkable has no first-party email-to-device; do not believe earlier versions of this README that implied otherwise.)\n\nNo native push is built in because everyone's setup is different and you probably don't want me poking your reMarkable cloud account with your token.\n\n```\ngit clone https://github.com/yourname/papernews\ncd papernews\ncp .env.example .env\n# paste your ANTHROPIC_API_KEY into .env (get one at\n# https://console.anthropic.com/settings/keys)\ndocker compose up --build\n```\n\nThen visit `http://localhost:8000`: a landing page with a preview image and a\nlink to `/digest.pdf`. The first PDF builds on demand, takes ~1–2 minutes the\nfirst time and is then cached until new content arrives.\n\nState lives in `./data/state.db` (bind-mounted from the host) so it survives\ncontainer restarts.\n\nA 100–200 page PDF with:\n\n- **Cover page**: title + date + article count, quote of the day from Wikiquote, a \"World news\" block (5 tech headlines + 2 Western items from Wikipedia's Current Events portal, each compressed to a single sentence).\n- **Contents**: every article grouped by source, with dot-leaders to its publication date.\n- **\"Did you know…\"** trivia nuggets from Wikipedia's Main Page.\n- **The articles themselves**, set in two-column Latin Modern with proper paragraph indents, hyphenation, microtypography. 
Math (`$x = y$`, `$$\\int f$$`, `\\(...\\)`, `\\[...\\]`) is rendered as real LaTeX math. Code blocks (fenced or inline) come through in monospace.\n- All non-English source content (heise, etc.) is translated to English during the rewrite step. You can disable that in the prompt if you don't want it.\n\n```\n                   sources.toml\n                       │\n            ┌──────────┴──────────┐\n            │                     │\n            ▼                     ▼\n       ┌────────┐            ┌────────┐\n       │ gather │            │ wiki/  │\n       │  HN +  │            │ news + │\n       │  RSS   │            │  QOTD  │\n       └───┬────┘            └───┬────┘\n           ▼                     │\n       ┌────────┐                │\n       │extract │                │\n       │ (traf- │                │\n       │  ilatura)               │\n       └───┬────┘                │\n           ▼                     │\n       ┌─────────┐               │\n       │summarize│ ─── Claude    │\n       └───┬─────┘               │\n           ▼                     │\n       ┌─────────┐               │\n       │ rewrite │ ─── Claude    │\n       └───┬─────┘               │\n           ▼                     ▼\n       SQLite store (state.db)   in-memory\n           │                     │\n           └──────────┬──────────┘\n                     ▼\n              ┌──────────┐\n              │  render  │ ── xelatex\n              └────┬─────┘\n                   ▼\n             archive/cache/<hash>.pdf\n```\n\nFour stages, each idempotent and resumable:\n\n1. **gather**: pulls new items from each source, runs `trafilatura` to extract the article body, stores the raw text. Pure I/O — no LLM cost.\n2. **summarize**: batches up to 8 articles per Claude call and produces a ≤40-word two-sentence summary for each (used as the lede in the front matter and in the contents listing).\n3. **rewrite**: batches up to 8 articles per Claude call (streamed because the output is long) and produces a clean, properly-paragraphed, translated-to-English version of each article body for the renderer. Preserves code fences and `$math$` exactly.\n4. **render**: pulls the latest N articles per source from the store, plus fresh world news + quote + DYK, and runs them through a Jinja template into xelatex → PDF. Results are cached by a hash of \"what's in the store\" + \"what's in sources.toml\". Same content + same config → same cached PDF served instantly.\n\nA background `APScheduler` job runs steps 1–3 every 4 hours (configurable).\nThe render step is on-demand; the first hit to `/digest.pdf` after an ingest\nbuilds the PDF and caches it.\n\n| route | what it does |\n|---|---|\n| `GET /` | minimal landing page, cover preview + Read PDF link |\n| `GET /digest.pdf` | the current edition (built on demand, then cached) |\n| `GET /preview.png` | page 1 rasterized at 180 DPI |\n| `GET /sources` | JSON list of configured sources + latest `fetched_at` |\n| `GET /healthz` | liveness probe (returns `ok`) |\n| `POST /ingest` | manual kick of the gather → summarize → rewrite cycle |\n\nSources live in [`sources.toml`](/marcj/papernews/blob/main/sources.toml) — that's the exact file used\nto produce [the sample PDF](/marcj/papernews/blob/main/sample-2026-06-04.pdf). Open it, copy a block, edit, restart the container, refresh `/digest.pdf`.\n\nThe order of `[[source]]` blocks in the file is the order they'll appear in\nthe PDF — sources at the top come first. World news, quote of the day, and\nthe \"Did you know…\" nuggets are not configured here — they're cover\ndecorations, fetched fresh on every render.\n\nThe `hn` kind ranks Hacker News stories by points within a time window. 
No URL needed; the API is hardcoded.\n\n| field | type | default | meaning |\n|---|---|---|---|\n| `name` | string | required | display label (also the contents-page heading) |\n| `kind` | string | required | must be `\"hn\"` |\n| `limit` | int | `10` | how many top stories to keep |\n| `since_hours` | int | `48` | only consider stories submitted in the last N hours |\n| `min_points` | int | `50` | story must have at least this many points to qualify |\n\n```\n[[source]]\nname        = \"Hacker News\"\nkind        = \"hn\"\nlimit       = 10\nsince_hours = 48\nmin_points  = 100\n```\n\nThe `rss` kind is parsed with [feedparser](https://feedparser.readthedocs.io/), so it accepts\nRSS 0.9/1.0/2.0 and Atom 1.0 — every blog and most news sites work.\n\n| field | type | default | meaning |\n|---|---|---|---|\n| `name` | string | required | display label (also the contents-page heading) |\n| `kind` | string | required | must be `\"rss\"` |\n| `url` | string | required | feed URL |\n| `limit` | int | `20` | take at most N most-recent items |\n\n```\n[[source]]\nname  = \"Quanta Magazine\"\nkind  = \"rss\"\nurl   = \"https://www.quantamagazine.org/feed/\"\nlimit = 8\n```\n\nThe `limit` is applied **twice**, on purpose:\n\n- At **fetch** time: gather doesn't pull more than `limit` items from the feed (saves bandwidth and trafilatura time).\n- At **render** time: even if the store accumulates more than `limit` items for a source across multiple ingests (it will — items don't get deleted), only the latest `limit` per source make it into a given PDF.\n\nSo if you want Quanta to have at most 8 articles in the issue, regardless of\nhow many they've published this week → set `limit = 8`. If you want Hacker\nNews to show only the top 5 by points in the last 24h → set `limit = 5` and `since_hours = 24`.\n\n**On the totals.** Adding up every `limit` in `sources.toml` gives you the maximum article count per issue. Aim for **30–60 articles** for a comfortable 30–60 minute read. 
Claude's summaries are dense; volume isn't quality. An empty section on a slow day is cleaner than padding.\n\nTwo modes; pick whichever fits your routine. Set the env var in `.env`.\n\n```\n# .env, mode 1: fixed interval\nINGEST_INTERVAL_SECONDS=14400   # 4 hours (the default)\n\n# .env, mode 2: fixed times of day\nINGEST_SCHEDULE=07:00,18:00     # comma-separated HH:MM\nINGEST_TIMEZONE=Europe/London   # any IANA tz; default UTC\n```\n\nIf both are set, `INGEST_SCHEDULE` wins. The render is still on-demand —\nhitting `/digest.pdf` between scheduled runs gives you the cached PDF\ninstantly.\n\nYou can also kick a manual ingest any time:\n\n```\ncurl -X POST http://localhost:8000/ingest\n```\n\nA built-in hook fires after every successful ingest. Point\n`POST_INGEST_HOOK` at any executable on the container's filesystem (drop\nthe script into your `./data/hooks/` directory so it survives rebuilds via\nthe bind mount). The hook receives the freshly-built PDF path as its first\nargument.\n\n```\n# .env\nPOST_INGEST_HOOK=/data/hooks/push-to-remarkable.sh\nPOST_INGEST_HOOK_TIMEOUT=300    # optional; default 300s\n```\n\nHook failures are non-fatal — a broken hook logs an error but doesn't crash the ingest loop.\n\nDrop this in `./data/hooks/push-to-remarkable.sh` and `chmod +x` it:\n\n```bash\n#!/usr/bin/env bash\n# Push the latest issue to a reMarkable 2 via SSH.\n# Usage: push-to-remarkable.sh <pdf-path>\nset -euo pipefail\n\nPDF=\"$1\"\nREMARKABLE=\"root@10.11.99.1\"            # adjust to your device's IP\nSSH_KEY=/data/hooks/remarkable_id_ed25519\n\nscp -i \"$SSH_KEY\" -o StrictHostKeyChecking=accept-new \\\n    \"$PDF\" \"$REMARKABLE:/home/root/papernews.pdf\"\n\n# Refresh the UI so the file appears immediately.\nssh -i \"$SSH_KEY\" \"$REMARKABLE\" 'systemctl restart xochitl'\n```\n\nGenerate a passwordless key (`ssh-keygen -t ed25519 -f data/hooks/remarkable_id_ed25519 -N \"\"`), add the `.pub` to the\nreMarkable's `/home/root/.ssh/authorized_keys` once, and from then on\nevery ingest 
pushes the new paper to your device.\n\nThe same pattern works for Kindle (`scp` over USB networking), a network\nprinter (`lp -d papernews \"$PDF\"`), an email (`mutt -a \"$PDF\"`), or\nanything else you can script.\n\nModest, no-network unittest suite for the web/scheduling/hook behaviour:\n\n```\npython -m unittest discover -s tests\n```\n\nYou don't have to use Docker — the CLI works directly:\n\n```\npython3 -m venv .venv\n.venv/bin/pip install -e .\nexport ANTHROPIC_API_KEY=sk-ant-...\n\n.venv/bin/python -m papernews gather       # fetch + extract\n.venv/bin/python -m papernews summarize    # claude pass 1 (batched)\n.venv/bin/python -m papernews rewrite      # claude pass 2 (batched, streamed)\n.venv/bin/python -m papernews render       # xelatex → PDF\n# or all of the above in sequence:\n.venv/bin/python -m papernews build\n```\n\nRequirements: Python 3.11+, `xelatex` (TeX Live with `texlive-xetex`,\n`texlive-latex-extra`, `lmodern`), `pdftoppm` (poppler).\n\nEverything visual lives in one file: [`papernews/template.tex.j2`](/marcj/papernews/blob/main/papernews/template.tex.j2).\n\n- Page size: `paperwidth=157mm, paperheight=210mm` (tuned for reMarkable Pro)\n- Body font: Latin Modern Roman 10pt\n- Two-column body for any article over 2000 characters; single-column otherwise\n- First-line paragraph indent instead of vertical `\\parskip` (classic magazine convention)\n- Microtype protrusion + expansion\n- Letter-spacing on small-caps source labels via fontspec's `LetterSpace`\n\nCustomize whatever you like — the Jinja delimiters are LaTeX-safe\n(`((* ... *))` for blocks, `((( ... 
)))` for variables) so your `{`, `}` and\n`\\` don't fight each other.\n\nRoughly per ingest cycle, with Claude Haiku 4.5 (default model):\n\n- ~50 articles\n- Summarize: 6 batched calls (~8 articles each)\n- Rewrite: 6 batched calls, streamed\n- World-news compress: 1 call\n\nOrder-of-magnitude: a few cents to a few tens of cents per cycle depending on article lengths. At 6 cycles/day that's well under $1/day at the low end and roughly $1–2/day at the high end. Going to Sonnet or Opus multiplies the bill ~10–30×.\n\nSet a spend cap at\n[https://console.anthropic.com/settings/billing](https://console.anthropic.com/settings/billing) → Spend limits — the run-loop\ncan't surprise you above whatever you set.\n\n- All data lives on your machine (`./data/state.db` + `./data/archive/cache/`).\n- Article text is sent to the Anthropic API for summarization and rewriting. That's the only outbound destination for content (besides fetching the feeds themselves).\n- No analytics, no telemetry, no third-party scripts in the landing page.\n\n```\npapernews/\n├── papernews/\n│   ├── fetch.py          # HN Algolia + RSS feedparser\n│   ├── extract.py        # trafilatura\n│   ├── summarize.py      # Anthropic SDK, batched\n│   ├── rewrite.py        # Anthropic SDK, batched + streamed\n│   ├── wiki.py           # World news / Quote / DYK / tech feeds\n│   ├── store.py          # SQLite article store + queries\n│   ├── render.py         # Jinja + xelatex\n│   ├── preview.py        # PDF → PNG via pdftoppm\n│   ├── cache.py          # On-disk cache by content hash\n│   ├── cli.py            # papernews command\n│   ├── web.py            # Flask + APScheduler\n│   └── template.tex.j2   # the magazine\n├── sources.toml          # configured feeds\n├── pyproject.toml\n├── Dockerfile\n├── docker-compose.yml\n└── data/                 # gitignored — your SQLite + cached PDFs\n```\n\nOpen an issue first if you're planning something non-trivial — happy to talk about direction. 
The codebase is small enough that you can read it end to end in an hour.\n\nMIT — see [LICENSE](/marcj/papernews/blob/main/LICENSE).\n\nWorking name; happy to take suggestions. The vibe is: an old-fashioned daily paper, not a feed. You read it once, then you put it down.", "url": "https://wpnews.pro/news/show-hn-papernews-self-hosted-daily-newspaper-pdf-for-your-remarkable", "canonical_source": "https://github.com/marcj/papernews", "published_at": "2026-06-04 23:28:26+00:00", "updated_at": "2026-06-04 23:47:43.556521+00:00", "lang": "en", "topics": ["ai-products", "ai-tools", "ai-infrastructure"], "entities": ["Papernews", "Claude", "Anthropic", "reMarkable", "Docker", "Hacker News", "MacRumors", "Quanta"], "alternates": {"html": "https://wpnews.pro/news/show-hn-papernews-self-hosted-daily-newspaper-pdf-for-your-remarkable", "markdown": "https://wpnews.pro/news/show-hn-papernews-self-hosted-daily-newspaper-pdf-for-your-remarkable.md", "text": "https://wpnews.pro/news/show-hn-papernews-self-hosted-daily-newspaper-pdf-for-your-remarkable.txt", "jsonld": "https://wpnews.pro/news/show-hn-papernews-self-hosted-daily-newspaper-pdf-for-your-remarkable.jsonld"}}