{"slug": "show-hn-local-cpu-ocr-for-images-pdfs-webpages", "title": "Show HN: Local CPU OCR for images, PDFs, webpages", "summary": "A developer released textsnap, a command-line tool that performs OCR on images, screenshots, PDFs, and webpages entirely on a local CPU without requiring a GPU or cloud connection. The tool uses a quantized 0.9B vision-language model that runs offline after an initial 890 MB download, supporting clipboard input and output for screenshot-to-text workflows. The single-file Python module is designed for portability, allowing users to copy the tool and its model files to any machine for fully offline, air-gapped operation.", "body_md": "Snap any image, screenshot, or webpage into plaintext. No GPU. No cloud. One command.\n\n```\ntextsnap screenshot.png\n```\n\nThat's it. You get a `.txt` next to your shell, recognized on your CPU, from a screenshot, a photo, an image URL, or even a webpage.\n\n- ⚡ **Runs on CPU.** A 0.9B PaddleOCR-VL-1.5 vision-language model, quantized to q4 ONNX, parses full pages on a plain laptop. No CUDA. No M-series-only tricks. Plain old cores, pinned to your physical-core count.\n- 🖼 **Images, screenshots, URLs, webpages.** Point it at a local file, a direct image URL, or a full article URL — it isolates the main content and OCRs the most prominent image. Or OCR straight from your clipboard with no argument at all — and get the text put *back* on your clipboard, ready to paste.\n- 📴 **Offline after first run.** ~890 MB of ONNX downloads once to your cache and stays there. No API keys. No quotas. Your images never leave your machine.\n- 🎒 **Portable.** Drop the model files next to the script and the whole folder becomes a self-contained, copy-anywhere tool — no install, no download, no flags.\n- 🪶 **One file.** The whole tool is a single Python module. Dependencies install themselves on first run if missing.\n
- 📝 **Markdown or plaintext.** Default output is the model's native markdown (tables, headings, structure preserved). Add `--plaintext` to flatten it.\n\n```\n# Install\npip install textsnap\n\n# Snap something\ntextsnap screenshot.png\ntextsnap https://example.com/article --plaintext\ntextsnap photo.jpg -o ~/notes/receipt.txt\n```\n\nThe first run downloads the model (~890 MB). Every run after is offline.\n\n| Source | Example |\n|---|---|\n| Clipboard | `textsnap` (no argument) |\n| Local image file | `textsnap path/to/img.png` |\n| Direct image URL | `textsnap https://example.com/x.png` |\n| Webpage URL | `textsnap https://example.com/article` |\n\nLocal files cover anything Pillow can decode: `.png`, `.jpg`, `.jpeg`, `.webp`, `.bmp`, `.gif`, `.tiff`, and friends. For webpage URLs, textsnap uses readability to isolate the main content, then picks the most prominent image on the page and OCRs that.\n\nRun `textsnap` with **no argument** and it reads the image currently on your clipboard. The recognized text is then copied **straight back to the clipboard**, so a screenshot-to-text round trip is just: snap → `textsnap` → paste.\n\nThe `.txt` file is still written as well (and its path still printed to stdout), so nothing about scripting changes — the clipboard copy is a pure convenience layered on top.\n\nClipboard-out uses your platform's native tool — `pbcopy` (macOS), `clip` (Windows), or `wl-copy` / `xclip` / `xsel` (Linux) — so it needs no extra Python package. If none of those is installed, textsnap simply skips the clipboard copy; the `.txt` file is always there regardless. (Run with `-v` to see whether the copy succeeded.)\n\nBy default textsnap downloads its model files to an OS cache directory (`~/.cache/textsnap/`). 
But if it finds the model files **sitting next to the script**, it uses those directly — no download, no `--model-dir` flag, no setup at all.\n\n\"Next to the script\" means a layout like:\n\n```\ntextsnap/\n├── textsnap.py\n├── onnx/\n│   ├── vision_encoder_q4.onnx\n│   ├── decoder_q4.onnx\n│   └── embedding.onnx\n└── tokenizer.json\n```\n\nDrop those files in, and you can copy the entire `textsnap/` folder to any machine — a USB stick, an air-gapped box, a fresh laptop — and run it immediately, fully offline, with zero install steps.\n\nModel-directory resolution order:\n\n1. `--model-dir DIR` — if you pass it explicitly, it always wins.\n2. **Portable** — model files found next to the script.\n3. **OS cache** — `~/.cache/textsnap/`, downloading on first run if needed.\n\nLike `--model-dir`, portable-mode files are *not* SHA-256 verified — files you placed there yourself are trusted by definition. Integrity verification applies to files textsnap *downloads*. See [Security].\n\n```\npip install textsnap\n```\n\nInstalls two equivalent commands on your `PATH`: **textsnap** (canonical) and **`ocr`** (alias, for when the name slips your mind).\n\nTo install from a local source checkout instead:\n\n```\npip install .\n```\n\nFor a reproducible install with exact pinned dependency versions:\n\n```\npip install -r requirements-lock.txt\npip install .\n```\n\n**Clipboard note.** Reading images *from* the clipboard relies on Pillow's `ImageGrab`; on Linux you may need `xclip` or `wl-clipboard` installed. Writing recognized text *back* to the clipboard uses `pbcopy` / `clip` / `wl-copy` / `xclip` / `xsel`. 
macOS and Windows work out of the box.\n\n```\n# Clipboard (no argument) — text is also copied back to the clipboard\ntextsnap\n\n# Local image file\ntextsnap path/to/screenshot.png\n\n# Direct image URL\ntextsnap \"https://example.com/diagram.png\"\n\n# Webpage — OCRs the most prominent image on the page\ntextsnap \"https://example.com/article\"\n\n# Flatten the model's markdown to plain text\ntextsnap input.png --plaintext\n\n# Custom output path\ntextsnap input.png -o ./out/extracted.txt\n\n# Raise the token cap for very dense pages\ntextsnap dense-page.png --max-tokens 4096\n\n# Trade accuracy for speed by shrinking the image budget\ntextsnap input.png --max-pixels 250000\n\n# Use a local model directory instead of downloading\ntextsnap input.png --model-dir ~/models/paddleocr-vl\n```\n\nPlaintext, UTF-8. Default location is `./textsnaps/` (created if missing) under the current working directory; override with `-o`. The filename is derived from the image filename stem (`receipt_ocr.txt`), or from the webpage slug for URL inputs.\n\ntextsnap is quiet by default, Unix-style: the **only** thing printed to stdout is the path to the file it wrote, so it composes cleanly —\n\n```\nOUT=$(textsnap receipt.png)   # capture the path\ntextsnap receipt.png | xargs cat   # print the recognized text\n```\n\nWhen the input is the clipboard, the recognized text is *also* placed on the clipboard — see [Clipboard in, clipboard out](#clipboard-in-clipboard-out).\n\nPass `-v` to send progress diagnostics (input type, image size, decode speed, token counts) to **stderr**; stdout stays just the path either way.\n\nDefault file output is the model's **native markdown** — it preserves tables, headings, and document structure:\n\n```\n# Quarterly Report\n\n| Region | Revenue |\n| ------ | ------- |\n| EMEA   | $1.2M   |\n| APAC   | $0.9M   |\n```\n\nWith **`--plaintext`**, markdown is flattened to bare text:\n\n```\nQuarterly Report\n\nRegion Revenue\nEMEA $1.2M\nAPAC $0.9M\n```\n\n| Flag | Description |\n|---|---|\n| `-o`, `--output` | Output `.txt` path. Default: `./textsnaps/<name>_ocr.txt`. |\n| `-v`, `--verbose` | Print progress diagnostics to stderr. Off by default. |\n| `--plaintext` | Flatten the model's native markdown to plain text. |\n| `--model-dir` | Use ONNX/config files from this directory. Overrides portable mode and the OS cache. |\n| `--max-tokens` | Cap generated tokens. Default `2048`. Raise it for very dense pages. |\n| `--max-pixels` | Image pixel budget fed to the vision encoder. Default is the model's maximum. Lower trades accuracy for speed; too low makes the model hallucinate. The image is only ever shrunk, never enlarged. |\n| `--no-verify` | Skip SHA-256 verification of downloaded model files (not advised). |\n| `--generate-checksums` | Download the pinned model files, write a fresh manifest, and exit. |\n\nAn environment variable, `TEXTSNAP_DECODE_THREADS`, overrides the decoder's intra-op thread count if you want to tune CPU decode for a specific machine. Left unset, textsnap picks a sensible default based on your physical core count.\n\ntextsnap auto-downloads ~890 MB of model weights from the Hugging Face Hub on first run, so it treats those files as untrusted until proven otherwise:\n\n**Pinned model revision.** Downloads are pinned to a specific repo revision, so a moved or retagged `main` can't silently swap the weights.\n\n**SHA-256 verification.** Every downloaded file is hashed and checked against known-good digests before it's loaded. A mismatch aborts the run with a clear error rather than executing unverified weights. 
Digests live in `model_checksums.sha256` and are also embedded in the script as a fallback, so verification works whether you install from source or from a wheel.\n\n**Pinned dependencies.** `requirements-lock.txt` pins exact dependency versions for reproducible installs; the file documents how to add per-wheel `--hash` entries with `pip-compile --generate-hashes` for full supply-chain pinning.\n\nVerification applies to files textsnap **downloads**. Model files you supply yourself — via `--model-dir` or [portable mode](#portable-mode) — are trusted as-is and not re-hashed; you are responsible for their provenance.\n\nRegenerate the checksum manifest after a deliberate model-revision bump:\n\n```\ntextsnap --generate-checksums\n```\n\nTo bypass verification (for local experimentation with a modified model), pass `--no-verify`.\n\n1. **Load.** From the clipboard, a local file, a direct image URL, or — for a webpage URL — the most prominent image inside the page's main content (readability + a prominence heuristic).\n2. **Preprocess.** The image is run through PaddleOCR-VL's Qwen2-VL-style smart-resize and patchify, producing the pixel-value tensor and grid the vision encoder expects. Smart-resize bounds the image to the model's pixel budget (tunable with `--max-pixels`) and snaps it to the patch grid — textsnap does not pre-shrink beyond that, since starving the encoder of resolution makes the model hallucinate rather than degrade gracefully.\n3. **Recognize.** Three ONNX components run on CPU: a vision encoder (q4), a token-embedding model (fp32), and an autoregressive decoder (q4) with a wired-up KV cache bound via ONNX Runtime IOBinding to avoid copying the cache each step. Greedy decode, guarded against runaway repetition by an n-gram block (it refuses to re-emit an n-gram it has already produced) plus a loop detector that trims any cycle that slips through.\n4. **Format.** Native markdown by default; `--plaintext` reduces it to bare text.\n\nNo image is sent anywhere. No state is kept between runs except the cached model.\n\nThe PaddleOCR-VL-1.5 ONNX components are downloaded on first run to `~/.cache/textsnap/`:\n\n- `onnx/vision_encoder_q4.onnx` — vision encoder + spatial-merge projector\n- `onnx/decoder_q4.onnx` — autoregressive decoder\n- `onnx/embedding.onnx` — token embeddings (fp32; no q4 variant exists)\n- `tokenizer.json`, `config.json`\n\nTogether ~890 MB. To use your own copy, either point `--model-dir` at a directory containing the same `onnx/` files plus `tokenizer.json` and `config.json`, or place those files next to the script for [portable mode](#portable-mode).\n\n- **First run is the slow one** — it downloads ~890 MB. After that, textsnap is fully offline.\n- **CPU decode is sequential.** Dense, full-page documents take longer than a short screenshot. textsnap pins thread counts to your physical cores and prints a live tokens/sec readout so a slow run is visibly alive, not hung.\n- `--max-tokens` caps the output. Very dense pages can hit the default 2048-token cap and truncate; raise it if the tail of a page is missing.\n- `--max-pixels` is a speed/accuracy dial. Lowering it speeds up the vision encoder but feeds the model a coarser image; set it too low and recognition quality drops sharply. The default (the model's full budget) is the safe choice.\n- **Webpage inputs OCR one image** — the most prominent one in the main content, not the whole rendered page.\n- **Greedy decoding** can occasionally loop on repetitive layouts; an n-gram block prevents most loops outright and a detector trims any that remain.\n\nMIT for this project — see [LICENSE](/kouhxp/textsnap/blob/main/LICENSE).\n\nThe model is **PaddleOCR-VL-1.5**, distributed under Apache-2.0 by PaddlePaddle; textsnap pulls the ONNX export from [onnx-community/PaddleOCR-VL-1.5-ONNX](https://huggingface.co/onnx-community/PaddleOCR-VL-1.5-ONNX). See the [original model card](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) for model terms. Powered by [onnxruntime](https://onnxruntime.ai/) and [huggingface_hub](https://github.com/huggingface/huggingface_hub).", "url": "https://wpnews.pro/news/show-hn-local-cpu-ocr-for-images-pdfs-webpages", "canonical_source": "https://github.com/kouhxp/textsnap", "published_at": "2026-05-29 11:17:05+00:00", "updated_at": "2026-05-29 11:47:30.241829+00:00", "lang": "en", "topics": ["computer-vision", "ai-tools", "ai-products", "ai-infrastructure", "machine-learning"], "entities": ["PaddleOCR-VL-1.5", "ONNX", "textsnap"], "alternates": {"html": "https://wpnews.pro/news/show-hn-local-cpu-ocr-for-images-pdfs-webpages", "markdown": "https://wpnews.pro/news/show-hn-local-cpu-ocr-for-images-pdfs-webpages.md", "text": "https://wpnews.pro/news/show-hn-local-cpu-ocr-for-images-pdfs-webpages.txt", "jsonld": "https://wpnews.pro/news/show-hn-local-cpu-ocr-for-images-pdfs-webpages.jsonld"}}