
Turn a folder of PDFs into a live JSON API

ParseApi founder Yonas has launched a tool that converts folders of PDFs into live JSON API endpoints, eliminating the need for manual schema definition. The platform infers data structures from uploaded documents and serves them as accessible endpoints, with editable extractions that track corrections as field-level diffs for improved accuracy over time.

3 min read · Published May 29, 2026

Hi HN, I'm Yonas. I built ParseApi solo over the last few months and shipped it a few days ago.

The problem I kept hitting: I had a pile of PDFs (invoices, receipts, contracts) and I wanted the data in them as structured JSON. Every time, the same pipeline: OCR or LLM call, define the schema, handle retries, store the result, build an endpoint to serve it. Different shape, same plumbing. So I tried to collapse the whole thing into one step.

The folder is the unit, not the schema or the document.

  • Create a folder (e.g. invoices)
  • Drag PDFs into it; they parse in real time
  • That folder is now a live endpoint:
    GET https://api.parseapi.dev/v1/{username}/invoices
  • Paste the URL into your app. That's the whole onboarding.
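
As a rough illustration of the last step, here is a minimal Python sketch of consuming such an endpoint. Only the URL pattern comes from the post. The helper names, the `acme` username in the usage note, and the Bearer-style `Authorization` header are assumptions, and the shape of the JSON response is not documented here, so the sketch simply returns whatever the server sends.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

API_BASE = "https://api.parseapi.dev/v1"  # base URL as shown in the post


def folder_url(username: str, folder: str) -> str:
    """Build the endpoint URL for a folder (hypothetical helper)."""
    user = urllib.parse.quote(username, safe="")
    name = urllib.parse.quote(folder, safe="")
    return f"{API_BASE}/{user}/{name}"


def fetch_folder(username: str, folder: str, api_key: Optional[str] = None) -> Any:
    """GET the folder endpoint and decode the JSON body.

    The Bearer header is an assumption; the real header format for
    API-key folders may differ.
    """
    request = urllib.request.Request(folder_url(username, folder))
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_folder("acme", "invoices")` would request `https://api.parseapi.dev/v1/acme/invoices`, where `acme` stands in for a real username.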

No schema definition step. When you drop the first few documents in a folder, it infers a schema from them and then conforms every later upload to that schema. You can edit the schema afterward, and optionally re-run past extractions against the new version.
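
To make the "infer, then conform" idea concrete, here is a toy Python sketch. It is not how ParseApi does it (the post does not describe the inference method); it assumes documents have already been extracted into flat dicts and uses a deliberately naive rule: a field's type is the type of its first non-null value. All names are hypothetical.

```python
from typing import Any


def infer_schema(samples: list[dict[str, Any]]) -> dict[str, type]:
    """Map each field seen in the samples to the type of its first non-null value."""
    schema: dict[str, type] = {}
    for doc in samples:
        for field_name, value in doc.items():
            if value is not None and field_name not in schema:
                schema[field_name] = type(value)
    return schema


def conform(doc: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Keep only schema fields, fill missing ones with None, coerce where possible."""
    result: dict[str, Any] = {}
    for field_name, expected in schema.items():
        value = doc.get(field_name)
        if value is None or isinstance(value, expected):
            result[field_name] = value
            continue
        try:
            result[field_name] = expected(value)  # e.g. "41.20" -> 41.2
        except (TypeError, ValueError):
            result[field_name] = None  # could not coerce; leave for manual correction
    return result
```

In this sketch, fields that appear only in later uploads are dropped, which is one reason an editable schema and re-running past extractions would matter.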

The piece I care most about is editable extractions. AI extraction is never perfect on the long tail of real-world documents. So the original model output is stored immutably, and corrections are tracked separately as field-level diffs: who changed what, when, why. The API returns the corrected value by default; ?include_raw=true returns both. Wrong extractions are one click from being fixed by the user, and over time those corrections become real feedback signal. This part felt missing from every existing document-AI tool I tried.
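
One way to picture that model is the Python sketch below: the raw output is kept read-only, each correction is an append-only record, and the served value is the raw output with corrections layered on top. The class names, field names, and the `{"data": ..., "raw": ...}` response shape are assumptions for illustration, not ParseApi's actual storage or API format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class Correction:
    """One field-level diff: who changed what, when, and why."""
    field_name: str
    old_value: Any
    new_value: Any
    author: str
    reason: str
    at: datetime


@dataclass
class Extraction:
    raw: Mapping[str, Any]
    corrections: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Read-only view over a copy, so the model output cannot be mutated.
        self.raw = MappingProxyType(dict(self.raw))

    def current(self) -> dict:
        """Raw values with every correction applied in order."""
        values = dict(self.raw)
        for c in self.corrections:
            values[c.field_name] = c.new_value
        return values

    def correct(self, field_name: str, new_value: Any, author: str, reason: str) -> None:
        old = self.current().get(field_name)
        now = datetime.now(timezone.utc)
        self.corrections.append(Correction(field_name, old, new_value, author, reason, now))

    def to_response(self, include_raw: bool = False) -> dict:
        # Assumed response shape; the real API may differ.
        if not include_raw:
            return self.current()
        return {"data": self.current(), "raw": dict(self.raw)}
```

Because the raw output never changes, every correction can be replayed or discarded, and the list of `Correction` records doubles as an audit log.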

A few other things that are in:

  • Per-folder auth: public, API key, JWT, or basic. Your choice per folder, not platform-wide
  • Pluggable AI providers: Anthropic, OpenAI, Gemini, Ollama, anything OpenAI-compatible, with a fallback chain and a per-request cost cap
  • Auto-generated OpenAPI spec per folder, with browsable docs
  • Webhooks for extraction events, plus live status via SignalR
  • Source highlighting: click a JSON field, see the bounding box in the doc
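
The fallback chain with a per-request cost cap could look roughly like the Python sketch below. It is an illustration under assumptions, not the product's router (which is .NET): each provider carries a hypothetical cost estimate, failed attempts still count toward the cap, and providers that would push the request over the cap are skipped.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    name: str
    estimated_cost_usd: float          # hypothetical per-call estimate
    extract: Callable[[bytes], dict]   # raises an exception on failure


class NoProviderAvailable(Exception):
    """Raised when every provider failed or was skipped."""


def extract_with_fallback(pdf: bytes, chain: Sequence[Provider], cost_cap_usd: float) -> tuple[str, dict]:
    """Try providers in order; return (provider name, result) from the first success."""
    spent = 0.0
    errors = []
    for provider in chain:
        if spent + provider.estimated_cost_usd > cost_cap_usd:
            errors.append(f"{provider.name}: skipped, would exceed cost cap")
            continue
        spent += provider.estimated_cost_usd  # a failed call is still billed in this sketch
        try:
            return provider.name, provider.extract(pdf)
        except Exception as exc:
            errors.append(f"{provider.name}: {exc}")
    raise NoProviderAvailable("; ".join(errors))
```

Ordering the chain from cheapest to most capable keeps typical requests cheap, while the cap bounds the worst case when several providers fail in a row.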

Stack, for the curious (and because the choices were deliberate):

  • Pure .NET 9. No Node, no React, no Next.js.
  • Razor Pages + HTMX + Alpine.js for the UI. Tailwind via the standalone CLI, no npm in the build chain.
  • Modular monolith, single deployable on Render. Subdomains (app. / api. / admin.) route to areas within one app.
  • Postgres (schema-isolated so it can share a database with other projects), Hangfire for background jobs (in-process), Cloudflare R2 for object storage, Stripe for billing.

The "in pure .NET" choice was contrarian on purpose. The default playbook for an AI-heavy SaaS is Next.js + Python + a managed everything stack. I wanted to see whether a one-person shop could ship something competitive without the JavaScript ecosystem on the server. Three months in, I'm convinced the answer is yes, and I'll probably write that up properly once the product has more reps on it.

I also spent the two weeks before launch building a kill-switch system into the app: feature flags, automated tripwires that cap AI spend, cooldowns, fail-closed defaults. I wrote about that in "Building a kill switch before letting anyone use my SaaS" if anyone's interested in that piece specifically.
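
For readers unfamiliar with the pattern, here is a small Python sketch of a spend tripwire with a cooldown, plus a fail-closed flag check. It is a generic illustration, not the implementation from that write-up; the class, its parameters, and the reset-after-cooldown behavior are all assumptions.

```python
import time
from typing import Callable, Optional


class SpendTripwire:
    """Blocks AI calls once spend reaches a cap, then reopens after a cooldown."""

    def __init__(self, cap_usd: float, cooldown_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cap_usd = cap_usd
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.spent_usd = 0.0
        self.tripped_at: Optional[float] = None

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        if self.spent_usd >= self.cap_usd and self.tripped_at is None:
            self.tripped_at = self.clock()

    def allows_requests(self) -> bool:
        if self.tripped_at is None:
            return True
        if self.clock() - self.tripped_at >= self.cooldown_seconds:
            # Cooldown elapsed: reset the window and reopen.
            self.tripped_at = None
            self.spent_usd = 0.0
            return True
        return False


def feature_enabled(flags, name: str) -> bool:
    """Fail closed: a missing flag or an unreadable flag store counts as off."""
    try:
        return flags[name] is True
    except (KeyError, TypeError):
        return False
```

The fail-closed default means an outage in the flag store disables features rather than leaving them on unguarded.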

Honest limitations

  • It's early. Genuinely. I have a handful of signups so far and no one has uploaded a real document yet.
  • Extraction quality depends on which model the router picks, and the routing logic is conservative right now (Haiku / 4o-mini by default to keep costs sane).
  • Bounding-box accuracy varies per provider: it's good on cleanly-laid-out PDFs and weaker on multi-column or scanned documents.
  • Schema inference works well on the document types I've tested most (invoices, receipts, basic forms). Long-tail document types will surface bugs.

What I'd love feedback on: what your weirdest extraction case is. The documents that broke other tools. That's the stuff I learn from.

Free tier is 100 pages/month, 2 folders, no credit card. Built solo, run solo, and I'll be in the thread answering everything.
