# Turn a folder of PDFs into a live JSON API

> Source: <https://parseapi.dev/blog/show-hn-folder-to-json-api>
> Published: 2026-05-29 23:11:30+00:00

# Show HN: Turn a Folder of PDFs Into a Live JSON API

Hi HN — I'm Yonas. I built ParseApi solo over the last few months and shipped it a few days ago.

The problem I kept hitting: I had a pile of PDFs — invoices, receipts, contracts — and I wanted the data in them as structured JSON. Every time, the same pipeline: OCR or LLM call, define the schema, handle retries, store the result, build an endpoint to serve it. Different shape, same plumbing. So I tried to collapse the whole thing into one step.

**The folder is the unit, not the schema or the document.**

- Create a folder (e.g. `invoices`)
- Drag PDFs into it; they parse in real time
- That folder is now a live endpoint:

```
GET https://api.parseapi.dev/v1/{username}/invoices
```

- Paste the URL into your app. That's the whole onboarding.
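For anyone who wants to see what "paste the URL into your app" could look like, here is a minimal Python sketch using only the standard library. The URL pattern comes from the post; the username `alice`, the assumption that the folder is public (no auth header), and the assumption that the body decodes as JSON are all illustrative, since the response shape is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.parseapi.dev/v1"  # host and path prefix as shown in the post


def folder_url(username: str, folder: str) -> str:
    """Build the endpoint URL for one folder, percent-encoding both path parts."""
    return f"{BASE_URL}/{quote(username, safe='')}/{quote(folder, safe='')}"


def fetch_folder(username: str, folder: str, timeout: float = 10.0):
    """GET the folder endpoint and decode the body as JSON.

    Assumption: the folder is public, so no auth header is sent, and the
    response body is JSON. Neither detail is specified in the post.
    """
    request = urllib.request.Request(
        folder_url(username, folder),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "alice" is a placeholder username, not a real account.
    print(folder_url("alice", "invoices"))
```

Calling `fetch_folder("alice", "invoices")` would perform the actual request; a folder protected by an API key, JWT, or basic auth would need the matching header added.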

No schema definition step. When you drop the first few documents in a folder, it infers a schema from them and then conforms every later upload to that schema. You can edit the schema afterward, and optionally re-run past extractions against the new version.
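To make the "infer, then conform" idea concrete, here is a deliberately naive Python sketch. It is not ParseApi's inference logic (which is not described in this post); the majority-vote rule, the `min_share` threshold, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter


def infer_schema(samples: list[dict], min_share: float = 0.5) -> dict:
    """Infer {field: type_name} from the first few extracted documents.

    A field is kept if it appears in at least `min_share` of the samples;
    its type is the most common Python type name seen for that field.
    """
    seen = Counter()
    types: dict[str, Counter] = {}
    for doc in samples:
        for key, value in doc.items():
            seen[key] += 1
            types.setdefault(key, Counter())[type(value).__name__] += 1
    return {
        key: types[key].most_common(1)[0][0]
        for key, count in seen.items()
        if count / len(samples) >= min_share
    }


def conform(doc: dict, schema: dict) -> dict:
    """Project a later upload onto the schema: drop unknown fields, fill gaps with None."""
    return {key: doc.get(key) for key in schema}


first_uploads = [
    {"vendor": "Acme", "total": 12.5},
    {"vendor": "Beta", "total": 3.0, "po_number": "X1"},
    {"vendor": "Gamma", "total": 7.25},
]
schema = infer_schema(first_uploads)
later = conform({"vendor": "Delta", "notes": "handwritten"}, schema)
```

In this toy version `po_number` is dropped because it shows up in only one of three samples, and the later upload gets `total: None` rather than an invented value.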

The piece I care most about is **editable extractions**. AI extraction is never perfect on the long tail of real-world documents. So the original model output is stored immutably, and corrections are tracked separately as field-level diffs: who changed what, when, why. The API returns the corrected value by default; `?include_raw=true` returns both. Wrong extractions are one click from being fixed by the user, and over time those corrections become real feedback signal. This part felt missing from every existing document-AI tool I tried.
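One way to picture that storage model is the Python sketch below: a read-only raw record plus an append-only list of field-level corrections, with the corrected view computed on read. The class names, the `data` / `raw` response keys, and the example values are hypothetical; only the `include_raw` behavior is taken from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType


@dataclass(frozen=True)
class Correction:
    """One field-level diff: who changed what, when, and why."""
    field_name: str
    old_value: object
    new_value: object
    changed_by: str
    reason: str
    changed_at: datetime


class Extraction:
    def __init__(self, raw: dict):
        # Read-only view over a private copy, so the model output cannot be mutated.
        self._raw = MappingProxyType(dict(raw))
        self._corrections: list[Correction] = []

    def correct(self, field_name: str, new_value, changed_by: str, reason: str) -> None:
        old_value = self.current()[field_name]  # KeyError if the field is unknown
        self._corrections.append(
            Correction(field_name, old_value, new_value, changed_by, reason,
                       datetime.now(timezone.utc))
        )

    def current(self) -> dict:
        """Raw values with every correction replayed in order."""
        values = dict(self._raw)
        for c in self._corrections:
            values[c.field_name] = c.new_value
        return values

    def history(self) -> list[Correction]:
        return list(self._corrections)

    def to_response(self, include_raw: bool = False) -> dict:
        """Corrected values by default; the raw output too when include_raw is set."""
        body = {"data": self.current()}
        if include_raw:
            body["raw"] = dict(self._raw)
        return body


extraction = Extraction({"vendor": "Acme", "total": "1O0.00"})
extraction.correct("total", "100.00", changed_by="alice", reason="OCR read 0 as O")
```

Because corrections never overwrite the raw record, the full history stays available for audit or as training feedback.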

A few other things that are in:

- **Per-folder auth:** public, API key, JWT, or basic (your choice per folder, not platform-wide)
- **Pluggable AI providers:** Anthropic, OpenAI, Gemini, Ollama, anything OpenAI-compatible, with a fallback chain and a per-request cost cap
- **Auto-generated OpenAPI spec** per folder, with browsable docs
- **Webhooks** for extraction events, plus live status via SignalR
- **Source highlighting:** click a JSON field, see the bounding box in the doc
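The fallback chain with a per-request cost cap can be sketched roughly as follows. This is an illustration under stated assumptions, not the product's router: the `Provider` type, the flat per-call cost estimate, and the rule that a failed call still counts toward the cap are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    estimated_cost_usd: float          # hypothetical flat estimate per call
    call: Callable[[bytes], dict]      # returns extracted fields or raises


class AllProvidersFailed(Exception):
    pass


def extract_with_fallback(document: bytes, chain: list[Provider],
                          cost_cap_usd: float) -> tuple[str, dict, float]:
    """Try providers in order; skip any whose cost would push the request over the cap."""
    spent = 0.0
    errors = []
    for provider in chain:
        if spent + provider.estimated_cost_usd > cost_cap_usd:
            errors.append(f"{provider.name}: skipped, would exceed cost cap")
            continue
        spent += provider.estimated_cost_usd  # assumed to be charged even on failure
        try:
            return provider.name, provider.call(document), spent
        except Exception as exc:
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky(document: bytes) -> dict:
    raise RuntimeError("timeout")


def working(document: bytes) -> dict:
    return {"total": "100.00"}


chain = [
    Provider("primary", 0.25, flaky),
    Provider("pricey", 1.0, working),
    Provider("cheap", 0.25, working),
]
used, fields, spent = extract_with_fallback(b"%PDF-1.7", chain, cost_cap_usd=0.5)
```

Here the primary provider fails, the pricey one is skipped because it would exceed the cap, and the cheap one serves the request.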

Stack, for the curious — and because the choices were deliberate:

- Pure **.NET 9**. No Node, no React, no Next.js. Razor Pages + HTMX + Alpine.js for the UI. Tailwind via the standalone CLI, no npm in the build chain.
- **Modular monolith**, single deployable on Render. Subdomains (`app.` / `api.` / `admin.`) route to areas within one app.
- **Postgres** (schema-isolated so it can share a database with other projects), **Hangfire** for background jobs (in-process), **Cloudflare R2** for object storage, **Stripe** for billing.

The "in pure .NET" choice was contrarian on purpose. The default playbook for an AI-heavy SaaS is Next.js + Python + a managed everything stack. I wanted to see whether a one-person shop could ship something competitive without the JavaScript ecosystem on the server. Three months in, I'm convinced the answer is yes — and I'll probably write that up properly once the product has more reps on it.

I also spent the two weeks before launch building a kill-switch system into the app — feature flags, automated tripwires that cap AI spend, cooldowns, fail-closed defaults. I wrote about that in [Building a kill switch before letting anyone use my SaaS](/blog/killswitch-saas) if anyone's interested in that piece specifically.
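For readers who have not built one, a spend tripwire with a cooldown and a fail-closed default might look something like the Python sketch below. It is a generic illustration, not the implementation from the linked post; the rolling-window design, the parameter names, and the injected clock are assumptions.

```python
import time


class SpendTripwire:
    """Block AI calls once spend in a rolling window reaches a cap, then cool down."""

    def __init__(self, cap_usd: float, window_seconds: float,
                 cooldown_seconds: float, clock=time.monotonic):
        self.cap_usd = cap_usd
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._events: list[tuple[float, float]] = []  # (timestamp, cost)
        self._tripped_at = None

    def _window_total(self, now: float) -> float:
        # Drop events that have aged out of the rolling window.
        self._events = [(t, c) for t, c in self._events if now - t < self.window_seconds]
        return sum(c for _, c in self._events)

    def record(self, cost_usd: float) -> None:
        now = self._clock()
        self._events.append((now, cost_usd))
        if self._window_total(now) >= self.cap_usd:
            self._tripped_at = now

    def allow(self) -> bool:
        try:
            now = self._clock()
            if self._tripped_at is not None:
                if now - self._tripped_at < self.cooldown_seconds:
                    return False
                self._tripped_at = None
            return self._window_total(now) < self.cap_usd
        except Exception:
            return False  # fail closed: if the check itself breaks, block the call


fake_now = [0.0]
tripwire = SpendTripwire(cap_usd=1.0, window_seconds=60, cooldown_seconds=30,
                         clock=lambda: fake_now[0])
```

Injecting the clock keeps the tripwire testable without sleeping; in production the default `time.monotonic` would be used.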

## Honest limitations

- It's early. Genuinely. I have a handful of signups so far and no one has uploaded a real document yet.
- Extraction quality depends on which model the router picks, and the routing logic is conservative right now (Haiku / 4o-mini by default to keep costs sane).
- Bounding-box accuracy varies per provider — it's good on cleanly-laid-out PDFs and weaker on multi-column or scanned documents.
- Schema inference works well on the document types I've tested most (invoices, receipts, basic forms). Long-tail document types will surface bugs.

**What I'd love feedback on:** what your weirdest extraction case is. The documents that broke other tools. That's the stuff I learn from.

Free tier is 100 pages/month, 2 folders, no credit card. Built solo, run solo, and I'll be in the thread answering everything.
