
Show HN: Polyvia – Multimodal document retrieval over 100K+ files

Polyvia released Polyvia 1, a multimodal document retrieval API and upcoming platform for enterprise agents, enabling sub-200ms search over 100K+ files including PDFs, charts, and slides. The API provides end-to-end retrieval without external extractors, targeting use cases like data room search and credit monitoring.

Published Jun 18, 2026 · 5 min read

We build enterprise agents for large-scale retrieval, research and automation over multimodal docs.

Docs · Quickstart · Python SDK · TypeScript SDK · Polyvia Platform · Homepage

We’re releasing Polyvia 1, as two products:

Polyvia API: Multimodal Document Retrieval API (for developers of AI agents) - available now.

Polyvia Platform: Research and Automation Agent over 100K+ multimodal docs (for knowledge workers in enterprises) - coming soon.

We index your unstructured, visual, and multimodal docs (PDFs, charts, slides, complex tables, infographics, scans, handwriting, invoices, and more) into a multimodal knowledge ontology, with agents running on top for retrieval, research, and automation. Every answer is grounded in a cited source page, with sub-200ms search.

1. Fast over 100K+ multimodal docs. Agentic, file-by-file search (Claude Code, Claude Cowork, Codex) works only up to ~100 multimodal files — past that it's too slow, and at scale you still need retrieval. Polyvia does sub-200ms search over 100K+ files, every answer grounded in a cited source page.

2. End-to-end — no need for extractors or PDF parsers. When you build large-scale multimodal RAG over a company's files, the only infra available today is visual extractors / PDF parsers (Reducto, LlamaIndex). There's no end-to-end infra for large-scale multimodal document retrieval — until Polyvia: VLM Visual Extractor → Multimodal Knowledge Ontology (mapping all your company's data and processes) → Self-Improving Retrieval Agent.

3. All unstructured, visual and multimodal data inputs in one API. Available now: PDFs, charts, infographics, complex multi-page tables, slides, pictures, handwriting, scans, invoices, audio. Coming soon: video, healthcare scans / EHR, chemical & molecular data, CAD & technical drawings, heatmaps.

Multimodal RAG inside your own agent: retrieval-as-a-tool over large doc sets.

Data-room / due-diligence search: query 100+ visual-heavy PDFs jointly (PE, IB, M&A).

Counterparty & credit monitoring: EBITDA, opex, revenue across hundreds of borrower reports.

Image-based claim processing: describe claim photos in the context of a policy.

Cross-engagement slide search: find answers buried in thousands of slides.

pip install polyvia        # Python 3.9+
npm install polyvia        # Node 18+

Grab a key in Polyvia Platform under Settings → API. Ingest a batch into a group, then ask one question across the whole corpus; answers cite the exact page in each document.

from polyvia import Polyvia

client = Polyvia(api_key="poly_<key>")  # or set POLYVIA_API_KEY

items = client.ingest.batch(
    ["q1.pdf", "q2.pdf", "q3.pdf", "q4.pdf"],
    group="FY24 Earnings",
)
for item in items:
    client.ingest.wait(item.task_id)

print(client.query("How did revenue trend across the four quarters?",
                   group="FY24 Earnings").answer)

TypeScript:

import { Polyvia } from "polyvia";

const client = new Polyvia({ apiKey: "poly_<key>" });

const items = await client.ingest.batch(
  ["q1.pdf", "q2.pdf", "q3.pdf", "q4.pdf"],
  { group: "FY24 Earnings" },
);
await Promise.all(items.map((i) => client.ingest.wait(i.task_id)));

const answer = await client.query(
  "How did revenue trend across the four quarters?",
  { group: "FY24 Earnings" },
);
console.log(answer.answer);
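
The `client.ingest.wait(...)` calls above block until each ingestion task finishes. The source does not show how `wait` is implemented or whether it accepts a timeout, so the following is only a generic sketch of task polling with exponential backoff and a deadline. `poll_until_done`, `fetch_status`, and the status strings `"completed"` and `"failed"` are hypothetical names for illustration, not part of the Polyvia SDK.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it reports a terminal state or the deadline passes.

    fetch_status is a hypothetical callable returning a status string; the
    terminal values "completed" and "failed" are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s}s")
        sleep(delay)
        # Double the wait each round, capped so long tasks are still checked regularly.
        delay = min(delay * 2, max_delay_s)
```

In practice `fetch_status` would wrap whatever status lookup your client exposes for a task id. Passing `sleep` as a parameter keeps the helper easy to test without real delays.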

Scope a query three ways: a single document_id (fastest), a group / group_ids, or the whole workspace (no scope).

Runnable scripts live in examples/. A few highlights: query_scopes.py, groups_and_documents.py, batch_group.py, async_client.py (AsyncPolyvia: the same surface, awaitable), agent_tool.py, and curl.sh.

Querying across scopes, for example:

client.query("What risks recur across all reports?")
client.query("How did revenue trend?", group="FY24 Earnings")
client.query("Executive summary?", document_id="doc_<id>")
client.query("Compare the deals.", group_ids=["g_<id>", "g_<id>"])
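
When the scope comes from user input or configuration, it can help to normalize it before calling `client.query`. The helper below is a small sketch built only on the parameter names shown above (`document_id`, `group`, `group_ids`). The function name `scope_kwargs` is hypothetical, and rejecting mixed scopes is an assumption of this sketch, since the source does not say how the API treats them.

```python
from typing import Any, Dict, Optional, Sequence


def scope_kwargs(
    document_id: Optional[str] = None,
    group: Optional[str] = None,
    group_ids: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for one query scope, or {} for the whole workspace."""
    candidates = {
        "document_id": document_id,
        "group": group,
        "group_ids": list(group_ids) if group_ids else None,
    }
    # Keep only the scopes the caller actually supplied.
    chosen = {name: value for name, value in candidates.items() if value}
    if len(chosen) > 1:
        # Assumption: mixing scopes is treated as a caller error in this sketch.
        raise ValueError(f"pick one scope, got: {sorted(chosen)}")
    return chosen
```

A call would then look like `client.query("How did revenue trend?", **scope_kwargs(group="FY24 Earnings"))`, and calling `scope_kwargs()` with no arguments searches the whole workspace.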

MCP — connect Claude Code (or any MCP client) to the hosted Polyvia MCP server in one line, so your agent can retrieve over your documents as a tool:

claude mcp add --transport http polyvia https://app.polyvia.ai/mcp \
  --header "Authorization: Bearer poly_<your-key>"
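
If your agent framework does not speak MCP, the same retrieval-as-a-tool idea can be wired in-process. The sketch below wraps the `client.query(...)` call from the quickstart as a tool handler. The tool name `search_documents`, the schema envelope, and the `QueryClient` protocol are assumptions made for illustration; they are not taken from the repository's agent_tool.py, and the envelope should be adapted to whatever tool format your framework expects.

```python
from typing import Any, Dict, Optional, Protocol


class QueryClient(Protocol):
    """Minimal client shape assumed here: query(...) returns an object with .answer."""

    def query(self, question: str, **scope: Any) -> Any: ...


# Generic JSON-Schema style tool description (hypothetical name and wording).
SEARCH_DOCS_TOOL: Dict[str, Any] = {
    "name": "search_documents",
    "description": "Answer a question from the indexed document corpus.",
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "group": {"type": "string", "description": "Optional group to search within."},
        },
        "required": ["question"],
    },
}


def run_search_documents(client: QueryClient, arguments: Dict[str, Any]) -> str:
    """Execute one tool call and return the answer text for the agent."""
    question = arguments["question"]
    group: Optional[str] = arguments.get("group")
    # No group means no scope, which searches the whole workspace.
    scope = {"group": group} if group else {}
    result = client.query(question, **scope)
    return str(result.answer)
```

Typing the handler against a protocol instead of the concrete SDK class keeps it testable with a stub client and makes no claims about the SDK beyond the `query(...).answer` usage shown in the quickstart.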

Agent Skills — install Polyvia skills into Claude Code, Cursor, and other agent clients:

npx skills add polyvia-ai/skills

MCP docs · Agent Skills

| Product | For | Status |
|---|---|---|
| Polyvia-1.1 · Polyvia API: Multimodal Document Retrieval API | Developers of AI agents | Available now |
| Polyvia-1.2 · Polyvia Platform: Research & Automation Agent over 100K+ multimodal docs | Knowledge workers in enterprises | Coming soon |
| Later · Polyvia Agents: build your own agent for automating processes on large volumes of multimodal docs | Builders & teams | Planned |
| Later · More modalities: video, healthcare scans / EHR, chemical & molecular data, CAD & technical drawings, heatmaps | Builders & teams | Planned |

We update this as we ship — latest first. Full notes at docs.polyvia.ai/versions.

REST API v1: ingest, documents, groups, query, usage, rate-limits; async ingestion with task polling and grounded citations.

Python SDK: pip install polyvia; typed sync and async clients, batch ingestion, idempotent groups, structured errors.

TypeScript SDK: npm install polyvia; fully typed, ESM/CJS, Node 18+.

MCP server: claude mcp add --transport http polyvia https://app.polyvia.ai/mcp --header "Authorization: Bearer poly_<your-key>"

Agent Skills: npx skills add polyvia-ai/skills for Claude Code, Cursor, and other agent clients.

Visual Document Modalities: Visual Document Intelligence + Audio, covering charts, graphs & plots, infographics, complex multi-page tables, slides & decks, reports & filings, scanned & photographed pages, invoices & forms, handwriting & annotations, diagrams & flowcharts, photos & images, and audio (calls, meetings, recordings).

Polyvia-1.2 (Polyvia Platform): Research & Automation Agent over 100K+ multimodal docs, for knowledge workers in enterprises.

More modalities (coming soon): healthcare scans / EHR, chemical & molecular data, CAD & technical drawings, video, heatmaps.

Polyvia Agents: build your own agent for automating processes on large volumes of multimodal documents.

| | Install | Source |
|---|---|---|
| Python | pip install polyvia | |
| TypeScript | npm install polyvia | docs.polyvia.ai/products/js-sdk |
| REST API | | docs.polyvia.ai/api-reference |
| MCP | app.polyvia.ai/mcp | docs.polyvia.ai/products/mcp |
| Agent Skills | npx skills add polyvia-ai/skills | docs.polyvia.ai/products/skills |

Supported inputs: PDFs · Word/PowerPoint/Excel (DOCX/PPTX/XLSX) · Markdown · text · images · audio. Charts, infographics, complex multi-page tables, slides, scans and handwriting are first-class.

Runnable snippets (Python, TypeScript, raw HTTP, MCP, agent-tool) live in examples/; see the examples guide. See also CHANGELOG · CONTRIBUTING · SECURITY.

New to Polyvia? See what it does at polyvia.ai, or start free at app.polyvia.ai.

📚 Docs · 🖥️ Platform · ✉️ mateusz@polyvia.ai · senyao@polyvia.ai

© 2026 Polyvia. All rights reserved.
