Source: discuss.huggingface.co · Topic: artificial-intelligence

[SEEKING] Indic Document Dataset (India) — Invoices, Receipts, Utility Bills, Payment Advices, Packing Lists, Commercial Invoices, Credit Notes

QuantVectors, a data company specializing in document training datasets for AI and OCR models, is seeking annotated Indic document datasets in languages such as Hindi, Marathi, and Tamil. The company is looking for documents like invoices, receipts, and utility bills originating from India, and is open to both open-source and commercial datasets. QuantVectors also offers to purchase datasets outright.

Published Jun 22, 2026

Hi, we’re looking for annotated document datasets in Indic languages for AI/OCR model training, specifically documents originating from India. We’re open to both open-source and commercial datasets.

Languages needed:

Hindi, Marathi, Gujarati, Bengali, Punjabi, Tamil, Urdu, Telugu, Odia, Kannada, Malayalam, Assamese

Document types:

Invoice, Receipt, Utility Bill, Payment Advice, Packing List, Commercial Invoice, Credit Note

Specifications:

If you know of any such datasets: please drop a comment or reach out. We’d really appreciate any pointers!

If you have data to sell: We also buy datasets outright. Contact us (details below) and we’ll respond within 5 business days.

About us:

QuantVectors is a data company specialising in document training datasets for AI and OCR models.

Contact: data@quantvectors.com | quantvectors.com
