[SEEKING] Indic Document Dataset (India) — Invoices, Receipts, Utility Bills, Payment Advices, Packing Lists, Commercial Invoices, Credit Notes


[ARTICLE · art-36230] src=discuss.huggingface.co ↗ pub=2026-06-22T06:47Z topic=artificial-intelligence verified=true sentiment=· neutral

QuantVectors, a data company specializing in document training datasets for AI and OCR models, is seeking annotated Indic document datasets in languages such as Hindi, Marathi, and Tamil. The company is looking for documents like invoices, receipts, and utility bills originating from India, and is open to both open-source and commercial datasets. QuantVectors also offers to purchase datasets outright.

Published Jun 22, 2026 · 1 min read

Hi, we’re looking for annotated document datasets in Indic languages for AI/OCR model training, specifically documents originating from India. We’re open to both open-source and commercial datasets.

Languages needed:

Hindi, Marathi, Gujarati, Bengali, Punjabi, Tamil, Urdu, Telugu, Odia, Kannada, Malayalam, Assamese

Document types:

Invoice, Receipt, Utility Bill, Payment Advice, Packing List, Commercial Invoice, Credit Note

Specifications:

If you know of any such datasets: please drop a comment or reach out. We’d really appreciate any pointers!

If you have data to sell: We also buy datasets outright. Contact us (details below) and we’ll respond within 5 business days.

About us:

QuantVectors is a data company specialising in document training datasets for AI and OCR models.

Contact: data@quantvectors.com | quantvectors.com

source & further reading


Read original on discuss.huggingface.co → discuss.huggingface.co/t/seeking-indic-document-…
