{"slug": "seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment", "title": "[SEEKING] Indic Document Dataset (India) — Invoices, Receipts, Utility Bills, Payment Advices, Packing Lists, Commercial Invoices, Credit Notes", "summary": "QuantVectors, a data company specializing in document training datasets for AI and OCR models, is seeking annotated Indic document datasets in languages such as Hindi, Marathi, and Tamil. The company is looking for documents like invoices, receipts, and utility bills originating from India, and is open to both open-source and commercial datasets. QuantVectors also offers to purchase datasets outright.", "body_md": "Hi,\n\nWe’re looking for annotated document datasets in Indic languages for AI/OCR model training. We are specifically looking for documents originating from India. We’re open to both open-source and commercial datasets.\n\nLanguages needed:\n\nHindi, Marathi, Gujarati, Bengali, Punjabi, Tamil, Urdu, Telugu, Odia, Kannada, Malayalam, Assamese\n\nDocument types:\n\nInvoice, Receipt, Utility Bill, Payment Advice, Packing List, Commercial Invoice, Credit Note\n\nSpecifications:\n\nIf you know of any:\n\nPlease drop a comment or reach out — we’d really appreciate any pointers!\n\nIf you have data to sell:\n\nWe also buy datasets outright. Contact us (details below) and we’ll respond within 5 business days.\n\nAbout us:\n\nQuantVectors is a data company specialising in document training datasets for AI and OCR models.\n\nContact: [data@quantvectors.com](mailto:data@quantvectors.com) | [quantvectors.com](http://quantvectors.com)", "url": "https://wpnews.pro/news/seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment", "canonical_source": "https://discuss.huggingface.co/t/seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment-advices-packing-lists-commercial-invoices-credit-notes/177055#post_1", "published_at": "2026-06-22 06:47:20+00:00", "updated_at": "2026-06-22 07:16:33.150754+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "computer-vision", "natural-language-processing", "ai-tools"], "entities": ["QuantVectors", "India", "Hindi", "Marathi", "Gujarati", "Bengali", "Tamil", "Urdu"], "alternates": {"html": "https://wpnews.pro/news/seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment", "markdown": "https://wpnews.pro/news/seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment.md", "text": "https://wpnews.pro/news/seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment.txt", "jsonld": "https://wpnews.pro/news/seeking-indic-document-dataset-india-invoices-receipts-utility-bills-payment.jsonld"}}