Hi, we’re looking for annotated document datasets in Indic languages for AI/OCR model training. We are specifically looking for documents originating from India. We’re open to both open-source and commercial datasets.
Languages needed:
Hindi, Marathi, Gujarati, Bengali, Punjabi, Tamil, Urdu, Telugu, Odia, Kannada, Malayalam, Assamese
Document types:
Invoice, Receipt, Utility Bill, Payment Advice, Packing List, Commercial Invoice, Credit Note
Specifications:
If you know of any relevant datasets: Please drop a comment or reach out. We’d really appreciate any pointers!
If you have data to sell: We also buy datasets outright. Contact us (details below) and we’ll respond within 5 business days.
About us:
QuantVectors is a data company specialising in document training datasets for AI and OCR models.
Contact: data@quantvectors.com | quantvectors.com