Hi, I’m looking for an experienced NLP/LLM engineer for a serious, well-scoped project: building the first RAG-based localization engine for a low-resource language spoken in South America.
The project is built on a proprietary corpus (pedagogical content, dictionary, proverbs, orthographic rules) developed over 4 years. Architecture, endpoints, quality criteria and deliverables are fully specified in a detailed technical brief (available under NDA).
**Technical scope (MVP — 10 weeks)**
- RAG pipeline on low-resource language corpus (LangChain or LlamaIndex)
- Multilingual embedding model (multilingual-e5 or equivalent)
- Vector DB: Pinecone, Weaviate or Supabase pgvector — latency < 500ms
- Modular prompt layer — 6 use-case templates (translate, educate, dub, subtitle, localize, campaign)
- Multi-tenant B2B SaaS infrastructure — strict data isolation, JWT auth, configurable quotas
- REST API + Swagger documentation
- Admin interface for glossary management (no-dev updates)
- Offline SQLite bundle for React Native mobile app
Profile required
- Proven RAG experience on low-resource or multilingual corpora
- Multi-tenant SaaS architecture references
- Modular prompt engineering in production (Anthropic or OpenAI API)
- Full IP transfer to client — non-negotiable
- NDA before corpus access
- Budget: 5,000–10,000€ depending on experience
- Start: within 2–3 weeks
If this matches your profile, please reply with concrete references on RAG low-resource languages, multi-tenant SaaS architecture, and modular prompt engineering in production