{"slug": "how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-it", "title": "How I Built an Indonesian NLP Parser That Understands Warung Owners, Then Abandoned It", "summary": "A developer built Warung MiMo, an AI-powered assistant for Indonesian street shops that uses natural language processing to understand how shop owners actually speak. The project, which was abandoned after its initial development, was revived and completed with a custom Indonesian NLP parser that handles colloquial number words, product aliases, and multi-action sentences. The final system processes over 50 regex patterns and 30 number word mappings to parse commands like stock updates, debt tracking, and sales logging from natural Indonesian speech.", "body_md": "*This is a submission for the GitHub Finish-Up-A-Thon Challenge*\n\nWarung MiMo is an AI-powered assistant for small warungs (Indonesian street shops). It lets shop owners manage inventory, track debts, and log sales using natural Indonesian, either by voice, text, or receipt scanning.\n\nThe project started from a simple question: what if a tiny shop owner could talk to software the same way they talk to their helper? Not with menus, not with spreadsheets, just with the way people actually speak in a warung.\n\nI built the first version during the MiMo Orbit 100T Token Grant. The UI was there. The tech stack was solid. But the core engine, the Indonesian NLP parser, was not finished. The project sat abandoned until the GitHub Finish-Up-A-Thon Challenge gave me the push to revive it.\n\nTech stack: Next.js 16, React 19, shadcn/ui, Tailwind 4, TypeScript. 
Deployed on Vercel.\n\n**Live:** [https://warung-mimo.vercel.app](https://warung-mimo.vercel.app)\n\n**Source:** [https://github.com/iyop666/warung-mimo](https://github.com/iyop666/warung-mimo)\n\nTry these inputs in the assistant page:\n\nThe assistant parses each sentence, identifies products, extracts numbers (even written as Indonesian words), and generates structured actions like stock updates or debt records.\n\n**Before:**\n\nThe original Warung MiMo had a working UI, a product catalog, and a basic input field. But the NLP engine was shallow. It could handle simple commands like \"Indomie 5\" but failed on real Indonesian sentences like \"empat puluh dua ribu\" or \"setengah dus\" or \"sisa tiga\". The project was stuck in that painful zone where a repo exists but the product does not feel finished enough to trust.\n\n**What I changed, fixed, and added:**\n\n**Indonesian Number Parser.** Built from scratch with 30+ number words, compound logic (\"empat puluh dua\" = 42), and colloquial shortcuts (\"42rb\", \"setengah\", \"seperempat\").\n\n**Product Matching.** 8 core warung products with 30+ aliases. \"Aqua\" maps to \"Aqua 600ml\". \"Mie goreng\" maps to \"Indomie Goreng\". Longest keyword match prevents false positives.\n\n**Stock Context Parsing.** 5 regex patterns for the same concept: \"habis\", \"tinggal N\", \"sisa N\", \"stok N\", \"kosong\". Because in real Indonesian, there are at least five ways to say \"I have three left\".\n\n**Multi-Action Splitting.** One sentence can contain a debt, a stock update, and a restock order. The parser splits by commas, \"dan\", \"terus\", \"lalu\", \"juga\", then processes each segment independently.\n\n**Debt Tracking.** 4 regex patterns for recording debt, 4 for settling. 
Each handles a different way Indonesians talk about money: \"bu sari utang 25 ribu\", \"catat utang bu sari 25000\", \"pak budi ngutang 15 ribu\".\n\n**Weekly Insights Engine.** Generates contextual business suggestions based on sales data, like: \"Minggu ini cuaca panas, penjualan minuman naik signifikan (+25%). Fokus restok minuman.\"\n\n**After:**\n\n~2,500 lines of TypeScript across 15 React components. 50+ regex patterns in the NLP engine. 30+ number word mappings. 8 products with 3-4 aliases each. Deployed and live at warung-mimo.vercel.app.\n\nGitHub Copilot did not build the project for me. But it helped me move faster in three specific areas:\n\n**Regex iteration.** When building the debt patterns and stock context parser, I would type a comment like `// match 'bayar utang bu sari'` and Copilot would suggest the regex pattern. I still verified and adjusted, but it cut iteration time in half.\n\n**Edge case handling.** When testing \"42rb\" vs \"empat puluh dua ribu\" vs \"42.000\", Copilot suggested the fallback digit parser that handles all three formats.\n\n**Template generation.** For the weekly insights engine, I described the goal and Copilot suggested the template system that produces contextual Indonesian business suggestions.\n\nThe biggest help was reducing the friction between \"I know what I want\" and \"I have written the code.\" I still had to think hard about the language logic. 
Copilot just made the typing part faster.", "url": "https://wpnews.pro/news/how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-it", "canonical_source": "https://dev.to/iyop666/how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-abandoned-it-5bpk", "published_at": "2026-05-27 04:40:24+00:00", "updated_at": "2026-05-27 04:52:51.428883+00:00", "lang": "en", "topics": ["natural-language-processing", "artificial-intelligence", "ai-products", "ai-startups", "ai-tools"], "entities": ["Warung MiMo", "MiMo Orbit 100T Token Grant", "GitHub Finish-Up-A-Thon Challenge", "Vercel", "iyop666"], "alternates": {"html": "https://wpnews.pro/news/how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-it", "markdown": "https://wpnews.pro/news/how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-it.md", "text": "https://wpnews.pro/news/how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-it.txt", "jsonld": "https://wpnews.pro/news/how-i-built-an-indonesian-nlp-parser-that-understands-warung-owners-then-it.jsonld"}}