Source: dev.to · Topic: natural language processing

How I Built an Indonesian NLP Parser That Understands Warung Owners, Then Abandoned It

A developer built Warung MiMo, an AI-powered assistant for Indonesian street shops that uses natural language processing to understand how shop owners actually speak. The project was abandoned after its initial development, then revived and completed with a custom Indonesian NLP parser that handles colloquial number words, product aliases, and multi-action sentences. The final system relies on more than 50 regex patterns and over 30 number-word mappings to parse stock updates, debt records, and sales logs from natural Indonesian speech.

3 min read · Published May 27, 2026

This is a submission for the GitHub Finish-Up-A-Thon Challenge.

Warung MiMo is an AI-powered assistant for small warungs (Indonesian street shops). It lets shop owners manage inventory, track debts, and log sales in natural Indonesian, by voice, text, or receipt scanning.

The project started from a simple question: what if a tiny shop owner could talk to software the same way they talk to their helper? Not with menus, not with spreadsheets, just with the way people actually speak in a warung.

I built the first version during the MiMo Orbit 100T Token Grant. The UI was there. The tech stack was solid. But the core engine, the Indonesian NLP parser, was not finished. The project sat abandoned until the GitHub Finish-Up-A-Thon Challenge gave me the push to revive it.

Tech stack: Next.js 16, React 19, shadcn/ui, Tailwind 4, TypeScript. Deployed on Vercel.

**Live:** [https://warung-mimo.vercel.app](https://warung-mimo.vercel.app)

**Source:** [https://github.com/iyop666/warung-mimo](https://github.com/iyop666/warung-mimo)

Try these inputs in the assistant page:

The assistant parses each sentence, identifies products, extracts numbers (even written as Indonesian words), and generates structured actions like stock updates or debt records.
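To make "structured actions" concrete, here is a minimal sketch of how such actions could be typed as a TypeScript discriminated union. The names `WarungAction` and `describeAction`, and the `kind` values, are hypothetical. They are not taken from the Warung MiMo repository.

```typescript
// Hypothetical shape of the structured actions a parser could emit.
// Type and field names are illustrative assumptions, not the project's types.
type WarungAction =
  | { kind: "stock_set"; product: string; quantity: number }
  | { kind: "sale"; product: string; quantity: number }
  | { kind: "debt_add"; customer: string; amount: number }
  | { kind: "debt_pay"; customer: string; amount: number };

// Turn an action into a short confirmation the UI could show the shop owner.
function describeAction(action: WarungAction): string {
  switch (action.kind) {
    case "stock_set":
      return `set stock of ${action.product} to ${action.quantity}`;
    case "sale":
      return `log sale: ${action.quantity} x ${action.product}`;
    case "debt_add":
      return `record debt: ${action.customer} owes Rp${action.amount}`;
    case "debt_pay":
      return `settle debt: ${action.customer} paid Rp${action.amount}`;
  }
}

const example: WarungAction[] = [
  { kind: "debt_add", customer: "bu sari", amount: 25000 },
  { kind: "stock_set", product: "Aqua 600ml", quantity: 3 },
];
example.forEach((a) => console.log(describeAction(a)));
```

A union keyed on `kind` lets the compiler narrow each branch. Under `strict` mode it also reports a missing case if a new action kind is added later.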

Before:

The original Warung MiMo had a working UI, a product catalog, and a basic input field, but its NLP engine was shallow. It could handle simple commands like "Indomie 5" but failed on everyday Indonesian phrasing such as "empat puluh dua ribu" (forty-two thousand), "setengah dus" (half a box), or "sisa tiga" (three left). The project was stuck in that painful zone where a repo exists but the product does not feel finished enough to trust.

What I changed, fixed, and added:

Indonesian Number Parser. Built from scratch with 30+ number words, compound logic ("empat puluh dua" = 42), and colloquial shortcuts such as "42rb" ("rb" abbreviates "ribu", thousand), "setengah" (half), and "seperempat" (a quarter).
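The compound logic can be illustrated with a small accumulator that keeps three running values: the amount committed by "ribu"/"juta", the sub-thousand group, and the last bare digit. This is a minimal sketch under stated assumptions, not the author's parser. `parseIndonesianNumber` and its word tables are hypothetical, and the table covers far fewer than 30 words.

```typescript
// Hypothetical sketch of an Indonesian number-word parser; not the
// project's actual implementation.

// Digit words plus the colloquial fractions "setengah" and "seperempat".
const WORD_VALUES = new Map<string, number>([
  ["nol", 0], ["satu", 1], ["dua", 2], ["tiga", 3], ["empat", 4],
  ["lima", 5], ["enam", 6], ["tujuh", 7], ["delapan", 8], ["sembilan", 9],
  ["setengah", 0.5], ["seperempat", 0.25],
]);

// "se-" + scale word is shorthand for "satu" + scale word (seratus = satu ratus).
const SE_SHORTHAND = new Map<string, string>([
  ["sebelas", "belas"], ["sepuluh", "puluh"], ["seratus", "ratus"],
  ["seribu", "ribu"], ["sejuta", "juta"],
]);

function parseIndonesianNumber(text: string): number | null {
  // Tokenize and expand shorthands so every scale word follows its multiplier.
  const tokens: string[] = [];
  for (const word of text.toLowerCase().trim().split(/\s+/)) {
    const scale = SE_SHORTHAND.get(word);
    if (scale !== undefined) tokens.push("satu", scale);
    else tokens.push(word);
  }

  let total = 0; // value already committed by "ribu" or "juta"
  let group = 0; // value below one thousand built so far
  let unit = 0;  // last digit (or fraction) not yet placed by a scale word

  for (const token of tokens) {
    const value = WORD_VALUES.get(token);
    if (value !== undefined) {
      unit += value;
    } else if (/^\d+$/.test(token)) {
      unit += Number(token); // bare digits, e.g. "42 ribu"
    } else if (token === "belas") {
      group += 10 + unit; // dua belas = 12
      unit = 0;
    } else if (token === "puluh") {
      group += unit * 10; // empat puluh = 40
      unit = 0;
    } else if (token === "ratus") {
      group += unit * 100; // dua ratus = 200
      unit = 0;
    } else if (token === "ribu" || token === "juta") {
      const multiplier = token === "ribu" ? 1000 : 1000000;
      total += (group + unit) * multiplier;
      group = 0;
      unit = 0;
    } else {
      return null; // not a number word; let another parser try
    }
  }
  return total + group + unit;
}

console.log(parseIndonesianNumber("empat puluh dua ribu"));
```

Multiplying only the pending digit, instead of the whole running value, keeps "seratus empat puluh" at 140 rather than 1040. Treating "dua setengah" as 2.5 is an assumption about colloquial usage. Mixed tokens like "42rb" return `null` here and are left to a separate digit-format parser.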

Product Matching. 8 core warung products with 30+ aliases. "Aqua" maps to "Aqua 600ml". "Mie goreng" maps to "Indomie Goreng". Longest keyword match prevents false positives.
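A minimal sketch of longest-alias-first matching follows. The catalog entries (including "Indomie Soto" and the alias lists) and the `matchProduct` name are hypothetical stand-ins for the app's real eight-product catalog.

```typescript
// Hypothetical catalog: illustrative entries only, not the app's real data.
const CATALOG: Record<string, string[]> = {
  "Indomie Goreng": ["indomie goreng", "mie goreng", "indomie"],
  "Indomie Soto": ["indomie soto", "mie soto"],
  "Aqua 600ml": ["aqua 600ml", "aqua", "air mineral"],
};

// Flatten to (alias, product) pairs and sort so longer aliases are tried first.
const ALIASES: Array<{ alias: string; product: string }> = [];
for (const [product, aliases] of Object.entries(CATALOG)) {
  for (const alias of aliases) ALIASES.push({ alias, product });
}
ALIASES.sort((a, b) => b.alias.length - a.alias.length);

function matchProduct(text: string): string | null {
  // Lowercase, turn punctuation into spaces, and pad with spaces so an alias
  // only matches whole words ("aqua" must not match inside "aquarium").
  const padded = ` ${text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim()} `;
  for (const { alias, product } of ALIASES) {
    if (padded.includes(` ${alias} `)) return product;
  }
  return null;
}

console.log(matchProduct("indomie soto 5"));
```

Without the length sort, "indomie soto 5" would hit the shorter alias "indomie" first and be filed under Indomie Goreng. That is the kind of false positive the longest-match rule prevents.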

Stock Context Parsing. Five regex patterns cover the same concept: "habis" (sold out), "tinggal N" (only N left), "sisa N" (N remaining), "stok N" (N in stock), and "kosong" (empty). In everyday Indonesian there are at least five ways to report how much of an item is left.
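One way to organize those five phrasings is an ordered rule list: the zero-stock words map straight to 0, and the others capture a quantity. This is a minimal sketch. The regexes, the `parseStockLevel` name, and the small number-word table are assumptions, not the project's code.

```typescript
// Hypothetical sketch of stock-context rules; not the project's actual regexes.
const SMALL_NUMBERS = new Map<string, number>([
  ["satu", 1], ["dua", 2], ["tiga", 3], ["empat", 4], ["lima", 5],
  ["enam", 6], ["tujuh", 7], ["delapan", 8], ["sembilan", 9], ["sepuluh", 10],
]);

// Accept either digits ("3") or a small number word ("tiga").
function toQuantity(word: string): number | null {
  if (/^\d+$/.test(word)) return Number(word);
  return SMALL_NUMBERS.get(word) ?? null;
}

// One rule per phrasing; `fixed` is used when the word itself implies zero stock.
const STOCK_RULES: Array<{ pattern: RegExp; fixed?: number }> = [
  { pattern: /\bhabis\b/, fixed: 0 },       // "habis": sold out
  { pattern: /\bkosong\b/, fixed: 0 },      // "kosong": empty
  { pattern: /\btinggal\s+(\d+|[a-z]+)/ }, // "tinggal N": only N left
  { pattern: /\bsisa\s+(\d+|[a-z]+)/ },    // "sisa N": N remaining
  { pattern: /\bstok\s+(\d+|[a-z]+)/ },    // "stok N": N in stock
];

function parseStockLevel(segment: string): number | null {
  const text = segment.toLowerCase();
  for (const rule of STOCK_RULES) {
    const match = rule.pattern.exec(text);
    if (match === null) continue;
    if (rule.fixed !== undefined) return rule.fixed;
    const quantity = toQuantity(match[1] ?? "");
    if (quantity !== null) return quantity; // otherwise try the next rule
  }
  return null;
}

console.log(parseStockLevel("aqua sisa tiga"));
```

Checking "habis" and "kosong" first means a phrase like "stok habis" resolves to 0 instead of trying to read "habis" as a number.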

Multi-Action Splitting. One sentence can contain a debt, a stock update, and a restock order. The parser splits it on commas and on the connectors "dan" (and), "terus" and "lalu" (then), and "juga" (also), then processes each segment independently.
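The splitting step can be sketched as a single separator regex. The `splitActions` name is hypothetical, and one design choice here is an added assumption not stated in the article: a comma followed by a digit is not treated as a separator, because Indonesian writes decimals with a comma ("2,5").

```typescript
// Hypothetical splitter: commas, plus the connectors dan / terus / lalu / juga
// when they stand as separate words. The lookahead leaves the following space
// in place, so back-to-back connectors ("dan juga") are both consumed.
// Assumption: a comma followed by a digit is kept, so "2,5" survives intact.
const SEPARATOR = /\s*,(?!\d)\s*|(?:^|\s+)(?:dan|terus|lalu|juga)(?=\s|$)/i;

function splitActions(sentence: string): string[] {
  return sentence
    .split(SEPARATOR)
    .map((segment) => segment.trim())
    .filter((segment) => segment.length > 0); // drop empties left by the split
}

console.log(splitActions("bu sari utang 25 ribu, aqua sisa tiga dan restok indomie 2 dus"));
```

Each resulting segment can then go through product matching, number parsing, and the stock or debt rules on its own.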

Debt Tracking. 4 regex patterns for recording debt, 4 for settling. Each handles a different way Indonesians talk about money: "bu sari utang 25 ribu", "catat utang bu sari 25000", "pak budi ngutang 15 ribu".
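Below is a minimal sketch of how the three example phrases, plus two settling phrasings such as "bayar utang bu sari", could share reusable regex fragments. The rule list, the `parseDebt` name, the honorific list, and the amount format are assumptions. It shows four rules, not the article's eight.

```typescript
// Hypothetical debt rules modeled on the article's example phrases;
// not the project's actual patterns.
type DebtCommand = {
  type: "record" | "settle";
  customer: string;
  amount: number | null; // null = no amount given (e.g. settle the full balance)
};

// Reusable fragments: an honorific plus one name word, and a rupiah amount
// written as digits (optionally "25.000") with an optional "ribu"/"rb" suffix.
const NAME = String.raw`((?:bu|pak|mas|mbak)\s+[a-z]+)`;
const AMOUNT = String.raw`(\d+(?:\.\d{3})*)\s*(ribu|rb)?`;

// Every pattern captures: 1 = customer, 2 = digits, 3 = thousands suffix.
const DEBT_RULES: Array<{ type: DebtCommand["type"]; pattern: RegExp }> = [
  // "bu sari utang 25 ribu", "pak budi ngutang 15 ribu"
  { type: "record", pattern: new RegExp(String.raw`^${NAME}\s+(?:ng)?utang\s+${AMOUNT}`) },
  // "catat utang bu sari 25000"
  { type: "record", pattern: new RegExp(String.raw`^catat\s+utang\s+${NAME}\s+${AMOUNT}`) },
  // "bayar utang bu sari" (amount optional)
  { type: "settle", pattern: new RegExp(String.raw`^bayar\s+utang\s+${NAME}(?:\s+${AMOUNT})?`) },
  // "bu sari bayar 10rb"
  { type: "settle", pattern: new RegExp(String.raw`^${NAME}\s+bayar(?:\s+utang)?\s+${AMOUNT}`) },
];

function toRupiah(digits: string | undefined, suffix: string | undefined): number | null {
  if (digits === undefined) return null;
  const base = Number(digits.replace(/\./g, "")); // "25.000" -> 25000
  return suffix ? base * 1000 : base;
}

function parseDebt(segment: string): DebtCommand | null {
  const text = segment.toLowerCase().trim();
  for (const { type, pattern } of DEBT_RULES) {
    const m = pattern.exec(text);
    if (m) return { type, customer: m[1], amount: toRupiah(m[2], m[3]) };
  }
  return null;
}

console.log(parseDebt("pak budi ngutang 15 ribu"));
```

Keeping the capture-group order identical across rules lets one function build the result, so adding a new phrasing means adding one regex.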

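Weekly Insights Engine. Generates contextual business suggestions from sales data, such as: "Minggu ini cuaca panas, penjualan minuman naik signifikan (+25%). Fokus restok minuman." ("This week the weather is hot, and beverage sales rose significantly (+25%). Focus on restocking beverages.")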
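A template-based engine like this could be sketched as threshold rules over week-over-week changes. `weeklyInsights`, the `WeeklySales` shape, and the 20% threshold are hypothetical. The sketch leaves out the weather context in the article's example, since the post does not say where that input comes from.

```typescript
// Hypothetical weekly totals per category (e.g. units or rupiah sold).
type WeeklySales = Record<string, { lastWeek: number; thisWeek: number }>;

function percentChange(before: number, after: number): number | null {
  if (before === 0) return null; // no baseline to compare against
  return Math.round(((after - before) / before) * 100);
}

// Each rule fires past a threshold (assumed 20%) and fills an Indonesian template.
function weeklyInsights(sales: WeeklySales, threshold = 20): string[] {
  const insights: string[] = [];
  for (const [category, { lastWeek, thisWeek }] of Object.entries(sales)) {
    const change = percentChange(lastWeek, thisWeek);
    if (change === null) continue;
    if (change >= threshold) {
      // "Sales of X rose significantly (+N%). Focus on restocking X."
      insights.push(`Penjualan ${category} naik signifikan (+${change}%). Fokus restok ${category}.`);
    } else if (change <= -threshold) {
      // "Sales of X fell (-N%). Reduce stock of X."
      insights.push(`Penjualan ${category} turun (${change}%). Kurangi stok ${category}.`);
    }
  }
  return insights;
}

console.log(weeklyInsights({ minuman: { lastWeek: 400, thisWeek: 500 } }));
```

Templates keep the output predictable and cheap to generate. The trade-off is that every new kind of suggestion needs its own rule.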

After:

~2,500 lines of TypeScript across 15 React components. 50+ regex patterns in the NLP engine. 30+ number word mappings. 8 products with 3-4 aliases each. Deployed and live at warung-mimo.vercel.app.

GitHub Copilot did not build the project for me. But it helped me move faster in three specific areas:

Regex iteration. When building the debt patterns and stock context parser, I would type a comment like `// match 'bayar utang bu sari'` and Copilot would suggest the regex pattern. I still verified and adjusted each suggestion, but it cut iteration time in half.

Edge case handling. When testing "42rb" vs "empat puluh dua ribu" vs "42.000", Copilot suggested the fallback digit parser that handles all three formats.
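A digit-format fallback of that kind might look like the sketch below. It assumes Indonesian number formatting ("." groups thousands, "," marks decimals) and recognizes "rb"/"ribu" and "jt"/"juta" suffixes. `parseDigitAmount` and the suffix table are hypothetical, not the code Copilot actually produced.

```typescript
// Hypothetical fallback for digit-based amounts. Assumes Indonesian number
// formatting: "." groups thousands and "," marks decimals.
const SUFFIX_MULTIPLIERS = new Map<string, number>([
  ["rb", 1000], ["ribu", 1000],       // thousand
  ["jt", 1000000], ["juta", 1000000], // million
]);

// 1 = integer part ("42" or "42.000"), 2 = decimal digits after ",", 3 = suffix.
const DIGIT_AMOUNT = /^(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d+))?\s*(rb|ribu|jt|juta)?$/;

function parseDigitAmount(input: string): number | null {
  const m = DIGIT_AMOUNT.exec(input.trim().toLowerCase());
  if (m === null) return null; // not digit-based, e.g. "empat puluh dua ribu"
  const integerPart = m[1].replace(/\./g, "");
  const value = Number(m[2] !== undefined ? `${integerPart}.${m[2]}` : integerPart);
  const multiplier = m[3] !== undefined ? (SUFFIX_MULTIPLIERS.get(m[3]) ?? 1) : 1;
  // Rupiah amounts are whole numbers, so round away floating-point noise.
  return Math.round(value * multiplier);
}

for (const s of ["42rb", "42.000", "42 ribu", "1,5jt"]) console.log(s, parseDigitAmount(s));
```

In a pipeline, this could run first, and a `null` result would hand the text to a word-based parser. Ambiguous input such as "1.5" is rejected rather than guessed.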

Template generation. For the weekly insights engine, I described the goal and Copilot suggested the template system that produces contextual Indonesian business suggestions.

The biggest help was reducing the friction between "I know what I want" and "I have written the code." I still had to think hard about the language logic. Copilot just made the typing part faster.
