{"slug": "how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions", "title": "How to Build a Document Intelligence Backend with iii Using Workers, Functions, and Cron Triggers", "summary": "A new tutorial demonstrates how to build a document intelligence backend using the iii framework, combining Workers, Functions, and Cron Triggers for text analysis workflows. The guide walks through installing the iii engine, connecting Python workers, and registering functions for text normalization, tokenization, sentiment analysis, keyword extraction, reporting, and heartbeat tracking. Developers can then execute the same logic through direct invocation, HTTP endpoints, fire-and-forget execution, or scheduled cron triggers, creating a production-like backend system rather than a static notebook demo.", "body_md": "In this tutorial, we build a document-intelligence workflow with [iii](https://github.com/iii-hq/iii). We begin by installing the iii engine and Python SDK, then start the engine as a background process and connect a Python worker to it. After the setup, we register separate functions for text normalization, tokenization, sentiment analysis, keyword extraction, reporting, and heartbeat tracking. We then combine these functions into a single analysis pipeline and run the same logic via direct invocation, an HTTP endpoint, fire-and-forget execution, and a scheduled cron trigger. Along the way, we also track basic runtime state, making the workflow feel closer to a real backend system than a static notebook demo. 
Check out the [FULL CODES here](https://github.com/MARKTECHPOST-AI-MEDIA-INC/AI-Agents-Projects-Tutorials/blob/main/Distributed%20Systems/iii_live_document_intelligence_backend_marktechpost.py).\n\n``` python\nimport os, sys, subprocess, time, socket, json, threading\nfrom collections import Counter\nHOME    = os.path.expanduser(\"~\")\nBIN_DIR = f\"{HOME}/.local/bin\"\nos.environ[\"PATH\"] = BIN_DIR + os.pathsep + os.environ.get(\"PATH\", \"\")\ndef sh(cmd):\n   print(f\"$ {cmd}\")\n   subprocess.run(cmd, shell=True, check=True)\nif not os.path.exists(f\"{BIN_DIR}/iii\"):\n   sh(f\"curl -fsSL https://install.iii.dev/iii/main/install.sh | BIN_DIR={BIN_DIR} sh\")\nsh(f\"{sys.executable} -m pip install -q iii-sdk requests\")\nIII = f\"{BIN_DIR}/iii\"\nsh(f\"{III} --version\")\n```\n\nWe start by importing the required Python modules and setting up the local binary path for the III engine. We define a small helper function to run shell commands and install the III engine if it is not already available. 
We also install the Python SDK and requests package, then verify the iii installation by checking its version.\n\n```\nWS_URL, HTTP_URL = \"ws://localhost:49134\", \"http://localhost:3111\"\nengine_log = open(\"/tmp/iii-engine.log\", \"w\")\nengine = subprocess.Popen([III, \"--use-default-config\"],\n                         stdout=engine_log, stderr=subprocess.STDOUT)\ndef wait_port(host, port, timeout=90):\n   end = time.time() + timeout\n   while time.time() < end:\n       with socket.socket() as s:\n           s.settimeout(1)\n           try:\n               s.connect((host, port)); return True\n           except OSError:\n               time.sleep(0.5)\n   return False\nassert wait_port(\"localhost\", 49134), \"engine never came up — see /tmp/iii-engine.log\"\nprint(f\"✓ engine up — WS {WS_URL} | HTTP {HTTP_URL}\")\nfrom iii import register_worker\ntry:\n   from iii import TriggerAction\nexcept Exception:\n   TriggerAction = None\nworker = register_worker(WS_URL)\n_STATE = {\"docs_analyzed\": 0, \"heartbeats\": 0, \"keyword_totals\": Counter()}\n_LOCK  = threading.Lock()\nPOSITIVE = {\"good\",\"great\",\"love\",\"excellent\",\"happy\",\"fast\",\"reliable\",\"amazing\",\"best\",\"win\"}\nNEGATIVE = {\"bad\",\"terrible\",\"hate\",\"slow\",\"broken\",\"sad\",\"worst\",\"bug\",\"crash\",\"fail\"}\n```\n\nWe launch the iii engine as a background process and wait for its WebSocket port to become available. We then connect a Python worker to the running engine and prepare optional support for fire-and-forget triggers. 
We also define a shared in-memory state, a thread lock, and simple positive and negative word sets for sentiment analysis.\n\n``` python\ndef normalize(data):\n   return {\"text\": (data.get(\"text\") or \"\").strip().lower()}\ndef tokenize(data):\n   text   = data.get(\"text\", \"\")\n   cleaned = \"\".join(c if (c.isalnum() or c.isspace()) else \" \" for c in text)\n   tokens = [t for t in cleaned.split() if t]\n   return {\"tokens\": tokens, \"count\": len(tokens)}\ndef sentiment(data):\n   toks  = data.get(\"tokens\", [])\n   pos   = sum(t in POSITIVE for t in toks)\n   neg   = sum(t in NEGATIVE for t in toks)\n   score = pos - neg\n   label = \"positive\" if score > 0 else \"negative\" if score < 0 else \"neutral\"\n   return {\"label\": label, \"score\": score, \"pos\": pos, \"neg\": neg}\ndef keywords(data):\n   toks = data.get(\"tokens\", [])\n   stop = {\"the\",\"a\",\"an\",\"is\",\"it\",\"to\",\"of\",\"and\",\"in\",\"for\",\"on\",\"how\"}\n   freq = Counter(t for t in toks if t not in stop and len(t) > 2)\n   return {\"keywords\": freq.most_common(data.get(\"top_n\", 5))}\ndef analyze(data):\n   norm = worker.trigger({\"function_id\": \"text::normalize\", \"payload\": {\"text\": data.get(\"text\",\"\")}})\n   toks = worker.trigger({\"function_id\": \"text::tokenize\",  \"payload\": norm})\n   sent = worker.trigger({\"function_id\": \"text::sentiment\", \"payload\": toks})\n   keys = worker.trigger({\"function_id\": \"text::keywords\",  \"payload\": {**toks, \"top_n\": data.get(\"top_n\", 5)}})\n   with _LOCK:\n       _STATE[\"docs_analyzed\"] += 1\n       for k, c in keys[\"keywords\"]:\n           _STATE[\"keyword_totals\"][k] += c\n       n = _STATE[\"docs_analyzed\"]\n   return {\"tokens\": toks[\"count\"], \"sentiment\": sent, \"keywords\": keys[\"keywords\"], \"docs_analyzed\": n}\ndef report(data):\n   with _LOCK:\n       return {\"docs_analyzed\": _STATE[\"docs_analyzed\"],\n               \"heartbeats\":    _STATE[\"heartbeats\"],\n               
\"top_keywords_all_docs\": _STATE[\"keyword_totals\"].most_common(5)}\ndef http_analyze(data):\n   body   = data.get(\"body\") or {}\n   result = worker.trigger({\"function_id\": \"pipeline::analyze\", \"payload\": body})\n   return {\"status_code\": 200, \"body\": result, \"headers\": {\"Content-Type\": \"application/json\"}}\ndef heartbeat(data):\n   with _LOCK:\n       _STATE[\"heartbeats\"] += 1\n   return {\"ok\": True}\nfor fid, fn in [\n   (\"text::normalize\", normalize), (\"text::tokenize\", tokenize),\n   (\"text::sentiment\", sentiment), (\"text::keywords\", keywords),\n   (\"pipeline::analyze\", analyze), (\"stats::report\", report),\n   (\"http::analyze\", http_analyze), (\"cron::heartbeat\", heartbeat),\n]:\n   worker.register_function(fid, fn)\n```\n\nWe define the core functions used in the text-analysis workflow, including normalization, tokenization, sentiment detection, and keyword extraction. We then create an analysis function that routes each step through the III engine instead of calling everything directly. We also add reporting, HTTP handling, and heartbeat functions before registering all of them with the worker.\n\n```\nworker.register_trigger({\"type\": \"http\", \"function_id\": \"http::analyze\",\n                        \"config\": {\"api_path\": \"/analyze\", \"http_method\": \"POST\"}})\ncron_ok = False\ntry:\n   worker.register_trigger({\"type\": \"cron\", \"function_id\": \"cron::heartbeat\",\n                            \"config\": {\"schedule\": \"*/2 * * * * *\"}})\n   cron_ok = True\nexcept Exception as e:\n   print(\"cron trigger skipped:\", e)\ntry:\n   worker.connect()\nexcept Exception:\n   pass\ntime.sleep(2)\n```\n\nWe register an HTTP trigger so that the analysis pipeline can be invoked via a POST request. We also try to register a cron trigger that runs the heartbeat function on a fixed schedule, while safely skipping it if the engine build does not support that schema. 
We then connect the worker and pause briefly so the registered functions and triggers are ready to use.\n\n```\nprint(\"\\n=== A) Direct invocation — orchestrated through the engine ===\")\ndocs = [\n   \"iii makes the backend amazing and fast, I love how reliable it is\",\n   \"The legacy gateway was slow and broken, a terrible buggy experience\",\n   \"Workers register functions and triggers; the engine routes every call\",\n]\nfor d in docs:\n   r = worker.trigger({\"function_id\": \"pipeline::analyze\", \"payload\": {\"text\": d, \"top_n\": 4}})\n   print(f\"  [{r['sentiment']['label']:>8}] tokens={r['tokens']:>2}  keywords={r['keywords']}\")\nprint(\"\\n=== B) The SAME function over HTTP (:3111) — zero handler changes ===\")\nimport requests\ntry:\n   resp = requests.post(f\"{HTTP_URL}/analyze\",\n                        json={\"text\": \"great great product, best ever\", \"top_n\": 3}, timeout=10)\n   print(\"  HTTP\", resp.status_code, \"->\", resp.json())\nexcept Exception as e:\n   print(\"  HTTP call failed (engine HTTP module/version?):\", e)\nprint(\"\\n=== C) Fire-and-forget invocation ===\")\nif TriggerAction:\n   worker.trigger({\"function_id\": \"pipeline::analyze\",\n                   \"payload\": {\"text\": \"async win, no waiting\"},\n                   \"action\": TriggerAction.Void()})\n   print(\"  dispatched (no result awaited)\")\nelse:\n   print(\"  TriggerAction not in this SDK build — skipping\")\nprint(\"\\n=== D) Cron trigger firing on its own ===\")\nif cron_ok:\n   time.sleep(5)\n   print(\"  heartbeats so far:\",\n         worker.trigger({\"function_id\": \"stats::report\", \"payload\": {}})[\"heartbeats\"])\nelse:\n   print(\"  cron not registered on this engine build\")\nprint(\"\\n=== E) Aggregate state report ===\")\nprint(json.dumps(worker.trigger({\"function_id\": \"stats::report\", \"payload\": {}}), indent=2))\nprint(\"\\nTraces/metrics: run `iii console` locally, or scrape Prometheus at :9464\")\nprint(\"engine log 
tail:\")\nprint(subprocess.run([\"tail\", \"-n\", \"8\", \"/tmp/iii-engine.log\"],\n                    capture_output=True, text=True).stdout)\n```\n\nWe test the complete III workflow by sending sample text documents through the registered analysis pipeline. We then call the same logic through HTTP, try fire-and-forget execution, and check whether the cron heartbeat is running. Finally, we print the aggregate state report and show the engine log tail for basic runtime visibility.\n\nIn conclusion, we have a working III system that processes text using modular, registered functions rather than a single fixed script. We analyzed sample documents, exposed the pipeline through HTTP, tested async-style execution, tracked heartbeat activity, and printed an aggregate state report. The tutorial keeps the example readable while showing the main working pattern of iii: define functions once, register them with a worker, and reuse them through different triggers and execution paths. It also shows how small functions can be cleanly connected as the workflow grows into something more production-ready.\n\nSana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. 
With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.", "url": "https://wpnews.pro/news/how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions", "canonical_source": "https://www.marktechpost.com/2026/06/03/how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions-and-cron-triggers/", "published_at": "2026-06-03 19:15:07+00:00", "updated_at": "2026-06-03 20:08:03.914493+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "ai-agents", "mlops", "natural-language-processing"], "entities": ["iii", "Marktechpost", "Python", "Workers", "Functions", "Cron Triggers", "HTTP", "SDK"], "alternates": {"html": "https://wpnews.pro/news/how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions", "markdown": "https://wpnews.pro/news/how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions.md", "text": "https://wpnews.pro/news/how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions.txt", "jsonld": "https://wpnews.pro/news/how-to-build-a-document-intelligence-backend-with-iii-using-workers-functions.jsonld"}}