{"slug": "building-shouldweautomate-a-decision-intelligence-platform-for-workflow", "title": "Building ShouldWeAutomate: A Decision Intelligence Platform for Workflow Automation", "summary": "A developer built ShouldWeAutomate, an open-source decision intelligence platform that evaluates whether business processes are ready for AI automation. The single-page Flask application uses a deterministic scoring engine across seven dimensions—data quality, process stability, regulatory exposure, exception rates, integration readiness, decision complexity, and ROI potential—to produce a weighted score and recommendation tier ranging from \"Do Not Automate\" to \"Agent Automation Ready.\" The platform includes optional LLM inference for workflow analysis, a gamified UX with radar visualizations and SVG gauges, and features such as what-if simulation, ROI calculation, and regulatory framework mapping.", "body_md": "*How we built an open-source platform that tells you whether your business process is ready for AI automation — with deterministic scoring, gamified UX, and optional LLM inference.*\n\nEvery week, someone asks: *\"Can we automate this workflow?\"* The answer is never simple. It depends on data quality, process stability, regulatory exposure, exception rates, integration readiness, decision complexity, and ROI potential — seven dimensions that interact in non-obvious ways.\n\nMost automation decisions are made on gut feel. Teams spend months building automation only to discover the process changes too frequently, the data is too messy, or the compliance team blocks it.\n\nWe wanted to build a tool that makes this evaluation **systematic, data-driven, and interactive** — something a team can open in a browser, describe their workflow, and get a defensible answer in seconds.\n\nSingle-page Flask application rendered server-side with Jinja2 templates. 
The frontend is vanilla JavaScript with Chart.js for the radar visualization and a custom SVG gauge for the overall score.\n\n**Key design decisions:**\n\n``` javascript\n// Core rendering: dimension cards with aggregate + fine-tune\nfunction createDimSection(key, dim, prefix) {\n  const aggDefault = Math.round(\n    dim.questions.reduce((s, q) => s + q.default, 0) / dim.questions.length\n  );\n  const tier = getTier(aggDefault);\n  // ... builds the HTML with aggregate slider + expandable sub-sliders\n}\n\n// Live preview: recompute the overall score on every slider change\nfunction updateLivePreview() {\n  // Weights are indexed in the same order as dimKeys and sum to 1.0\n  const weights = [0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.10];\n  let total = 0;\n  dimKeys.forEach((key, i) => { total += getAggregateValue(key) * weights[i]; });\n  // Update gauge SVG dashoffset, tier badge, recommendation text\n}\n```\n\nFlask acts as both the web server and the decision engine. The architecture follows a modular design:\n\n```\nengine/\n├── scorer.py          # Dimension scoring logic, defaults, recommendations\n├── analyzer.py        # Orchestrator: ties all modules together\n├── explainer.py       # Score breakdown with pull-up/pull-down analysis\n├── what_if.py         # What-if simulation and sensitivity analysis\n├── roi_calculator.py  # Quantitative ROI with NPV, payback, FTE impact\n├── remediation.py     # Remediation playbooks per dimension\n├── regulations.py     # Regulatory framework mapping (HIPAA, GDPR, SOX, etc.)\n├── similarity.py      # Benchmark similarity search\n├── sub_process.py     # Multi-process decomposition and aggregation\n└── llm.py             # OpenAI-compatible LLM gateway\n```\n\nThe core scoring logic in `scorer.py` defines 7 dimensions, each with 5 weighted sub-questions:\n\n``` python\nSCORING_DEFAULTS = {\n    \"data_quality\": {\n        \"label\": \"Data Quality\",\n        \"weight\": 0.20,\n        \"questions\": [\n            {\"id\": \"data_completeness\", \"text\": \"How complete is your data?\", ...},\n            {\"id\": 
\"data_consistency\", \"text\": \"How consistent is data format?\", ...},\n            # ... 5 questions per dimension\n        ],\n    },\n    # ... 6 more dimensions\n}\n```\n\nThe overall score is a weighted average of the seven dimension scores. The recommendation tier is determined by thresholds inspired by Capability Maturity Model (CMM) levels:\n\n``` python\ndef get_recommendation(overall_score):\n    if overall_score < 30:\n        return {\"level\": \"DO NOT AUTOMATE\", ...}\n    elif overall_score < 50:\n        return {\"level\": \"IMPROVE PROCESS FIRST\", ...}\n    elif overall_score < 70:\n        return {\"level\": \"HUMAN-IN-THE-LOOP AI\", ...}\n    elif overall_score < 85:\n        return {\"level\": \"AI ASSISTED AUTOMATION\", ...}\n    else:\n        return {\"level\": \"AGENT AUTOMATION READY\", ...}\n```\n\nThe LLM integration in `engine/llm.py` is optional and modular. It follows the OpenAI chat completions format, making it compatible with LM Studio, Ollama, OpenAI, Anthropic, or any other provider that exposes an OpenAI-compatible endpoint.\n\nWhen enabled, the AI performs three tasks:\n\n``` python\ndef infer_workflow(description, industry):\n    user_prompt = f\"Industry: {industry}\\n\\nWorkflow Description:\\n{description}\"\n    result = _call_llm(SYSTEM_WORKFLOW_ANALYSIS, user_prompt)\n    if result and \"dimension_scores\" in result:\n        # Clamp each score to 0-100 and write it back so the caller receives the clamped values\n        result[\"dimension_scores\"] = {\n            k: max(0, min(100, int(v))) for k, v in result[\"dimension_scores\"].items()\n        }\n        return result\n    return None\n```\n\nThe system prompt instructs the LLM to be skeptical and default to moderate scores unless the description strongly suggests otherwise, which guards against over-optimistic AI outputs.\n\nThe `data/benchmark_generator.py` script creates 600+ synthetic workflows across 10 industries with deliberately injected failure modes:\n\n``` python\nFAILURE_PROFILES = {\n    \"contradictory_rules\": \"Business rules are contradictory across departments\",\n    \"broken_apis\": \"Legacy systems have no stable API 
endpoints\",\n    \"regulatory_churn\": \"Regulations change quarterly, invalidating logic\",\n    \"data_rot\": \"Historical data uses outdated schemas\",\n    \"seasonal_spikes\": \"Volume varies 10x between peak and off-peak\",\n    \"fraud_scenarios\": \"Fraud patterns evolve faster than detection rules\",\n    # ... more failure modes\n}\n```\n\nEach workflow gets randomized dimension scores, a metadata profile, and injected failure modes. The result is a realistic benchmark for similarity matching — when a user analyzes their workflow, we find the 5 most similar synthetic workflows.\n\nThe original UI had 35 range sliders visible at once. Users found it overwhelming. We redesigned it with three principles:\n\n```\nfunction getTier(score) {\n  if (score >= 85) return { text: \"Mythic\", icon: \"🏆\", cls: \"tier-excellent\" };\n  if (score >= 70) return { text: \"Gold\",   icon: \"🥇\", cls: \"tier-good\" };\n  if (score >= 50) return { text: \"Silver\", icon: \"🥈\", cls: \"tier-moderate\" };\n  if (score >= 30) return { text: \"Bronze\", icon: \"🥉\", cls: \"tier-poor\" };\n  return { text: \"Critical\", icon: \"⛔\", cls: \"tier-critical\" };\n}\n```\n\nAI auto-fill is now the default path. Users describe their workflow in a textarea, click \"Auto-fill Scores,\" and the AI pre-fills all 35 sub-scores. 
Users can then fine-tune before analyzing.\n\nAfter analysis, users get a comprehensive dashboard with seven tabs:\n\n| Tab | Content |\n|---|---|\n| Overview | Gauge, radar chart, risks, red flags, failure mode analysis, ROI, benchmark comparison, next steps |\n| Explanation | Per-dimension breakdown with pull-up/pull-down factors and improvement tips |\n| What-If | Sensitivity analysis + preset scenarios + custom sliders |\n| Remediation | Phased action plans per dimension with effort estimates |\n| Regulatory | Applicable regulations with governance penalties and audit requirements |\n| AI Summary | Executive summary generated by LLM (when enabled) |\n\n**Deterministic engines are underrated.** The LLM is a nice-to-have, but the deterministic scoring engine handles 90% of use cases. It's fast (~1 second), predictable, and doesn't require users to set up external services.\n\n**Gamification reduces friction.** Users engaged more with tier badges and live preview than with a static form. The instant feedback loop makes the evaluation feel like a game rather than a survey.\n\n**AI prefill is a trust cliff.** When AI prefills scores, users trust it more if they can see and tweak every value. The fine-tune section is critical for building confidence.\n\n**Synthetic benchmarks are surprisingly useful.** Even though they're generated, they provide a reference frame. 
Users want to know how their scores compare to \"similar\" workflows.\n\n```\ngit clone https://github.com/harishkotra/ShouldWeAutomate.git\ncd ShouldWeAutomate\npip install -r requirements.txt\npython app.py\n```\n\nCode & more: [https://www.dailybuild.xyz/project/148-should-we-automate](https://www.dailybuild.xyz/project/148-should-we-automate)", "url": "https://wpnews.pro/news/building-shouldweautomate-a-decision-intelligence-platform-for-workflow", "canonical_source": "https://dev.to/harishkotra/building-shouldweautomate-a-decision-intelligence-platform-for-workflow-automation-3pn7", "published_at": "2026-05-30 18:12:44+00:00", "updated_at": "2026-05-30 18:42:29.255618+00:00", "lang": "en", "topics": ["ai-products", "ai-tools", "ai-startups", "artificial-intelligence", "machine-learning"], "entities": ["ShouldWeAutomate", "Chart.js", "Flask", "Jinja2"], "alternates": {"html": "https://wpnews.pro/news/building-shouldweautomate-a-decision-intelligence-platform-for-workflow", "markdown": "https://wpnews.pro/news/building-shouldweautomate-a-decision-intelligence-platform-for-workflow.md", "text": "https://wpnews.pro/news/building-shouldweautomate-a-decision-intelligence-platform-for-workflow.txt", "jsonld": "https://wpnews.pro/news/building-shouldweautomate-a-decision-intelligence-platform-for-workflow.jsonld"}}