{"slug": "system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness", "title": "System Boundaries: The Difference Between ChatBot, Workflow, Agent, and Harness", "summary": "A developer argues that ChatBot, Workflow, Agent, and Harness are not an upgrade path but distinct engineering choices based on uncertainty levels. ChatBot solves conversation problems, Workflow handles deterministic processes, Agent manages dynamic decisions, and Harness provides stable hosting for production environments. The key insight is that Agent should only be introduced when task paths cannot be predetermined and require runtime judgment.", "body_md": "When people first build Agent systems, they often read the four concepts as an upgrade path:\n\n```\nChatBot is too simple\n-> Workflow is more engineered\n-> Agent is smarter\n-> Harness is more advanced\n```\n\nThis progression reads smoothly, but it misleads engineering judgment.\n\n`Agent` is not the more advanced default choice.\n\nIn real projects, many problems are fully solved by ChatBot; many automations are more reliable as Workflow; only when the task path cannot be written in advance and the system must judge the next step from new evidence at runtime is Agent worth introducing. 
As for Harness, it is not \"one more cool architecture wrapper\"; it is the model-external control system that becomes necessary when an Agent enters a real environment and must be stably hosted, permissioned, stateful, logged, recoverable, verifiable, and governed.\n\nWe keep using the example from the first two articles:\n\n```\nHelp me figure out why this project's tests are failing, and fix it.\n```\n\nThis sentence looks like an Agent task, but it can be split into four completely different system forms.\n\nIf the user only asks:\n\n```\nWhat usually causes Jest to report Cannot find module?\n```\n\nthat may be ChatBot Q&A.\n\nIf the team has already settled on a fixed set of diagnosis steps:\n\n```\npull code -> install dependencies -> run tests -> collect logs -> send to Slack\n```\n\nthat is more like Workflow.\n\nIf the system does not know where the failure comes from and must decide which file to read, which command to run, where to edit, and how to verify, it enters Agent territory.\n\nIf that Agent must run for the team every day, recover from interruptions, limit permissions, record audits, measure success rate, and replay failed sessions, it starts to need Harness.\n\nSo this article does not ask \"which concept is more advanced.\" It asks:\n\nWhen should you use ChatBot, when should you use Workflow, when do you need Agent, and when must you build Harness?\n\nHere is the guiding sentence:\n\n**ChatBot solves conversation problems, Workflow solves deterministic process problems, Agent solves dynamic decision problems, and Harness solves stable hosting problems.**\n\nThese four are not luxury versions on one line. They are engineering choices for different uncertainty levels and risk boundaries. 
In real products they can also coexist: one system may have a ChatBot entry, Workflow pipelines, Agent loops, and a Harness that gradually thickens.\n\nThe problem sequence:\n\n```\nThe user only needs understanding and expression\n-> ChatBot message management is enough\n-> The task steps are already determined and only need execution\n-> Workflow is more reliable, cheaper, and easier to test\n-> The task path is uncertain and the next step must be judged at runtime\n-> Introduce Agent so the model participates in dynamic decisions\n-> Once Agent enters a real environment, tools, permissions, state, side effects, and verification appear\n-> Model-proposed action cannot directly equal system execution\n-> Harness is needed to carry model-external engineering control\n```\n\nThe most important judgment is:\n\n```\nAgent is not the more advanced default option.\nAgent is complexity paid for uncertain tasks.\n```\n\nIf there is no uncertainty, Agent often adds unnecessary variance. If a task only needs conversation but is built as Agent, the system becomes harder to test, control, and explain. If a process is fully determined but Agent is asked to re-judge every step, you trade reliability that could be encoded for unstable model output.\n\nFirst, anchor the four boundaries:\n\nThe key in this diagram is not the four names, but the central question: `Where is the uncertainty?`\n\nIf uncertainty is only in \"how the user asks and how the model answers,\" ChatBot is enough. If uncertainty has already been digested by humans into a flowchart, Workflow is better. If uncertainty must be handled from runtime evidence, Agent has value. If Agent is no longer a one-off script but must run stably, recover, audit, and verify, Harness becomes unavoidable.\n\nHarness is not exactly the fourth, more advanced peer of ChatBot, Workflow, and Agent. 
More accurately, ChatBot, Workflow, and Agent are task-processing forms; Harness is the engineering system that provides execution boundaries and a control plane when Agent's dynamic process enters a real environment.\n\nStart with ChatBot.\n\nChatBot is the easiest layer to underestimate. Many developers hear ChatBot and think \"just chatting,\" not engineered enough. But if the user's real need is understanding, explanation, summarization, rewriting, or Q&A, ChatBot is the lightest, most stable, and cheapest system form.\n\nFor example:\n\n```\nWhat does this Cannot find module error mean?\n```\n\nor:\n\n```\nHelp me explain the scripts field in package.json.\n```\n\nThe core is not \"execute an action\"; it is \"make existing information clear.\" The system usually does:\n\n```\nreceive user input\n-> organize message history\n-> call model\n-> return natural language answer\n```\n\nEven with product features such as session management, source citations, formatted output, and user preferences, it has not entered \"autonomous task execution.\"\n\nA minimal ChatBot:\n\n```typescript\n// One message in the conversation history sent to the model.\ntype ChatMessage = {\n  role: \"system\" | \"user\" | \"assistant\";\n  content: string;\n};\n\n// `model` is assumed to be an LLM client defined elsewhere.\nasync function chat(messages: ChatMessage[]) {\n  // A single stateless call: no loop, no tools, no side effects.\n  return model.generate({\n    messages,\n    temperature: 0.3,\n  });\n}\n```\n\nThere is no loop, tool runtime, state machine, or permission system here. That is not a flaw; it is a clear boundary.\n\nChatBot engineering focuses on:\n\nIf these solve the problem, do not rush to upgrade it into an Agent.\n\nOnce Agent is introduced, many costs appear: the model may choose the wrong tool, tools may fail, state may pollute the next judgment, permissions must be governed, results must be verified, and every step needs retry and replay thinking.\n\nBack to the CLI assistant:\n\n```\nHow should I usually debug this test error?\n```\n\nChatBot can provide a debugging approach, even explain pasted logs. 
But it should not pretend it inspected the project or say \"I fixed it.\"\n\nIt did not read files, modify code, or run tests.\n\nThe honest boundary of ChatBot is:\n\n```\nI can reason from the information you gave me.\nBut I have not actively observed the external environment.\n```\n\nMany unreliable \"looks like Agent\" systems cross this boundary. They have no tools and no runtime state, but language makes them sound as if they have acted. Users think the system is doing work; the system is generating descriptions.\n\nThe first engineering discipline of ChatBot is:\n\n**Do not treat generated action descriptions as actions that have happened.**\n\nIf a system can only chat, make it a good chat system. If it must act, it enters the next boundary.\n\nWorkflow is not about \"whether the model can think.\" It is about \"humans already know the process; can the system execute it reliably?\"\n\nFor example, the team needs a daily CI failure summary:\n\n```\npull latest code\n-> install dependencies\n-> run tests\n-> collect failure logs\n-> generate report\n-> send to Slack\n```\n\nThis chain does not need the model to re-judge every step. It needs stable step order, retry on failure, timeout interruption, per-step logs, traceable results, and similar outputs from similar inputs.\n\nThat is Workflow territory.\n\nIn this CLI Agent tutorial, suppose we do not yet do \"automatic test repair\" and only build a fixed diagnosis flow:\n\n```\n1. run npm test\n2. if it fails, save logs\n3. search failed test name\n4. print related file paths\n5. let the model summarize possible causes from logs\n```\n\nThis can be a Workflow. 
The model only participates in the final summary, not the process decision.\n\nPseudocode:\n\n```typescript\n// `Repo`, `model`, and the helper functions are assumed to be defined elsewhere.\nasync function diagnoseTestFailureWorkflow(repo: Repo) {\n  await repo.install();\n\n  // Step order is fixed by the program, not chosen by the model.\n  const testResult = await repo.run(\"npm test\");\n\n  if (testResult.ok) {\n    return { status: \"passed\" };\n  }\n\n  // Deterministic extraction and search over the failure output.\n  const symbols = extractFailedSymbols(testResult.output);\n  const files = await repo.search(symbols);\n\n  // The model is used in exactly one node: summarizing the evidence.\n  const summary = await model.generate({\n    messages: [\n      system(\"You are a test failure analysis assistant.\"),\n      user(renderFailureContext(testResult.output, files)),\n    ],\n  });\n\n  return {\n    status: \"failed\",\n    log: testResult.output,\n    relatedFiles: files,\n    summary,\n  };\n}\n```\n\nThis is not Agent.\n\nThe model does not decide what happens next. The process author wrote the path into the program; the model is only a capability inside one node.\n\nForcing this into Agent form can make it worse:\n\n```\nShould we run tests next?\nShould we search files?\nShould we generate a report?\nShould we notify Slack?\n```\n\nThese decisions do not need dynamic judgment. Encoding them as Workflow is clearer, cheaper, and more testable.\n\nWorkflow's advantage comes precisely from \"giving the model less freedom.\" In deterministic processes, freedom is usually not capability; it is risk. If something should be done the same way every time, write it as process. Do not ask the model to improvise each time.\n\nTypical Workflow forms include:\n\nThese systems may call an LLM, but that does not make them Agents. A \"Workflow with LLM nodes\" is still Workflow.\n\nThe key question:\n\n```\nWho decides the next step?\n```\n\nIf the next step is decided by a flowchart, it is Workflow. If it is decided dynamically by the model from the current task state, it starts approaching Agent.\n\nDiagram:\n\n`LLM summarizes cause` is only a node. It does not take over control. 
Workflow controls progress.\n\nThe first boundary between Workflow and Agent is:\n\n```\nLLM appearing in a process does not turn the process into Agent.\nOnly when the LLM decides the next process step does the system enter Agent boundary.\n```\n\nIf Workflow is missing, teams often hand fixed automation to Agent. The result: a path that should be deterministic changes every run; a task that should fail fast wanders around; something testable by unit tests becomes something humans must watch; fixed permissions become an open action space.\n\nThese systems demo as \"intelligent\" and maintain like \"uncontrolled scripts.\"\n\nWorkflow's first engineering discipline:\n\n**If the process can be determined, write it as process before giving it to Agent judgment.**\n\nNow we enter Agent.\n\nAgent appears not because \"we want the system to be more human,\" but because some tasks cannot be written as a fixed process.\n\nThe task:\n\n```\nHelp me figure out why this project's tests are failing, and fix it.\n```\n\nhas many possible paths: dependency version mismatch, missing environment variable, outdated snapshot, type definition error, business logic regression, bad mock configuration, build script change, filesystem case mismatch, and more.\n\nYou can write a giant Workflow covering every case, but it quickly becomes an unmaintainable decision tree. Each new error needs a new branch, and each branch must know which file to read, which command to run, and how to judge the result.\n\nThis is where Agent has value:\n\n```\nLet the model choose the next action each round from the current task state.\n```\n\nThe minimal Agent runtime:\n\n```\nlook at current state\n-> judge next step\n-> propose tool intent\n-> system executes under control\n-> write result back into state\n-> continue judging\n```\n\nThe key shift:\n\n```\nPart of control moves from a fixed process to the model.\n```\n\nOnly part.\n\nThe model should not own real-world control directly. 
It only proposes an action intent (also called a tool intent or action proposal). The outer runtime decides whether and how that intent executes, and how execution is recorded.\n\nMinimal Agent loop:\n\n```typescript\n// `model`, `toolRegistry`, `toolRuntime`, and the helpers are assumed to be defined elsewhere.\nasync function runAgent(task: string, state: AgentState) {\n  while (!state.done) {\n    // The model sees the task plus everything observed so far.\n    const modelOutput = await model.generate({\n      messages: buildContext(task, state),\n      tools: toolRegistry.schemas(),\n    });\n\n    // Stop condition: the model declares a final answer.\n    if (modelOutput.type === \"final\") {\n      state.done = true;\n      return modelOutput.content;\n    }\n\n    // The model only proposes an intent; the runtime validates and executes it.\n    const intent = parseToolIntent(modelOutput);\n    const observation = await toolRuntime.handle(intent, state);\n\n    // Write the result back into state so the next turn can use it.\n    state.events.push({\n      type: \"tool_observation\",\n      intent,\n      observation,\n    });\n  }\n}\n```\n\nThis pseudocode has fewer fixed steps than Workflow. It does not hard-code that step 1 must run tests, step 2 must search files, step 3 must read files, step 4 must edit. It only fixes the runtime boundary:\n\n```\nthe model may judge next step\nbut the next step must be expressed as intent through the tool protocol\nintent must be executed by the system under control\nexecution result must be written back into state\nthe loop must be able to stop\n```\n\nThat is the engineering core of Agent.\n\nAgent is not \"the model doing whatever it wants.\"\n\nAgent is \"the model choosing the next step inside a controlled loop.\"\n\nSequence diagram:\n\nTwo boundaries matter.\n\nThe first is between `Model -> Agent Runtime`. The model outputs intent, not an action that already happened.\n\nThe second is between `Agent Runtime -> Tool Runtime`. 
Runtime must validate, execute, and record tool intent, not pass model text directly to the system.\n\nWithout these boundaries, Agent degenerates into:\n\n```\nmodel outputs shell\n-> program executes directly\n-> failures and side effects cannot be governed\n```\n\nThat is not mature Agent; it is the common dangerous shortcut of demos.\n\nAgent's benefits are clear: it can face open problems, adjust its path from new evidence, string together search, read, modify, and verify steps into a dynamic process, and build situational understanding in unknown projects.\n\nAgent's costs are equally clear: output instability, non-fixed paths, harder testing, more complex state, necessary permission governance, harder error recovery, and evaluation that is no longer just asserting one function return value.\n\nAgent's first engineering discipline:\n\n**Only let the model participate in next-step decisions when the task path must be decided at runtime.**\n\nAgent is not a default upgrade.\n\n```\nAgent is uncertainty budget.\n```\n\nIntroducing Agent means admitting part of the path cannot be determined in advance. You must also pay the engineering cost for that freedom. If you do not, Agent becomes \"unstable behavior every time,\" not \"dynamic problem solving.\"\n\nAgent is dynamic decision-making. But a system that can dynamically decide has only just entered the real world.\n\nThe real world keeps asking:\n\n```\nDoes this tool intent have permission?\nWhere does this step execute?\nCan this task recover after interruption?\nWhy did the model choose this step?\nWhich files changed?\nDid tests really run?\nCan failed sessions be replayed?\nCan outputs from different models be compared?\nWhere is the user approval record?\nHow is success rate observed in production?\n```\n\nThese questions belong neither to the model nor to a single tool. 
They belong to the engineering control system outside Agent.\n\nThis is Harness.\n\nHarness is not another Agent.\n\nHarness is not a larger prompt.\n\nHarness is the control plane outside the model. It hosts Agent in a controllable environment.\n\nThe CLI Agent may start with a single `runAgent()`. Once a team uses it, these parts grow:\n\nThese are not there for architectural neatness. They are survival conditions when Agent enters a real engineering environment.\n\nRemember Harness through ETCLOVG:\n\n```\nExecution: where do commands run? sandbox, timeout, cwd, resource limits?\nTools: how are tools described, discovered, called, and returned as observation?\nContext: what should the model see this turn?\nLifecycle: how does a task start, interrupt, resume, and end?\nObservability: does every step have event log, trace, replayable evidence?\nVerification: how does the system know the task is truly complete?\nGovernance: who controls permission, approval, audit, and safety boundary?\n```\n\nThese are model-external engineering responsibilities.\n\nA Harness layer can be simplified as:\n\nThe model is only one node. The surrounding control capabilities make the system usable, especially `Policy`, `Event Log`, and `Verification`.\n\nWithout permission governance, Agent may take high-risk actions. Without event log, failures cannot be explained or replayed. Without verification, the system can only trust the model saying \"I fixed it.\"\n\nBoundary:\n\n```\nAgent dynamically chooses the next task step.\nHarness makes that dynamic process executable, auditable, recoverable, verifiable, and governable.\n```\n\nMore sharply:\n\n```\nAgent judges the next task step.\nHarness judges whether that step can land, where it lands, how it is recorded, and how it is verified.\n```\n\nA toy local CLI may not need a complete Harness immediately. 
But if any of these appear, start building Harness:\n\nHarness is not \"making Agent heavier.\" Its goal is to put uncertainty into governable boundaries.\n\nAgent brings freedom.\n\nHarness adds guardrails, dashboards, and black-box recorders around that freedom, without turning the freedom itself into determinism.\n\nHarness does not magically make Agent reliable. More accurately, Harness lowers the complexity of connecting Agent to real engineering flows; reliability still depends on task design, tool boundaries, context policy, permission control, failure recovery, and evaluation.\n\nWithout Harness, many Agent failures are not \"the model is not smart enough,\" but missing intermediate state, missing tool output, no read/write distinction, verification outside the loop, context too long, no recovery point, no permission approval, no eval baseline.\n\nThese are not solved completely by a stronger model because they happen outside the model.\n\nHarness's first engineering discipline:\n\n**Do not expect the model to carry responsibilities that belong to the runtime system.**\n\nThe model judges. 
Harness hosts the environment where judgment happens.\n\nComparison table:\n\n| System / Control Form | Core Question | Who Decides Next Step | External Environment | Main Risk | Good For |\n|---|---|---|---|---|---|\n| ChatBot | How to answer and explain | user-model conversation | usually no active contact | hallucination, context misunderstanding | Q&A, summary, explanation, rewrite |\n| Workflow | How to execute a deterministic process stably | predefined process | can contact, but path is fixed | missing branches, external system failure | CI, approval, reporting, sync |\n| Agent | How to progress in uncertain tasks | model dynamic judgment | controlled through tools | unstable path, complex permission and state | debugging, code modification, research, open tasks |\n| Harness | How to host Agent stably | Agent judges task step; Harness controls execution boundary | controlled contact | missing audit, recovery, evaluation, governance | team-level Agent, automation, long tasks |\n\nThe table is about engineering selection, not definitions to memorize.\n\nThe key is not \"does it have an LLM,\" but \"how much decision freedom and engineering control does it need?\"\n\nChatBot's freedom is in conversation.\n\nWorkflow's freedom is narrowed by process.\n\nAgent gives part of next-step selection to the model.\n\nHarness places Agent freedom inside a larger control system.\n\nSo do not ask:\n\n```\nDoes it look intelligent?\n```\n\nAsk:\n\n```\nWho decides the next step?\nWho executes external actions?\nWho records state?\nWho carries risk?\nWho explains failure?\nWho verifies completion?\n```\n\nThese questions are more reliable than concept labels.\n\nA system may have Agent but weak Harness. These systems usually demo well and run painfully.\n\nA system may have strong Workflow and strong Harness but almost no Agent. 
A highly standardized CI platform does not need the model to decide the next step, but needs strong scheduling, logs, permission, and recovery.\n\nThe four are boundaries, not ranks.\n\nWhen the boundary is chosen wrong, all later technical choices bend out of shape.\n\nAnother subtle misunderstanding: using many \"Agentic Design Patterns\" does not make a system an Agent.\n\nPrompt Chaining can look like Agent. It breaks a large task into multiple steps:\n\n```\nuser input\n-> step 1 extracts structured information\n-> step 2 adds context\n-> step 3 generates answer\n-> step 4 checks format\n```\n\nThere are multiple model calls and intermediate state, so it looks \"intelligent.\" But if each step is prewritten by the program and the model is only the executor inside each node, it is still closer to Workflow. Key control lies in the flowchart, not the model.\n\nRouting is similar. The system may first ask the model which category the user request belongs to:\n\n```\nbug diagnosis -> diagnostic flow\ndocument summary -> summary flow\ncode explanation -> explanation flow\nsmall talk -> ChatBot\n```\n\nThis is more flexible than one path. But if candidate paths are predefined and the model only classifies, it is still a dynamic branch in Workflow. The Agent boundary is entered when the model keeps choosing the next step from new observations during execution, and the next-step set is not a fixed small menu but must be determined from the goal, tool results, state, and budget.\n\nParallelization is also not Agent. Calling many models, tools, or analyzers in parallel only says the execution structure is fan-out / fan-in. The key remains:\n\n```\nWho decides what to parallelize?\nWho decides how results merge?\nWho decides the next step after failure?\nWho saves state and evidence?\n```\n\nIf these are decided by program flow, it is a complex Workflow. 
If the model continues rewriting tasks, choosing tools, and updating plans from observations, it begins entering Agent.\n\nThe professional boundary question is not:\n\n```\nDoes it have LLM?\nDoes it have multiple steps?\nDoes it have tools?\nDoes it have parallelism?\n```\n\nIt is:\n\n```\nWhere is next-step control?\nWho constrains external side effects?\nWho saves factual state?\nWho judges completion evidence?\n```\n\nThat is why some systems look like Agents but are Workflows when you read the code; and some systems look like CLI tools, but because the model owns runtime next-step choice, they already need Agent Runtime control mechanisms.\n\nOnce Workflow and Agent are separated, a practical design principle appears:\n\n**Agent is uncertainty budget.**\n\nYou do not use Agent because it is \"more advanced.\" You use it because the task truly has uncertainty that cannot be written into a process in advance.\n\nIn \"fix failing tests,\" if the failure type is always fixed, such as only checking whether `moduleNameMapper` lacks a path alias, Workflow is enough:\n\n```\nrun tests\nmatch Cannot find module\nread tsconfig\nread test config\ngenerate fix suggestion\n```\n\nBut if failures may come from dependency versions, async races, database state, mock config, time boundaries, platform differences, test order pollution, cache issues, or inconsistent type output, the fixed process quickly expands into an endless troubleshooting manual.\n\nThat is where Agent is useful. It does not replace all process; it handles judgments that cannot be enumerated in advance:\n\n```\nWhich file should be read next?\nWhich signal matters in this log?\nShould the hypothesis be verified first, or should code be changed first?\nAfter the fix fails, should we roll back or change direction?\nIs this failure caused by model judgment or incomplete tool result?\n```\n\nUncertainty is not free. 
Giving next-step choice to the model costs:\n\n```\nstate cost: record why it did this\npermission cost: limit what it can do\nverification cost: do not trust it saying done\nobservability cost: know where failure happened\nevaluation cost: know whether prompt, tool, or model changes regressed\n```\n\nGood Agent design does not \"give the model more freedom.\" It leaves only necessary uncertainty to the model and pulls deterministic parts back into Workflow, Tool Runtime, Policy, and Verification.\n\nIn one sentence:\n\n```\nWorkflow digests uncertainty into process in advance.\nAgent leaves part of uncertainty for runtime handling.\nHarness puts runtime uncertainty inside governable boundaries.\n```\n\nUse the same \"fix failing tests\" scenario in four engineering forms.\n\nUser pastes a log:\n\n```\nFAIL src/user.test.ts\nCannot find module '@/lib/db'\n```\n\nIf the system answers \"this usually means the test runner does not recognize path aliases; check `tsconfig paths`, `jest moduleNameMapper`, or `vitest alias`,\" it is ChatBot. Its input is user-provided text; its output is explanation and advice; side effects are zero. It is useful, but cannot say \"I fixed it.\"\n\nIf the system runs a fixed process:\n\n```\nnpm test\n-> save failure log\n-> parse failed filename\n-> grep moduleNameMapper\n-> give logs and search results to model for summary\n```\n\nit is Workflow. It really touches the project, but next step is decided by the process. When the failure is within the process coverage, it is very stable. When the failure is outside, it stops with a bounded report. That is not a flaw; it is the Workflow boundary.\n\nIf the system runs tests first, then searches config, reads `vitest.config.ts`, edits alias, reruns tests, and if failure continues adjusts from new logs, it is Agent. These steps are not prewritten; the model chooses from current evidence. 
Therefore it must have tool protocol, permission, state, and verification, or it may edit the wrong file or interpret a failure as success.\n\nRepeat the boundary: \"read config,\" \"modify alias,\" and \"rerun tests\" are only action intents. Tool Runtime and Execution actually read files, modify files, and run commands. Without this separation, model description is easily mistaken for fact.\n\nIf this Agent must automatically check multiple repositories every day for a team, it enters Harness. The system must create a session for every run, check out the repo in a sandbox, restrict tools, record model output and tool intent, route high-risk writes through confirmation or policy, run verification after modifications, save diff, logs, and final state, support replay on failure, and measure success rate and failure types.\n\nFull evolution:\n\n```\nChatBot: explains logs, but does not execute.\nWorkflow: follows a fixed diagnostic flow, but does not dynamically decide.\nAgent: chooses next step from evidence, but every step must pass runtime constraints.\nHarness: hosts Agent and makes the dynamic process controllable, auditable, recoverable, and verifiable.\n```\n\nEngineering decision tree:\n\n```\n1. Does the user only need explanation, summary, rewriting, or Q&A?\n   yes -> ChatBot\n\n2. Can execution steps be written as a stable process in advance?\n   yes -> Workflow\n\n3. Must next step be decided from new runtime observations?\n   yes -> Agent\n\n4. Will this Agent run long-term, create side effects, serve multiple users, need recovery and audit?\n   yes -> Harness\n```\n\nThis tree intentionally puts Workflow before Agent, because many systems do not lack Agent; they lack a good Workflow.\n\nFor example, \"automatic daily report\" is primarily Workflow if data source, format, and delivery channel are fixed. 
The LLM may write natural language, but it should not decide today whether to query the database, send email, or skip a department.\n\n\"PR check\" is also Workflow if rules are clear:\n\n```\nrun tests\nrun lint\ncheck changelog\ncheck security scan\ngenerate summary\n```\n\nAgent becomes valuable when the system must decide from diff content which files to read, what context to ask for, which targeted tests to run, and how to locate complex regressions.\n\nBad smells of premature Agentization:\n\n```\nprompt contains many fixed steps but still lets the model decide every round.\nthere is only one universal shell tool, with no structured protocol.\nafter the model outputs a tool call, the system executes directly.\nthere is no clear stop condition.\ntool results are not written as an event stream.\nsuggestions and executed actions are not distinguished.\nthere is no verification step, but the model may declare completion.\nafter failure, only the final answer is visible; the process cannot be replayed.\n```\n\nCommon root:\n\n```\nthe system gives engineering control to model language.\n```\n\nModel language is excellent at expression and reasoning, not runtime responsibility. So when designing Agent systems, the first question should not be \"how to make it more autonomous,\" but:\n\n```\nWhich degrees of freedom are truly necessary?\nWhich should be pulled back into process, protocol, and policy?\n```\n\nThat is the start of Harness thinking.\n\nCompress the article into one load-bearing chain. 
The user says:\n\n```\nHelp me fix the failing tests in this project.\n```\n\nThe system can handle it in four ways:\n\n```\nChatBot:\nuser input -> message history -> Model -> natural language advice\n\nWorkflow:\nuser input -> fixed process -> command/read/LLM nodes -> report\n\nAgent:\nuser input -> Agent Runtime -> Model decision -> Tool Intent -> Tool Runtime -> Observation -> State -> next turn\n\nHarness:\nuser input / schedule -> Session -> Agent Runtime -> Policy/Tools/Execution/Context/Event Log/Verification -> recoverable result\n```\n\nThese chains decide capability and cost. If you only need the first but build the fourth, the system is overcomplicated. If you need the fourth but build only the first, the system manufactures hallucinations. If you need the third but miss parts of the fourth, the system can run but will not be stable.\n\nThat is why this tutorial advances gradually. We will not start with a full Harness. We start from the minimal Agent loop:\n\n```\nModel -> Loop -> Tools -> State\n```\n\nThen we add Provider Runtime, Tool Runtime, Context Engineering, Memory, Permission, Session, Observability, Verification, Multi-Agent, and Hosted Harness. Each layer is added not for architecture aesthetics, but because the previous layer exposes a new real-task failure mode.\n\nThis article's job is to pin down boundaries. Later, when discussing Tool Runtime, remember: tools are not ChatBot decoration; they are the protocol boundary through which Agent interacts with the real world. When discussing Context Engineering, remember: more context is not automatically better; context is the task-state projection needed for this Agent turn. 
When discussing Harness, remember: Harness is not a smarter Agent; it is the external system that keeps Agent execution under control.\n\nFour sentences:\n\n```\nChatBot: conversation problem; use the model to generate answers.\nWorkflow: deterministic process; use program flow for stable execution.\nAgent: uncertain task; let the model dynamically decide inside a loop.\nHarness: hosted Agent; use external systems to govern execution, state, permission, observability, and verification.\n```\n\nShorter:\n\n```\nBeing able to chat does not mean being able to execute.\nBeing able to execute does not mean needing dynamic decisions.\nBeing able to decide dynamically does not mean being stably hostable.\nStable hosting needs Harness to carry model-external responsibility.\n```\n\nWhen facing a new requirement, do not rush to say \"let's build an Agent.\" First ask: where exactly is the uncertainty?\n\nIf uncertainty is in expression, use ChatBot. If it has been digested into process, use Workflow. If it must be handled at runtime, use Agent. If that Agent enters real use, use Harness.\n\nThe next article asks a deeper question: what exactly is Harness? Why is it not a framework name, and not a bigger Agent? What model-external responsibilities does it own? We will split Harness into Execution, Tools, Context, Lifecycle, Observability, Verification, and Governance, drawing the control-system map for the rest of the tutorial.\n\nThe teaching project can expose two entries to show this boundary: `POST /api/prompt` as a simple debugging path, and `/api/runs` plus event streaming as the Harness-like run path. Deterministic workflow belongs in the API or tests; the model enters the loop only when the next step depends on observations. This teaches that not every automation needs an Agent. 
You need an Agent Loop when the next action depends on what the system just observed.\n\nGitHub source: [00-03-chatbot-workflow-agent-harness.md](https://github.com/LienJack/build-harness/blob/main/docs/en/00-03-chatbot-workflow-agent-harness.md)", "url": "https://wpnews.pro/news/system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness", "canonical_source": "https://dev.to/lien_jp_db54b8b7fd9fa0118/system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness-pcp", "published_at": "2026-06-03 03:23:35+00:00", "updated_at": "2026-06-03 03:42:19.535368+00:00", "lang": "en", "topics": ["ai-agents", "artificial-intelligence", "ai-products", "ai-tools", "ai-infrastructure"], "entities": ["Jest", "Slack", "ChatBot", "Workflow", "Agent", "Harness"], "alternates": {"html": "https://wpnews.pro/news/system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness", "markdown": "https://wpnews.pro/news/system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness.md", "text": "https://wpnews.pro/news/system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness.txt", "jsonld": "https://wpnews.pro/news/system-boundaries-the-difference-between-chatbot-workflow-agent-and-harness.jsonld"}}