Human-in-the-Loop: The Most Important Concept in AI That Keeps You Employed

A deep look at what HITL actually is, when it genuinely matters, when it doesn’t, and why throwing it away is basically asking for your job to disappear with it.

I know what you’re thinking

Not another article about Human-in-the-Loop. Probably one of the most overexposed concepts in computer science and AI, right up there with “blockchain synergy” and “move fast and break things.” You’ve seen the LinkedIn posts. You’ve sat through the conference talks and you’ve read the white papers.

What this article is actually about is reframing your viewpoint.

Right now there are 8 billion people on this planet. And something genuinely unprecedented just happened: the first cohort of university students who spent their entire degree using AI, every essay, every problem set, every late-night cram session, just walked across a stage and collected a diploma. Four full years of AI-assisted education. The first of their kind.

They are leaving both excited and absolutely terrified.

Not because AI is going to take over the world. Because they’re entering a workforce that is the most turbulent it has been in living memory. Disruption, job displacement, COVID economic regression, wars reshaping global supply chains, automation eating through entire industry categories, and an AI revolution that promises hyperscale growth while simultaneously making the case that we just don’t need as many engineers, developers, writers, analysts, and support staff as we used to.

The stats are on the front page. You’ve seen them. What’s almost always missing from those stories is the other half of the equation: the new jobs being created. But that’s a longer argument, and we’ll get to it. Rest assured, this is not another economics lesson.

First, let’s acknowledge the reality of the choice in front of roughly a billion working-age people right now: you can take a corporate job, fill out the AI-screened application, wrestle through four rounds of interviews, the psychometric tests, three references who haven’t spoken to you in two years, three months of probation, KPI metrics, and a lot of meetings, so many unnecessary meetings. Or you can start something yourself, get VC funding if you’re lucky, or have that rare idea that gets traction before you’ve burned through your self-funding. Or you were born into money, which, fair enough. Why weren’t all of us?

The answer to “how do I stay relevant in an AI-saturated economy” is not buried in a productivity hack or a prompt engineering course. It’s in understanding where the human still matters and why a specific, boring-sounding engineering concept is actually the load-bearing wall between a world where humans remain in the economic picture and a world where we become the guy in Wall-E.

You remember the captain. Slightly plump and round. Mildly irritated. His one job on the entire cruise ship is the morning briefing to passengers who are excited to change their jumpsuit colors. Everything else (navigation, maintenance, life support, course correction) is handled by AI. He doesn’t pilot the ship. He doesn’t repair anything. He doesn’t make decisions. He’s there for appearances. For the vague sense that someone is nominally in charge.

That’s the trajectory we’re on if we get this wrong.

            Captain McCrea understood HITL

The uncomfortable question underneath all of it

Who decides whether an AI system has a human-in-the-loop at all?

That question sounds philosophical until a drone crashes into your house or a self-driving car swerves into oncoming traffic. Then it becomes very specific very fast. Who is responsible? “Not my problem; I wasn’t driving.” “We have no record of that system going rogue, must have been a programming glitch.” “The terms of service indicate that…”

I’m not here to design a perfect governance framework. But these are not hypothetical questions. They’re playing out in courts right now across the US, EU, and China. The EU AI Act, passed in 2024, dedicates an entire article, Article 14, specifically to human oversight requirements for high-risk AI systems. Governments and corporations will ultimately decide which systems get HITL and what the rules are. And if there’s no pressure from users, engineers, and the public to demand it, the decision will default to whatever is cheapest to ship.

Do you like jobs? I do.

Let me explain why human-in-the-loop is the thing that keeps them.

And yes, I’ve heard we’re getting UBI (universal basic income), though so far it’s crickets from governments…

We are going to need jobs until they sort it out.

And the math I’ve run in the past makes UBI gobsmackingly difficult.

(Thufir calculating UBI payments to billions of people every week)

The Gemini 3 moment nobody expected

At the Gemini 3 Hackathon (35,577 participants, 4,499 projects submitted), a pattern emerged that surprised almost everyone involved. Look at what won.

Globot (grand prize): Four specialized agents pulling from geopolitical signals, financial risk data, satellite imagery, and shipping routes with a fifth agent stress-testing the others’ conclusions. The whole system turns supply chain chaos into a confident decision recommendation in 60 seconds. Then it hands control back to the human. The AI doesn’t reroute the shipment. The supply chain manager does, with full context assembled in under a minute instead of hours.

Aegis (top prize): An autonomous multi-agent command center that prevents 911 systems from collapsing under mass emergency calls, triaging thousands of distress signals simultaneously using five specialized agents: coordinator, triage, surveillance, logistics, and a reporter writing post-mission summaries. In a real disaster, every one of those handoffs back to a human responder is a HITL gate.

Netra (top prize): A high-speed vision system for the visually impaired that reads text, recognizes faces, and describes surroundings in real time. HITL here isn’t an approval gate; it’s the fundamental architecture. The human makes every decision. The AI provides perception they couldn’t otherwise access. The user is always in the loop by design.

Agent-weaver (winner): Turns Gemini into a coordinated team of five specialized agents with shared memory and, in their words, “human-verified annotations and real-time collaboration between teams and their agents.” They put HITL in the product description because they understood it was a feature, not a limitation.

Four of the top winners, across wildly different domains (supply chain, emergency response, accessibility, and multi-agent collaboration), all converged on the same architectural decision: keep the human in the critical path.

The teams that pushed for full autonomy kept shipping systems that failed in ways that were genuinely hard to recover from. The winning insight had nothing to do with better prompting or a more sophisticated model. It was knowing when to stop and ask.

That’s HITL. It has been quietly central to how serious AI systems get built for decades, and it’s also one of the most misunderstood ideas in the field: simultaneously oversold as a safety panacea and dismissed as evidence the AI isn’t good enough yet.

Here’s what’s actually going on.

What HITL actually means

The term was formalized in the machine learning community most visibly by Robert Monarch, whose 2021 book Human-in-the-Loop Machine Learning gave the concept a rigorous treatment. Monarch used it specifically to describe the practice of keeping humans in the feedback loop of model training: not just at deployment, but throughout the iterative cycle of annotation, review, and model update.

Stanford HAI defines it more broadly: “a model that requires human interaction.” That breadth is intentional. HITL isn’t one thing. It’s a design philosophy that manifests differently at different stages of an AI system’s life:

THE THREE CONTEXTS WHERE HITL APPEARS

──────────────────────────────────────────────────────────────────────────

TRAINING TIME

Humans annotate data, review edge cases, correct model outputs.

The human is inside the training loop.

Example: Monarch's annotation pipelines; RLHF at Anthropic.

DEPLOYMENT TIME (inference)

Humans review, approve, or reject AI actions before they execute.

The human is inside the action loop.

Example: Claude's usage policy; cloak_ssh_approve in VEKTOR.

MONITORING TIME (post-deployment)

Humans audit AI behavior, flag drift, trigger retraining.

The human is inside the improvement loop.

Example: EU AI Act Article 14 requirements for high-risk systems.

──────────────────────────────────────────────────────────────────────────

When people argue about HITL, they’re usually arguing about the second context, inference-time approval, while actually talking past each other about all three.

The case against HITL

Plenty of articles, on Medium and elsewhere, already argue for and against HITL, and the critics have a point.

When a company builds an AI system that requires constant human supervision to function safely and then markets it as “AI with human oversight,” that framing often does obscure a fundamental capability gap. The AI can’t do the task reliably. The human is doing it. The AI is doing the admin.

The legitimate version of this critique has three parts:

HITL as theater. Many “human review” steps are compliance checkbox exercises. The AI flags a decision, a human glances at it and clicks approve in 0.3 seconds, the decision executes. This isn’t oversight — it’s liability laundering. It gives the appearance of human accountability while removing the conditions under which that accountability could be meaningfully exercised.

HITL as a ceiling. An approval gate that fires on every action isn’t safety engineering — it’s just slow automation. If a system interrupts you 200 times an hour for approvals you always grant, you’ve built a worse manual process, not a safer automated one. At that point, the human isn’t in the loop; the human is the loop, and you’ve spent significant engineering effort to give them a worse interface.

HITL as permission to ship incomplete systems. The most cynical deployment of HITL: building a system that can’t handle edge cases reliably, wrapping it in “human oversight,” and calling it production-ready. The HITL gate becomes the product’s excuse for not being finished.

These are real problems. Taking them seriously is a precondition for understanding why well-designed HITL is still worth building.

The case for HITL

The strongest argument for human-in-the-loop isn’t safety theater — it’s irreversibility.

Chen et al. (Systems, 2023) frame this cleanly in their analysis of HITL architectures: the key variable is not AI accuracy, but the cost of error correction. On a spectrum from “trivially reversible” to “permanently catastrophic,” HITL makes more sense as you move right.

THE IRREVERSIBILITY SPECTRUM

──────────────────────────────────────────────────────────────────────────

TRIVIALLY REVERSIBLE ←────────────────────────────→ CATASTROPHIC

Reading a file Editing a config Dropping a table Deleting backups

Fetching a URL Restarting a service Revoking access Production rollback

Running a test Deploying code Sending to all users Data loss

Posting publicly Database migration

HITL VALUE: low HITL VALUE: medium HITL VALUE: high HITL VALUE: critical

──────────────────────────────────────────────────────────────────────────

The key insight is that HITL isn’t binary. You don’t have a system that “has HITL” or “doesn’t have HITL.” You have a system where certain actions trigger a human review and others don’t, and the design question is which actions fall into which category.
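As a rough illustration of that design question, the sketch below maps a few actions from the spectrum to a reversibility level and gates only those at or above a threshold. The action names, the four levels, and the default threshold are assumptions made for this example, not part of any real system.

```python
from enum import IntEnum


class Reversibility(IntEnum):
    TRIVIAL = 0        # e.g. reading a file
    RECOVERABLE = 1    # e.g. editing a config
    COSTLY = 2         # e.g. dropping a table
    CATASTROPHIC = 3   # e.g. deleting backups


# Hypothetical mapping; a real system would derive this from its own
# failure analysis rather than a hard-coded table.
ACTION_REVERSIBILITY = {
    "read_file": Reversibility.TRIVIAL,
    "edit_config": Reversibility.RECOVERABLE,
    "drop_table": Reversibility.COSTLY,
    "delete_backups": Reversibility.CATASTROPHIC,
}


def needs_human_review(
    action: str,
    threshold: Reversibility = Reversibility.RECOVERABLE,
) -> bool:
    """Return True when the action should stop at a human review gate.

    Unknown actions are treated as catastrophic, so they always hit the gate.
    """
    level = ACTION_REVERSIBILITY.get(action, Reversibility.CATASTROPHIC)
    return level >= threshold
```

Moving the threshold is the design decision: a team comfortable with recoverable mistakes could pass `Reversibility.COSTLY` and let config edits run unreviewed.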

IBM Think’s analysis of HITL systems in enterprise contexts identifies two additional factors beyond irreversibility:

Distributional shift: When the current situation is meaningfully unlike the training distribution, AI confidence scores become less reliable. A human who understands context is better positioned to catch this than a model that doesn’t know what it doesn’t know.

Accountability requirements: In regulated industries (finance, healthcare, critical infrastructure), Article 14 of the EU AI Act requires “human oversight measures” for high-risk AI systems. This isn’t optional engineering; it’s a legal baseline. HITL is how you implement it.

Trust: The honest version of the case for HITL isn’t “AI can’t be trusted.” It’s: some mistakes are cheap and some are expensive, and the approval gate should sit at the boundary between them.

How we got here: a brief history

HITL didn’t originate in AI. It’s a systems design concept with roots in control theory and human factors engineering from the 1960s. Nuclear power plant control rooms were HITL systems. Aviation cockpits were HITL systems. The idea that automated systems should pause and consult a human operator at specific decision points, particularly those with high-consequence or low-reversibility outcomes, predates machine learning by decades.

What changed with modern AI was the texture of the problem. Traditional HITL in control systems involved well-specified edge cases: known failure modes, known escalation paths. Modern AI systems fail in ways that are harder to specify in advance. They hallucinate. They misinterpret context. They extrapolate outside their training distribution in ways that produce confident-sounding nonsense.

Monarch’s contribution was applying the HITL frame specifically to the training loop, recognizing that the human’s role wasn’t just approval but active model improvement. The annotator reviewing AI outputs isn’t just checking; they’re generating the signal the model uses to get better.

Anthropic operationalized a version of this in their usage policy, which requires that Claude’s actions, particularly in agentic contexts, include mechanisms for human oversight. This isn’t marketing language. It’s a specific design constraint: Claude should prefer reversible actions, should pause and verify when uncertain, and should not take high-impact actions without explicit instruction.

The Gemini 3 Hackathon teams arrived at the same conclusion empirically. Full autonomy failed. Systems with deliberate approval gates at high-consequence decision points won.

When you don’t need HITL

This is the part that often gets left out of the “AI safety” conversation: over-applying HITL is also a failure mode.

If every AI action requires human approval, you haven’t built a useful system. You’ve built an extremely expensive autocomplete. The entire value proposition of agentic AI is that it can handle routine work without constant supervision, freeing human attention for decisions that actually benefit from it.

WHEN HITL IS NOT THE RIGHT ANSWER

──────────────────────────────────────────────────────────────────────────

✗ Reading files, checking status, fetching data

→ Zero risk. Zero irreversibility. Auto-execute.

✗ Running tests, linting code, analyzing logs

→ Informational. Results surface to human without action.

✗ Well-specified, frequently-repeated, low-stakes tasks

→ If you've approved this a hundred times, automate the approval.

✗ Situations where the human will always approve

→ A rubber-stamp HITL gate is worse than no gate. Remove it.

──────────────────────────────────────────────────────────────────────────

The skill in HITL design is knowing where the boundary sits. That requires understanding both your system’s actual failure modes and your users’ actual cognitive bandwidth. An approval gate that fires so frequently that users develop approval fatigue is a security liability, not a safety feature.
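One way to spot a gate that has decayed into a rubber stamp is to look at its review history. The sketch below flags a gate whose recent reviews were all approvals and were all very fast. The `Review` record, the 100-sample minimum, and the one-second cutoff are assumptions chosen for illustration; sensible values depend on the system and its users.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class Review:
    approved: bool
    seconds_spent: float


def looks_like_rubber_stamp(
    reviews: Sequence[Review],
    min_samples: int = 100,
    max_median_seconds: float = 1.0,
) -> bool:
    """Flag a gate whose reviews are always approved and almost instant."""
    if len(reviews) < min_samples:
        return False  # not enough history to judge
    always_approved = all(r.approved for r in reviews)
    too_fast = median(r.seconds_spent for r in reviews) <= max_median_seconds
    return always_approved and too_fast
```

A gate that trips this check is a candidate for removal or redesign, not proof that the action is safe; the decision to drop it should itself be reviewed.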

A Real-World Example

How we implement it: the three-tier model

VEKTOR’s cloak_ssh_exec implements HITL as a tiered system rather than a binary gate. Every command is automatically classified before execution. You never decide the tier; the system does. What you decide is whether to approve the ones that need it.

VEKTOR HITL: THREE-TIER EXECUTION MODEL

──────────────────────────────────────────────────────────────────────────

TIER 1: READ

┌──────────────────────────────────────────────────────────────────────┐

│ Auto-executes. No approval required. Results returned immediately. │

│ │

│ Examples: ls, cat, ps aux, df -h, grep, curl GET, nginx -t, │

│ systemctl status, tail -f (read-only) │

│ │

│ Rationale: Read operations have zero irreversibility. │

│ Requiring approval for them would be pure overhead with no │

│ safety benefit. │

└──────────────────────────────────────────────────────────────────────┘

TIER 2: WRITE

┌──────────────────────────────────────────────────────────────────────┐

│ Requires explicit approval before execution. │

│ Auto-backs up affected files. Rollback key issued. │

│ │

│ Examples: sed -i, cp (to system paths), nginx reload, │

│ npm install, systemctl restart, curl POST, │

│ crontab modifications, config file writes │

│ │

│ Rationale: These change state. Mistakes are recoverable │

│ (backup exists) but require deliberate attention. │

└──────────────────────────────────────────────────────────────────────┘

TIER 3: DESTRUCTIVE

┌──────────────────────────────────────────────────────────────────────┐

│ Requires approval + creates full rollback snapshot. │

│ Approval prompt includes explicit consequence description. │

│ │

│ Examples: rm -rf, DROP TABLE, crontab -r, │

│ systemctl disable, database migrations, │

│ permission removal, firewall rule deletion │

│ │

│ Rationale: These approach the irreversible end of the spectrum. │

│ The friction is intentional. If you're approving this quickly, │

│ you're not actually reviewing it. │

└──────────────────────────────────────────────────────────────────────┘

──────────────────────────────────────────────────────────────────────────

The tier boundaries are not arbitrary. They map directly to the irreversibility spectrum from Chen et al. READ operations are trivially reversible: no gate needed. WRITE operations are recoverable with a backup: gate needed, recovery path provided. DESTRUCTIVE operations approach true irreversibility: gate needed, explicit consequence description, and rollback snapshot required.
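To make the classification step concrete, here is a minimal sketch of a three-tier classifier built from the example commands in the diagram above. It is not VEKTOR’s actual classifier, whose rules are not described here; the pattern lists, the `classify` function, and the default-to-WRITE rule are all assumptions for illustration.

```python
from enum import Enum


class Tier(Enum):
    READ = 1
    WRITE = 2
    DESTRUCTIVE = 3


# Patterns taken from the examples in the tier diagram; purely illustrative.
DESTRUCTIVE_PATTERNS = ("rm -rf", "drop table", "crontab -r", "systemctl disable")
READ_COMMANDS = ("ls", "cat", "ps aux", "df -h", "grep", "nginx -t",
                 "systemctl status", "crontab -l")
SHELL_OPERATORS = (">", "|", ";", "&")


def classify(command: str) -> Tier:
    """Assign a tier to a shell command; anything unrecognized needs approval."""
    normalized = " ".join(command.lower().split())
    if normalized.startswith("sudo "):
        normalized = normalized[len("sudo "):]
    # Destructive patterns are matched anywhere, so chained commands are caught.
    if any(pattern in normalized for pattern in DESTRUCTIVE_PATTERNS):
        return Tier.DESTRUCTIVE
    # A command counts as READ only if it starts with a known read-only command
    # and contains no redirection or chaining that could change state.
    is_read = any(normalized == c or normalized.startswith(c + " ")
                  for c in READ_COMMANDS)
    if is_read and not any(op in normalized for op in SHELL_OPERATORS):
        return Tier.READ
    return Tier.WRITE
```

The sketch errs on the side of friction: an unknown command lands in WRITE and a read piped anywhere loses its auto-execute status. String matching like this is easy to fool, so a production classifier would need real command parsing.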

What an approval flow actually looks like

You: "Fix the SSL certificate renewal cron on my VPS"

Claude:

[Diagnosing — TIER 1, auto-executing]

→ sudo crontab -l

→ Returns: current crontab (no approval needed)

[Proposing fix — TIER 2, requires approval]

┌────────────────────────────────────────────────────┐

│ WRITE OPERATION — Approval Required │

│ │

│ Command: sudo crontab replacement │

│ Change: Adding certbot renewal line │

│ Backup: /tmp/crontab.bak (auto-created) │

│ Rollback: cloak_ssh_rollback("abc123") │

│ │

│ Approve? [yes/no] │

└────────────────────────────────────────────────────┘

You: yes

Claude:

→ Command executed

→ Rollback key: abc123

→ "Cron updated. certbot will run at 2:30am daily."

Multi-step plans with cloak_ssh_plan

For deployments, you approve a plan rather than command-by-command:

You: "Deploy the updated nginx config"

Claude:

┌────────────────────────────────────────────────────┐

│ SSH PLAN — 3 steps │

│ │

│ Step 1 [READ]: nginx -t (test config) │

│ → Auto-executes first │

│ │

│ Step 2 [WRITE]: cp nginx.conf /etc/nginx/ │

│ → Backup of current config taken │

│ │

│ Step 3 [WRITE]: systemctl reload nginx │

│ → Rollback key issued │

│ │

│ Approve all? [yes/no] │

└────────────────────────────────────────────────────┘

You: yes

Steps execute in sequence.

If Step 2 fails, Step 3 does not run.

Rollback key valid for 24 hours.

This is HITL done correctly: the human reviews what matters (the plan and its consequences), not what doesn’t (reading the current state). The approval moment is meaningful because it’s rare enough that it still captures attention.
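The plan semantics described above (one approval, sequential execution, stop on the first failure) are simple enough to sketch. The `Step` type and `execute_plan` function below are hypothetical names for this example and do not reflect how cloak_ssh_plan is built; the sketch also assumes the whole plan, including READ steps, waits for the single approval.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    tier: str                 # "READ" or "WRITE" in this sketch
    description: str
    run: Callable[[], bool]   # returns True on success


def execute_plan(steps: Sequence[Step],
                 approve_plan: Callable[[Sequence[Step]], bool]) -> List[str]:
    """Run steps in order after a single approval; stop at the first failure.

    Returns the descriptions of the steps that completed successfully.
    """
    if not approve_plan(steps):
        return []  # plan rejected: nothing runs
    completed = []
    for step in steps:
        if not step.run():
            break  # later steps never run once one fails
        completed.append(step.description)
    return completed
```

Returning the list of completed steps gives the caller what it needs to report progress or decide which rollbacks apply after a partial run.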

The HITL workflow: a reference diagram

THE HITL DECISION LOOP

──────────────────────────────────────────────────────────────────────────

USER INTENT

│

▼

┌─────────────┐

│ AI Agent │ ← interprets intent, plans action

│ (Claude) │

└──────┬──────┘

│

▼

┌─────────────────────────┐

│ Action Classification │ ← automatic tier assignment

└──────────┬──────────────┘

│

┌─────────┴─────────┐

│ │

READ tier WRITE / DESTRUCTIVE tier

│ │

▼ ▼

Execute ┌─────────────┐

immediately │ HUMAN │

→ Return │ REVIEW │ ← the actual HITL gate

results └──────┬──────┘

│

┌──────────┴──────────┐

│ │

APPROVE REJECT

│ │

▼ ▼

Execute with Return to AI

backup/rollback → Revise plan

→ Store result → Or stop

│

▼

┌─────────────┐

│ Memory │ ← vektor_store: what happened,

│ Update │ outcome, rollback key

└─────────────┘

│

▼

Next session:

AI already knows what was done

──────────────────────────────────────────────────────────────────────────

The memory integration is where VEKTOR’s HITL closes the loop in the Monarch sense: it’s not just an approval gate, it’s a feedback channel. What you approved, what you rejected, and what the outcomes were all feed back into the AI’s operational context for next time.
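A feedback channel of this kind could be as small as a log of decisions that is summarized into the next session’s context. The sketch below is a guess at the general idea, not the vektor_store API: `Decision`, `DecisionLog`, and their methods are hypothetical names introduced for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Decision:
    command: str
    tier: str
    approved: bool
    outcome: str
    rollback_key: Optional[str] = None


class DecisionLog:
    """Keeps approvals, rejections, and outcomes for later sessions."""

    def __init__(self) -> None:
        self._entries: List[Decision] = []

    def record(self, decision: Decision) -> None:
        self._entries.append(decision)

    def rejection_rate(self, tier: str) -> float:
        """Share of proposals in a tier that the human rejected."""
        relevant = [d for d in self._entries if d.tier == tier]
        if not relevant:
            return 0.0
        return sum(not d.approved for d in relevant) / len(relevant)

    def context_for_next_session(self, limit: int = 5) -> str:
        """Serialize the most recent decisions for the agent's next session."""
        recent = self._entries[-limit:]
        return json.dumps([asdict(d) for d in recent], indent=2)
```

A high rejection rate in one tier is a signal in its own right: the agent is proposing changes the human does not want, which is exactly the kind of feedback the approval gate alone would discard.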

Why HITL will create more jobs than it eliminates

This is the part of the conversation that gets the most pushback, and where the evidence is most counterintuitive.

The standard fear about AI automation is substitution: AI does the task, human becomes unnecessary. This model works reasonably well for narrow, fully-specified, high-volume tasks in stable environments. It does not work well for:

Tasks where edge cases have high consequence

Tasks where the definition of “success” shifts over time

Tasks that require accountability to external parties

Tasks that operate at the frontier of what AI systems can reliably do

HITL systems, the well-designed ones, don’t eliminate human judgment. They concentrate it. Instead of a human spending 70% of their time on routine execution and 30% on actual judgment calls, a HITL system handles the routine execution and presents only the judgment calls for human review.

The Gemini 3 Hackathon demonstrated this empirically. The teams that built effective HITL architectures weren’t building systems to replace their users — they were building systems that gave their users leverage. The human reviewer of a HITL system does fewer, harder, more consequential things per hour. That’s not elimination. That’s role transformation.

The jobs that HITL creates or preserves:

ROLE                    WHAT HITL DOES TO IT
──────────────────────────────────────────────────────────────────────────
Systems administrator   Moves from "execute routine changes" to
                        "review and approve AI-proposed changes"
                        → Same accountability, 10x throughput

Annotator/reviewer      Robert Monarch's core insight: HITL training
                        systems create structured annotation work
                        → More annotation jobs, not fewer, as systems
                          need labeled edge cases to improve

Compliance officer      EU AI Act Article 14 requires human oversight
                        for high-risk systems. Someone has to do this.
                        → New job category, not eliminated category

QA engineer             AI can generate test cases and flag failures,
                        but HITL review catches what automated tests miss
                        → QA role expands to AI system auditing

Domain expert           HITL surfaces the decisions that require
                        subject-matter expertise rather than execution
                        → Expert time goes to expert problems only
──────────────────────────────────────────────────────────────────────────

Carnegie Mellon’s research on human-AI teaming arrives at a similar conclusion: the highest-performing human-AI pairs are ones where the division of labor is explicit, the handoff points are clean, and the human’s role is preserved in the decisions that benefit most from human judgment. That’s a design problem, not a technology problem.

The net employment effect is not obvious. Automation historically displaces specific task categories while creating demand for new roles — most of which involve supervising, auditing, training, and correcting the automated systems.

HITL is the architectural pattern that makes that supervision possible. Destroying the HITL gate to achieve “full autonomy” doesn’t eliminate the need for human oversight; it just makes that oversight reactive (fixing problems after they happen) rather than proactive (preventing them before they do).

When to use HITL: a practical decision framework

SHOULD THIS ACTION HAVE A HITL GATE?

──────────────────────────────────────────────────────────────────────────

Ask:

HITL done well is the opposite: it’s a precise engineering choice about where human judgment adds the most value and where automated execution adds the most throughput. It respects human attention by reserving it for decisions that actually need it. It makes AI systems safer not by making them slower, but by making their failure modes recoverable.

The Gemini 3 teams didn’t add HITL because they were timid. They added it because they were rigorous.

The EU AI Act doesn’t mandate human oversight for high-risk systems as a gesture toward caution. It mandates it because the risk of unreviewed AI action at consequential decision points is real, documented, and expensive.

Robert Monarch didn’t write a book arguing that humans should stay in the loop because AI isn’t good enough yet. He argued that the training loop, the process by which AI systems improve, is inherently collaborative, and that designing it well means designing the human-AI handoff well.

These are all the same insight expressed at different scales: the question isn’t whether to have a human in the loop. The question is which loop, at which point, with what authority, and with what recovery path if they get it wrong.

Getting that right is not a concession to AI’s limitations. It’s what good systems engineering looks like.

How we set up human-in-the-loop

Enable SSH with approval gates:

Store your SSH key once:

"Store my VPS SSH key as 'vps-main' in cloak_passport"

All subsequent commands classify automatically.

TIER 1 (READ): auto-execute, no approval.

TIER 2 (WRITE): approval prompt appears before execution.

TIER 3 (DESTRUCTIVE): approval prompt with explicit consequence description.

Use plan mode for multi-step operations:

"Deploy the new config to my VPS using cloak_ssh_plan"

Claude shows the full plan with tier classifications.

You review the plan once. Approve or reject.

Execution proceeds step-by-step, stopping on failure.

Store what happened for next time:

After any significant operation, Claude automatically calls:

vektor_store("Deployed updated nginx config on [date].

Changed: SSL cert renewal cron.

Rollback key: abc123.")

Next session: Claude already knows this happened.

You don't have to brief it.

The combination gives you the throughput of automation on routine operations and the safety of human review on consequential ones — without requiring you to decide which is which on every command.

That’s the actual promise of human-in-the-loop: not that the AI needs a babysitter, but that a few deliberate moments of human attention, placed at exactly the right points, make the rest of the automation trustworthy.

VEKTOR Memory, including cloak_ssh_exec, cloak_ssh_plan, and cloak_ssh_approve, is available at vektormemory.com.

References

Monarch, R. M. (2021). Human-in-the-Loop Machine Learning. Manning Publications.

Chen, J. et al. (2023). Human-in-the-Loop System Architectures for AI Decision Support. Systems, MDPI.

Stanford HAI. “Human-Centered Artificial Intelligence.” Stanford University.

IBM Think. “Human-in-the-Loop AI.” IBM.

European Parliament. Regulation (EU) 2024/1689 — Artificial Intelligence Act, Article 14: Human Oversight.

Anthropic. Claude Usage Policy — Agentic and Autonomous Systems.

Gemini 3 Hackathon post-mortem analysis — Globot, Aegis, Netra team architectures.

source & further reading

dev.to: original article
