Build a Basic AI Agent from Scratch: Human in the Loop and Security

A developer built a basic AI agent from scratch with human-in-the-loop controls and security features, adding permission modes that require user approval for risky actions like writing files or running shell commands. The agent classifies tools by risk level and offers three modes: default, acceptEdits, and dangerouslySkipPermissions.

You can find and clone this code in this blog series' GitHub repo.

In the previous part of the Build A Basic AI Agent From Scratch series, we gave our agent the ability to plan and work on long tasks. We added a scratchpad, a to-do list and a system prompt that explains to the model how to break work down, recover from failures and keep going until the task is actually done.

That made the agent much more useful, but it also made it more dangerous. Running commands and editing files indiscriminately can have consequences that cannot be undone. We want our agent to work autonomously, but also to check with you before running potentially harmful tools.

In this part of the series we will add human-in-the-loop controls to our agent. The agent will still be autonomous, but it will have to stop and ask for permission before taking potentially risky actions. It will also get a new tool that lets it ask the user a question when it does not have enough information to proceed.

## Human in the Loop

In AI agents, the term human in the loop means that some decisions require manual approval from a human before they run. This ensures that sensitive actions are not performed without first passing a human's judgment.

## What Should Require Permission?

Not every tool call needs the same level of scrutiny. If the agent asks the user for permission on every single tool call, it becomes annoying and slow. On the other hand, if the agent never asks for permission, it becomes unsafe.

So we will classify tools by risk:

- Read tools can inspect the filesystem but do not change it.
- Planning tools only update the agent's internal state.
- Interaction tools ask the user for clarification.
- Write tools modify files.
- Other action tools can have broader side effects, like running shell commands or fetching from the network.

For this version of the agent, the safe default is:

- Reading files is allowed.
- Planning is allowed.
- Asking the user a question is allowed.
- Writing files requires permission unless we explicitly start the agent in a mode that accepts edits inside the current project.
- Running bash commands requires permission.
- Fetching web pages requires permission.

## Permission Modes

We will add three permission modes to the agent:

```python
from enum import Enum


class PermissionMode(Enum):
    DEFAULT = "default"
    ACCEPT_EDITS = "acceptEdits"
    DANGEROUSLY_SKIP_PERMISSIONS = "dangerouslySkipPermissions"
```

The modes work like this:

- `default`: read tools and planning tools are allowed, everything else asks for permission.
- `acceptEdits`: read tools, planning tools and writes inside the current working directory are allowed, everything else asks for permission.
- `dangerouslySkipPermissions`: all tools run without asking.

The last mode is intentionally named in a scary way. Running without any safeguards is the kind of mode you might use in a throwaway sandbox or a trusted automation environment. It shouldn't be the default for an agent running on your machine with precious files and credentials.

We can expose the permission mode as a command line flag:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Coding agent with configurable tool permission gating."
)
parser.add_argument(
    "--mode",
    choices=["default", "acceptEdits", "dangerouslySkipPermissions"],
    default="default",
    help=(
        "Permission mode for tool execution. "
        "'default': read tools are free, everything else requires approval. "
        "'acceptEdits': read + write tools are free when inside the working directory, "
        "everything else requires approval. "
        "'dangerouslySkipPermissions': all tools run without any prompt."
    ),
)
```

Then we capture the current working directory when the agent starts, which we will use as the trust boundary for the `acceptEdits` mode.
The agent can edit files inside the project, but writing outside the project still requires permission:

```python
from pathlib import Path

cli_args = parser.parse_args()
mode = PermissionMode(cli_args.mode)
working_dir = Path.cwd()

print(f"Agent started in '{mode.value}' mode (working dir: {working_dir})")

client = get_llm_client()
agent_loop(client, mode, working_dir)
```

## Tool Categories

Next, we will sort the tools into three groups. Tools that only read files or are used for planning are always allowed because they are safe. Write tools are more limited:

```python
# Always allowed: read-only filesystem tools
READ_TOOLS = {"read_file", "glob_files", "grep"}

# Always allowed: internal planning/bookkeeping and user-interaction tools
PLANNING_TOOLS = {
    "todo_append",
    "todo_list",
    "todo_update",
    "read_scratchpad",
    "write_scratchpad",
    "ask_question",
}

# Conditionally allowed in acceptEdits mode when target is within working_dir
WRITE_TOOLS = {"write_file", "edit_file"}
```

## Checking the Write Path

If the agent is in `acceptEdits` mode, we want to allow writes inside the project and require the user's approval for writes outside it. That means we need to resolve the path and check whether it is inside the working directory:

```python
def resolve_tool_path(tool_name: str, args: dict) -> str | None:
    """Return the file-path argument for write tools, or None if not applicable."""
    if tool_name in WRITE_TOOLS:
        return args.get("path")
    return None


def is_within_working_dir(path: str, working_dir: Path) -> bool:
    """Return True if path resolves to somewhere inside working_dir."""
    try:
        target = Path(path)
        if not target.is_absolute():
            target = working_dir / target
        # relative_to raises ValueError when target is outside working_dir
        target.resolve().relative_to(working_dir.resolve())
        return True
    except ValueError:
        return False
```

## Asking for Permission

When the agent wants to run a tool that is not automatically allowed, we ask the user:

```python
import json


def ask_permission(tool_name: str, args: dict) -> bool:
    """Interactively ask the user whether to allow a tool call.

    Returns True if the user grants permission, False otherwise.
    """
    print(f"\n[permission required] {tool_name}")
    print(f"  Arguments: {json.dumps(args, ensure_ascii=False)}")
    while True:
        try:
            answer = input("  Allow this action? [y/n]: ").strip().lower()
        except EOFError:
            # No interactive input available: fail closed
            print("  (EOF - denying permission)")
            return False
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        print("  Please enter 'y' or 'n'.")
```

We make it easy for the user to see which action the agent is trying to perform. Before a risky tool runs, the user sees the tool name and the exact arguments the model requested, and can approve or deny the call.

Now we can put all the rules together:

```python
def check_permission(
    tool_name: str,
    args: dict,
    mode: PermissionMode,
    working_dir: Path,
) -> bool:
    """Decide whether a tool call is permitted under the current mode."""
    if tool_name in READ_TOOLS or tool_name in PLANNING_TOOLS:
        return True

    if mode == PermissionMode.DANGEROUSLY_SKIP_PERMISSIONS:
        return True

    if mode == PermissionMode.ACCEPT_EDITS and tool_name in WRITE_TOOLS:
        path = resolve_tool_path(tool_name, args)
        if path and is_within_working_dir(path, working_dir):
            return True

    # Everything else needs explicit approval from the user
    return ask_permission(tool_name, args)
```

The function returns a boolean that tells the harness whether the agent may proceed with the tool call.

## Gating Tool Execution

Now we need to integrate `check_permission` into the tool execution path. This is the part of the agent loop that receives tool calls from the LLM and decides what to do with them:

```python
def handle_tool_calls(
    tool_calls,
    messages,
    mode: PermissionMode,
    working_dir: Path,
):
    """Execute each tool the LLM requested and append the results to messages."""
    for tool_call in tool_calls:
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        print(f"  [tool] {name}({args})")

        if name not in TOOL_REGISTRY:
            result = (
                f"Error: unknown tool '{name}'. "
                f"Available tools: {list(TOOL_REGISTRY.keys())}"
            )
        elif not check_permission(name, args, mode, working_dir):
            result = (
                f"Permission denied: the user did not allow '{name}' to run. "
                "Do not retry this tool call without asking the user first."
            )
        else:
            try:
                result = TOOL_REGISTRY[name](**args)
            except TypeError as e:
                result = (
                    f"Error: invalid arguments for tool '{name}': {e}. "
                    "Check the tool schema and retry with the correct arguments."
                )

        print(f"  [tool result] {result[:200]}{'...' if len(result) > 200 else ''}")
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            }
        )
```

If the user denies permission, we send a tool result back to the model saying that permission was denied and that it should not retry the same tool call. This keeps the record of what happened clear for the agent: the model learns that its requested action did not happen, and it has to adapt.

## Letting the Agent Ask Questions

Permission prompts are initiated by the harness. They happen when the model tries to do something risky. But there is another kind of human-in-the-loop interaction: the agent itself might realize that it is missing information.

Maybe the user asked it to update "the config" but there are multiple config files. Maybe it needs to know which deployment target to use. Maybe it found two possible interpretations of the task and choosing wrong could cause damage.

For that, we add a new tool called `ask_question`:

```python
def ask_question(question: str) -> str:
    """Ask the user a clarifying question and return their answer."""
    print(f"\n[agent] {question}")
    try:
        answer = input("  Your answer: ").strip()
    except EOFError:
        return "(no answer - EOF)"
    return answer if answer else "(no answer provided)"
```

This tool is very small, but it changes the behavior of the agent. The agent no longer has to guess when guessing would be unsafe. It can stop, ask one focused question, and continue with the user's answer in context.
Then we register it in the tool registry:

```python
from tools.interaction import ask_question


def get_tool_registry():
    return {
        "run_bash": run_bash,
        "read_file": read_file,
        "glob_files": glob_files,
        "grep": grep,
        "write_file": write_file,
        "edit_file": edit_file,
        "webfetch": webfetch,
        "todo_append": todo_append,
        "todo_list": todo_list,
        "todo_update": todo_update,
        "read_scratchpad": read_scratchpad,
        "write_scratchpad": write_scratchpad,
        "ask_question": ask_question,
    }
```

And we expose it to the model with a schema:

```python
{
    "type": "function",
    "function": {
        "name": "ask_question",
        "description": (
            "Ask the user a clarifying question and wait for their answer. "
            "Use this when you are missing information required to complete the task "
            "and cannot reasonably infer it from context. "
            "Ask one focused question at a time. "
            "Do not use this for progress updates or to confirm actions you can already "
            "take - only ask when you are genuinely blocked."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "The question to ask the user.",
                },
            },
            "required": ["question"],
        },
    },
}
```

## Updating the System Prompt

We also need to mention the new tool in the system prompt:

```python
"- Clarification (ask_question): ask the user a single focused question when you "
"are genuinely blocked and cannot reasonably infer the missing information from "
"context. Do not use it for progress updates or to confirm actions you can already "
"take - only ask when it is strictly necessary to proceed.\n\n"
```

## Let's test it

Now we can run the agent in default mode:

```bash
$ python agent.py
Agent started in 'default' mode (working dir: /Users/roger/project)

You: Create a markdown summary of this project
  [tool] glob_files({'pattern': '**/*', 'path': '.'})
  [tool result] ./README.md
./src/main.py
./pyproject.toml
  [tool] read_file({'path': './README.md'})
  [tool result] 1: # Example Project
2: This project is a small CLI...
  [tool] write_file({'path': 'summary.md', 'content': '# Project Summary\n\nThis project is a small CLI...'})

[permission required] write_file
  Arguments: {"path": "summary.md", "content": "# Project Summary\n\nThis project is a small CLI..."}
  Allow this action? [y/n]: y
  [tool result] Wrote 61 bytes to summary.md
Assistant: Done. I created summary.md with a short summary of the project.
```

The agent could explore and read files without bothering us. But when it wanted to write a new file, it had to ask for permission.

If we deny the action, the tool does not run:

```text
  [tool] run_bash({'command': 'rm -rf build'})

[permission required] run_bash
  Arguments: {"command": "rm -rf build"}
  Allow this action? [y/n]: n
  [tool result] Permission denied: the user did not allow 'run_bash' to run. Do not retry this tool call without asking the user first.
Assistant: I did not remove the build directory because permission was denied.
```

This is the safety behavior we want. The agent can propose an action, but the user remains in control.

Now let's test the clarification tool:

```bash
$ python agent.py --mode acceptEdits
Agent started in 'acceptEdits' mode (working dir: /Users/roger/project)

You: Update the config to use production
  [tool] glob_files({'pattern': '**/*config*', 'path': '.'})
  [tool result] ./config.dev.json
./config.prod.json
./src/config.py
  [tool] ask_question({'question': 'I found multiple config files. Which one should I update: config.dev.json, config.prod.json, or src/config.py?'})

[agent] I found multiple config files. Which one should I update: config.dev.json, config.prod.json, or src/config.py?
  Your answer: src/config.py
  [tool result] src/config.py
  [tool] edit_file({'path': 'src/config.py', 'old_string': 'ENV = "development"', 'new_string': 'ENV = "production"'})
  [tool result] Edited src/config.py
Assistant: Updated src/config.py to use production.
```

In `acceptEdits` mode, the edit inside the working directory was allowed automatically.
But the agent still asked a question first because it was not sure which file the user meant.

## What You've Built

We now have an agent that is not only capable of using tools and planning long tasks, but also has a basic safety model around tool execution.

The agent can:

- Read and explore without unnecessary interruption.
- Track its work with planning tools.
- Ask the user clarifying questions when it is genuinely blocked.
- Prompt for permission before risky tool calls.
- Automatically allow project-local edits in `acceptEdits` mode.
- Refuse to execute a denied tool call and report that denial back to the model.

This is a big step toward building agents that can work autonomously without giving them free rein to do whatever they want on your machine.

## What's next?

Human in the loop is the first step toward security in AI agents, but it's not everything you should consider if you worry about agent security. In the next part, we will fill in the missing pieces by looking at sandboxing and audit logs.
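One detail from this part that is easy to verify in isolation is the `acceptEdits` trust boundary. The sketch below repeats the path check described earlier and runs it against a throwaway directory. The directory layout and file names are made up for illustration, and the function body is a reconstruction of the check rather than a verbatim copy of the repo's code.

```python
import tempfile
from pathlib import Path


def is_within_working_dir(path: str, working_dir: Path) -> bool:
    """Resolve the path first, then test it against the resolved boundary."""
    target = Path(path)
    if not target.is_absolute():
        target = working_dir / target
    try:
        # relative_to raises ValueError when target is not under working_dir
        target.resolve().relative_to(working_dir.resolve())
        return True
    except ValueError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project root standing in for the agent's working directory.
    project = Path(tmp) / "project"
    (project / "src").mkdir(parents=True)

    # Plain relative path: stays inside the project.
    inside_relative = is_within_working_dir("summary.md", project)

    # Uses ".." but still lands inside the project after resolving.
    inside_dotdot = is_within_working_dir("src/../notes.md", project)

    # Relative path that climbs out of the project.
    escapes_parent = is_within_working_dir("../outside.txt", project)

    # Absolute path next to the project, not inside it.
    absolute_outside = is_within_working_dir(str(Path(tmp) / "outside.txt"), project)

print(inside_relative, inside_dotdot, escapes_parent, absolute_outside)
# True True False False
```

Resolving before comparing is what makes `../outside.txt` fail the check even though it is written as a relative path. In `acceptEdits` mode, the two paths that return `False` would fall through to the permission prompt instead of being written automatically.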