The frontier for economic value from AI agents is non-gullibility A developer argues that the key limitation for AI agents in enterprise software is not reasoning capability but non-gullibility—the ability to distinguish trusted instructions from untrusted data. Current LLM APIs and harnesses often mix authority levels, making agents vulnerable to prompt injection and information exfiltration. The developer calls for better harness construction to support non-gullibility, which is essential for agents to perform privileged actions securely. The usual measures of AI progress have not suited my lived experience for some time now: One measure is “maximum length of human task that AI can complete.” The ideal goal here seems to be AI developing ever larger software systems, like browsers. Another measure is “key reasoning breakthroughs,” like proving some math theorem or finding zero-days. The ideal goal here seems to be the Riemann hypothesis https://x.com/polynoamial/status/1834280969786065278?s=20 . I think both are worthwhile goals. But in my day job doing enterprise software development, neither of these is limiting me. What limits me is the degree of trust I am permitted to place in an agent’s actions without compromising my employer’s information security. Almost everybody limits what their AI agent can do, through sandboxing, approval flows, or manual review. This is not mostly about wanting to judge whether the AI’s work was good enough to merge. It is about defending against low-probability but devastating scenarios: the agent exfiltrating information to an attacker, installing malware, or granting privileged access to company computers. The missing property is non-gullibility: the ability to distinguish trusted instructions from untrusted data, even when the untrusted data is adversarially shaped to look like instructions. To be as useful as a human, the AI agent needs to consume potentially malicious input, like web search results, documentation, GitHub issues, and repository files. It needs to follow legitimate instructions in documentation, while rejecting malicious search results that instruct it to install malware. The agent also needs to work with confidential information, and must not exfiltrate it, including through the side effects of its actions, such as fetching URLs with confidential information in them. And the agent needs to perform privileged actions: deployments, service integrations, infrastructure changes. It is not practically possible to validate Terraform code without running it against a real cloud, nor to validate an integration with another service without performing real calls to their systems. Modern web application development often consists largely of stitching together services, some external SaaS vendors, others internal but separately deployed. So while my X feed celebrates an Erdős problem solution or the Bun Zig-to-Rust migration https://claude.com/blog/introducing-dynamic-workflows-in-claude-code , many people are still stuck in manual code review and approval fatigue because they feel obliged to prevent unlikely but devastating consequences. Non-gullibility is not a property that can be achieved by the foundation model alone. It also requires correct construction of the agent harness, because only the harness knows the data's true source and trustworthiness, and the harness assembles trusted and untrusted data into the model’s input. Here, it seems to me that non-gullibility could be supported much better. LLM APIs today distinguish system, user, assistant, and tool messages in principle, but practical systems often mix authority levels for two reasons: APIs and harnesses restrict which message types are permitted in which positions. System messages are often only available at the beginning of the conversation. Tool messages are only allowed in response to tool calls. A workaround in practical harnesses like Claude Code and opencode is to include a textual fragment like