The Asymmetric Future of AI in Cybersecurity

AI agents are rapidly improving in cybersecurity, enabling both defensive and offensive operations at unprecedented speed and scale. The dual-use nature of AI creates a tension between safety controls and usability, with local models emerging as a central challenge to centralized access restrictions.

Cybersecurity has always had an interesting property: the same knowledge can either protect a system or compromise it. A proof-of-concept exploit can help a vendor reproduce and patch a vulnerability, or help an attacker weaponize it before users update their systems. None of this is new; what is changing is the speed, scale, and accessibility with which these actions can now occur.

This post is slightly different from the previous ones. Rather than explaining a specific technical concept, the goal of the first part is to bring some order to the current and near-future relationship between AI and cybersecurity. In the second part, I will try to make some reasoned predictions that go beyond simply betting on red or black at a roulette table. If you already know AI 101, I suggest skipping to the section on local models.

Before discussing agents, it is important to give a quick and boring clarification of what the term actually means, because it is rapidly becoming one of the most overloaded concepts in AI. An AI agent is a system capable of reasoning over a task, interacting with tools, and executing multi-step actions toward an objective. In cybersecurity, this could range from an agent capable of enumerating an attack surface and chaining vulnerabilities together, to one that automatically triages alerts or responds to incidents.

At the time of writing, AI agents are not capable of conducting fully autonomous end-to-end cyber operations with consistently reliable success rates. However, their capabilities are improving quickly, especially in coding, code vulnerability detection, and biology, so they are expected to improve in consistency and skill over time.

The dual-use nature of cybersecurity also exposes a complexity to manage: intent. In clearly malicious scenarios, such as requesting ransomware deployment scripts or credential-stealing malware, intent is relatively straightforward to classify.
The challenge emerges in the much larger grey area, where the exact same technical action may be either legitimate or harmful depending entirely on the context. This creates an intrinsic problem for language models because intent is ultimately expressed through language. For example, I can claim to be validating a bug bounty report, or supply a CTF narrative, in order to bypass guardrails and get the model to exploit a vulnerability.

To mitigate this problem, LLMs typically rely on a combination of safety alignment techniques, policy-based filtering, reinforcement learning from human feedback (RLHF), and runtime monitoring systems designed to identify harmful intent or dangerous outputs. Safety mechanisms usually operate as moving thresholds: tightening restrictions too aggressively risks making the model unusable for legitimate security researchers, developers, and defenders. Relaxing them too much, however, lowers the barrier for malicious actors, making bypasses easier.

This tension is one of the reasons why newer models, such as Anthropic’s Claude Mythos and OpenAI’s GPT-Cyber, have introduced forms of controlled (https://www.anthropic.com/project/glasswing) or trusted (https://openai.com/index/scaling-trusted-access-for-cyber-defense/) access. In some cases, this has been criticized as partially a marketing decision to increase the “hype”, but regardless of the motivation, even organizations that publicly advocate broad accessibility are beginning to introduce verification layers or identity-based access controls.

The Central Role of Local Models

Much of the public discussion around AI safety assumes that access controls can meaningfully limit offensive capabilities. In practice this assumption grows weaker with each passing month. The long-term challenge for centralized control is not frontier cloud models, where companies can enforce access controls and usage policies. Rather, it will probably be the rise of local models, which will play the central role in the future.

As of the time of writing, powerful models like Mythos 5 and GPT 5.6 Sol have been restricted by US authorities, with access allowed only to a small set of approved companies. This underlines how these tools are increasingly becoming strategic assets and highlights the importance of having high-quality open source models to avoid such limitations.

At the same time, if capability is the concern, governments might in the future impose restrictions on open-weight models above a certain capability threshold. The moment an open-weight model reaches performance comparable to state-of-the-art (SOTA) models, governments could attempt to regulate it just as they have tried to regulate closed models. Such bans could be justified on security grounds while also protecting the revenue of the largest LLM companies.

Banning open source products often results in only a symbolic ban, given the existence of copies and mirrors that cannot realistically be recalled; the attempt to ban strong encryption in the 1990s (https://en.wikipedia.org/wiki/Crypto_Wars) is one example. Here, however, there is an important difference. Running open-weight models with performance comparable to SOTA models requires substantial hardware resources: with current technology, a model capable of solving significant cybersecurity tasks cannot run on systems with less than 128 GB of RAM without a significant loss in performance or becoming unusably slow. As a result, many users would still need to access these models through hosted endpoints, which are much easier to take down because they are more centralized. Finally, while a ban on open-weight models could be bypassed by individual citizens, it would be much harder for companies to circumvent without exposing themselves to potential fines.

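
To make the hardware point above a bit more concrete, here is a rough back-of-the-envelope sketch of how much memory the weights of a model need at different quantization levels. The parameter counts and the 20% overhead allowance are purely illustrative assumptions of mine, not figures for any specific model, and the estimate ignores context length and runtime details.

```python
def weights_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """Check whether weights plus an assumed overhead fit in ram_gb.

    overhead_fraction is an assumed allowance for KV cache, activations
    and the operating system; real overhead varies with context length.
    """
    needed = weights_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= ram_gb


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to illustrate the arithmetic.
    for params in (70, 200, 400):
        for bits in (16, 8, 4):
            gb = weights_memory_gb(params, bits)
            verdict = "fits" if fits_in_ram(params, bits, ram_gb=128) else "does not fit"
            print(f"{params}B params @ {bits}-bit: ~{gb:.0f} GB of weights, {verdict} in 128 GB")
```

The takeaway from this kind of estimate is that aggressive quantization is what brings large models within reach of a single machine, which is also where the loss in performance mentioned above comes from.
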
The Asymmetry Between Attackers and Defenders

A common assumption in discussions about AI is that defenders will simply use AI agents too, creating some form of balance between offensive and defensive capabilities. That assumption ignores a critical asymmetry: the cost of failure is fundamentally different for attackers and defenders.

Let’s imagine a scenario, probably not too far away (https://github.com/antirez/ds4), where better cybersecurity local models become runnable on a medium-to-high-range laptop. At that point, an attacker can easily deploy tens of agents against a single target (hundreds, in the case of state-sponsored actors) and let them loop, trying different paths until they find something interesting or manage to reach the objective.

For an attacker, the worst-case scenario for a failed automated action is often relatively limited: the attack gets detected, you get blocked, and the vulnerable assets you spotted get fixed. For defenders, the consequences of automation errors are usually much more severe. An autonomous defensive system that incorrectly isolates production infrastructure, blocks internal communications, or disrupts critical services can lead to some really unhappy meetings. Defensive automation operates under much tighter reliability constraints because it interacts with operational systems that directly affect the company’s business.

A military analogy is useful here. For years, advanced military systems were optimized around highly expensive platforms. Then asymmetric conflicts demonstrated that cheap drones produced at scale could overwhelm systems that were individually far more advanced and expensive. This became evident with Iranian Shahed drones, which cost on the order of tens of thousands of dollars, being intercepted by air defense systems such as the Patriot, where each interceptor can cost several million dollars. Similar dynamics have also emerged during the Russia-Ukraine conflict.

Cybersecurity may experience a similar transition. Even if defenders possess stronger models, more compute, and better infrastructure, attackers may still gain an advantage through scale and a higher tolerance for failure. If one organization deploys only a handful of carefully constrained defensive agents, while another actor deploys hundreds of offensive agents at a fraction of the cost, all orchestrated by a single larger model, the asymmetry in automation freedom may ultimately favor the attacker.

What Defenders Should Do Now

What should defenders do now? Probably less than many vendors currently suggest. One of the most misleading narratives in cybersecurity today is the idea that organizations should aggressively automate everything as quickly as possible. In reality, for the reasons discussed earlier, automating while still depending on a human bottleneck does not provide as many advantages as it appears to.

The short-term opportunity is not full automation everywhere, but enrichment, small automations, and prioritization. This matters because security teams are usually overloaded and understaffed. Automating repetitive cognitive tasks such as alert triage, reverse engineering assistance, vulnerability prioritization, threat intelligence summarization, and detection engineering can already bring significant benefits. These are areas where models can reduce friction without requiring organizations to fully hand over control.

Another area where current models are already demonstrating practical value is vulnerability discovery and remediation assistance. Models are showing that they can support large-scale code auditing, variant analysis, and bug discovery workflows. SOTA models can find zero days and complex vulnerabilities, but it is important not to forget that they can still produce false positives. This makes automated validation a useful companion if the goal is to automate more of the pipeline.

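
As a minimal sketch of what such automated validation could look like, the snippet below keeps only the findings whose reproducer actually triggers. The `Finding` type and its `reproduce` callable are hypothetical names I am introducing for illustration; in a real pipeline the reproducer would be a sandboxed test case or proof of concept, which is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Finding:
    identifier: str
    description: str
    # Hypothetical reproducer: returns True only if the issue is actually triggered.
    reproduce: Callable[[], bool]


def validate(findings: Iterable[Finding],
             attempts: int = 3) -> Tuple[List[Finding], List[Finding]]:
    """Split model-reported findings into confirmed and unconfirmed.

    A finding is confirmed if its reproducer succeeds at least once within
    `attempts` tries. A reproducer that raises is treated as a failed attempt.
    """
    confirmed: List[Finding] = []
    unconfirmed: List[Finding] = []
    for finding in findings:
        succeeded = False
        for _ in range(attempts):
            try:
                if finding.reproduce():
                    succeeded = True
                    break
            except Exception:
                continue  # a crashing reproducer does not confirm anything
        (confirmed if succeeded else unconfirmed).append(finding)
    return confirmed, unconfirmed


if __name__ == "__main__":
    reported = [
        Finding("F-1", "reproducible issue", lambda: True),
        Finding("F-2", "likely false positive", lambda: False),
    ]
    confirmed, unconfirmed = validate(reported)
    print("confirmed:", [f.identifier for f in confirmed])
    print("needs manual look:", [f.identifier for f in unconfirmed])
```

Unconfirmed findings are kept rather than discarded: failing to reproduce an issue is not proof that it is a false positive, it only lowers its place in the queue.
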
This is also where the human bottleneck becomes most visible. When you have a model performing code vulnerability analysis and finding hundreds of vulnerabilities in a large codebase, even if you let it fix them, which for most companies is already too much trust, you still need quality testing and a human review step to check whether the model did everything correctly. To decide what a human needs to look at first, you need a clear prioritization process. This only grows in importance as automation increases the raw volume of inputs on both sides. The more findings everyone generates, the more central correct prioritization becomes.

It is also worth being precise about where this capability is heading, because the trajectory is not uniform. So far, we have seen near-exponential improvements in code vulnerability discovery, but it is reasonable to expect those gains to slow down: many of the high-volume vulnerabilities will be found first, while future model improvements will likely uncover smaller, more specific vulnerabilities. A clear example (https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/) was the discovery of zero days and critical vulnerabilities in Firefox: while the number of vulnerabilities found and fixed increased significantly, Mozilla did not officially disclose how many false positives were produced for each real vulnerability.

The part that will likely keep improving is different, and arguably more consequential: the ability to connect multiple findings together, chaining several individually minor issues into a coherent and reliable full kill chain.

There is also a complementary point: the cheapest vulnerability to handle is the one that never ships. The same models that are useful for auditing existing code are becoming increasingly valuable at the point of creation: reviewing changes before they merge and flagging insecure patterns, unsafe defaults, or missing input validation while the code is still being written.
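
To illustrate where such a check sits in the workflow, here is a minimal sketch of a pre-merge gate that inspects only the lines a change adds. It uses a few regular expressions as a deterministic stand-in for the model-based reviewer; the rule names and patterns are my own illustrative assumptions, not a complete or authoritative rule set.

```python
import re
from typing import List, NamedTuple

# Illustrative rules only; a real reviewer (model or linter) covers far more.
RULES = {
    "tls-verification-disabled": re.compile(r"verify\s*=\s*False"),
    "shell-injection-risk": re.compile(r"shell\s*=\s*True"),
    "dynamic-eval": re.compile(r"\beval\s*\("),
}


class Flag(NamedTuple):
    rule: str
    diff_line: int  # 1-based line number inside the diff, not the source file
    text: str


def review_added_lines(diff_text: str) -> List[Flag]:
    """Return a flag for every added line in a unified diff that matches a rule."""
    flags: List[Flag] = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Added lines start with '+'; '+++' is the new-file header, not code.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for rule, pattern in RULES.items():
            if pattern.search(added):
                flags.append(Flag(rule, number, added.strip()))
    return flags


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/client.py",
        "+++ b/client.py",
        "@@ -1,2 +1,3 @@",
        " import requests",
        "+resp = requests.get(url, verify=False)",
    ])
    for flag in review_added_lines(sample):
        print(f"{flag.rule} at diff line {flag.diff_line}: {flag.text}")
```

Looking only at added lines keeps the feedback tied to the change under review, which is what makes this kind of check cheap enough to run on every merge request.
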
Given the asymmetry described earlier, where defenders pay far more for failure than attackers do, reducing the number of vulnerabilities that ever reach an exploitable state is one of the few moves that meaningfully pushes cost back onto the attacker.

All of this assumes something the marketing rarely mentions: an AI system is only as good as the data underneath it. A model triaging your alerts cannot prioritize if you do not have a clear asset criticality classification, and it cannot reason about assets you do not know you have. The agent does not fix a broken foundation; it uses it. This means that making sure log integration is done correctly and that the attack surface is mapped in the right way remains central.

At the same time, we should not forget that LLMs used for automation can themselves increase the attack surface. If an agent consumes emails, tickets, logs, documents, or web content before acting, those inputs may become control channels. This makes risks such as indirect prompt injection especially important and currently unresolved.

Another concept currently marketed in the cybersecurity world is deception tooling for AI attackers: honeypots that try to lure the attacker into something worthless so they waste effort on it. Against a human or a handful of agents, wasting time genuinely hurts, but the serious version of the threat is orchestrated and parallel. A controller running hundreds of agents can time-box each one, terminate those that stop making progress, and treat any agent that gets stuck in a decoy as noise that was already budgeted for. Trapping one path does not stall the campaign. Local models remove what little cost remains, since once inference runs on hardware the attacker already owns, a wasted token costs little more than electricity, and you cannot meaningfully burn a budget that is effectively free. None of this makes deception useless.
Its enduring value lies in visibility, since a decoy interaction tells you, with high confidence and a very low false-positive rate, that someone is probing where they should not be, often well before they reach anything that actually matters.

Finally, there is also a quieter risk that deserves attention, because it is already happening. Finance, marketing, operations, HR, and almost any other team can now use AI to spin up internal applications and automations without involving engineering or security. Most of these may be harmless, but some of them could manage real data, and they are often built by people who are not thinking about authentication, input validation, or least privilege. The result is a steadily expanding internal attack surface made of tools nobody formally owns, nobody reviewed, and nobody is monitoring. AI does not only make defenders and attackers faster; it is quietly increasing the number of things a defender has to defend.

The Forecasting Game

In 2016, Geoffrey Hinton said (https://www.youtube.com/watch?v=2HMPRXstSvQ) it was “completely obvious” that within five years deep learning would outperform radiologists. That same year, Elon Musk, not exactly famous for restraint in his predictions, claimed (https://en.wikipedia.org/wiki/List_of_predictions_for_autonomous_Tesla_vehicles_by_Elon_Musk) that we should be able to do 90 percent of miles driven autonomously within three years. Exactly ten years later, neither has happened, and there are still radiologists who drive themselves to work every morning.

Making predictions about the future of AI is the equivalent of playing red or black at the casino. The only way to bring value to these kinds of predictions is to attach some reasoning to them. So let’s play a forecasting game and divide the predictions into short term (1 year) and long term (5 years). I avoid going to 10 years or more because it would be pure fantasy.

Short term: I do not expect fully autonomous end-to-end cyber operations to become the default.
The most likely scenario is more boring but also more realistic: AI becomes deeply embedded into existing offensive and defensive workflows. Attackers will use agents to speed up reconnaissance, vulnerability analysis, phishing personalization, exploit adaptation, documentation reading, and tool chaining, but humans will still provide direction. On the defensive side, the biggest value will still come from supporting existing processes rather than from fully autonomous response. The uncomfortable part is that this is already enough to change the scenario. Even if agents are unreliable, they can generate more attempts, more plausible leads, and more noise at lower cost. Most organizations will not be compromised by a superhuman autonomous hacker, but by the same weak foundations as today: exposed assets, unpatched systems, poor logging, and rushed internal tools built with AI.

Long term: In five years, I expect the boundary between tool and operator to be much less clear. The realistic scenario is not a single agent that receives “hack this company” and reliably completes the whole operation without help. Local models will likely be good enough for many of these tasks, which means access controls on frontier cloud models will matter, but they will not be a complete solution. Defenders will also improve, but more unevenly. Mature organizations will use AI to continuously review code, map assets, prioritize exposures, generate detections, and simulate attacks before real attackers arrive. Less mature organizations will mainly add AI products on top of messy infrastructure and call it a strategy.

My best guess is that AI will not make cybersecurity unrecognizable in five years, but it will make the current asymmetry sharper: attackers will get cheaper scale, while defenders will be forced to become much better at prioritization, prevention, and operational discipline.
Of course, an unexpected research breakthrough or major efficiency gain could radically accelerate these developments and render much of this forecast obsolete. Still, based on the research published so far, major changes to the underlying architecture do not seem to deliver or sustain better performance, and I expect improvements on the current architecture to be cumulative. This is the disclaimer that will save me when this happens so I can say “I’ve always said it”.

Conclusion

AI agents are unlikely to suddenly replace human attackers or defenders overnight. The more realistic outcome is a gradual amplification of asymmetry. Attackers benefit from scalability, experimentation, and tolerance for failure, while defenders operate under stricter reliability constraints, where mistakes directly translate into operational and financial damage. That imbalance matters and will always matter.

I think there are two things we should keep an eye on:

1. Preserve open source models. While it may be true, as discussed, that this will increase attacker capabilities, keeping access democratic avoids having a few entities decide who can access these systems and who cannot.
2. Keep people inside companies experimenting with AI tools, using demo data first, and bring into production only the things that genuinely create value. It does not have to be “Cut the workforce, let AI handle it. AI everywhere”, nor a complete rejection of these tools.

Otherwise, the asymmetry will only continue to grow.

Many discussions around AI and cybersecurity remain exaggerated, speculative, or disconnected from operational reality. The industry has a long history of overpromising revolutionary change while underestimating the complexity of real environments.
Everyone is talking about and sharing AI because using it only requires typing, but deciding how to implement it and what to do with it is a bit more complex than assuming you understand how these systems work because you asked, “How many Rs are in Raspberry”.

The future is probably neither total automation nor total irrelevance. It is a slower transition where AI increasingly becomes embedded into both offensive and defensive workflows, gradually reshaping the economics of cyber operations.

References

https://cloud.google.com/blog/topics/threat-intelligence/ai-vulnerability-exploitation-initial-access
https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf
https://arxiv.org/pdf/2203.02155
https://knowledge.workspace.google.com/admin/security/indirect-prompt-injections-and-googles-layered-defense-strategy-for-gemini
https://deploymentsafety.openai.com/gpt-5-6-preview
https://www.anthropic.com/news/fable-mythos-access