How to reverse-engineer an endpoint security product using LLMs

A security researcher at SpecterOps has demonstrated that large language models can reverse-engineer endpoint detection and response (EDR) products, including Palo Alto Networks Cortex XDR, to extract detection rules and enable evasion. The researcher claims to have successfully analyzed multiple major EDR vendors, collecting rules and signatures that could undermine endpoint security. This development threatens the effectiveness of current endpoint protection and forces the industry to confront the impact of AI-driven offensive security research.

Accelerating EDR Evasion with LLM-Driven Analysis

Over the years, I have enjoyed disassembling and debugging endpoint detection and response (EDR) and antivirus (AV) engines. For as long as I can remember, I’d have evenings where I’d throw on some music, boot a virtual machine with kernel debugging enabled, and spend time searching for different evasion methods. While this is a fun way to unearth novel techniques, the slow grind can at times be frustrating when a deadline looms or an engagement needs a specific evasion to progress undetected. So when LLMs became capable of driving the reversing effort, I was excited to give them a shot at speeding up this process.

In my previous post, Disposable Tooling: Building LLM-Generated Mythic Agents from Prompt to Deployment (https://specterops.io/blog/2026/06/24/disposable-tooling-building-llm-generated-mythic-agents-from-prompt-to-deployment/), I detailed how LLMs are assisting offensive security researchers with payload generation. In this post, we’ll continue the mini-series by looking at how LLMs are also changing the way we approach endpoint security, including LLM-driven EDR evasion.

The Balance of Disclosure

Every Red Team worth their salt has various evasion techniques that they rely on behind the scenes.
We all have bypasses that work with different products, or certain tricks that are passed around in hushed tones over a few drinks at a conference. When it comes to EDR evasion techniques, those conversations are about to become very public.

Justin Elze (https://x.com/HackingLZ?lang=en) wrote an awesome blog post (https://trustedsec.com/blog/the-defensive-stack-is-exposed) highlighting TrustedSec’s internal use of LLMs to analyse different EDR products. I’m pretty sure that, like us, many of you read the post and immediately drew parallels to what you are seeing internally. But what was more interesting to me was that, while we are seeing the current range of EDR products crumble at the hands of LLMs (specifically when it comes to on-host detections), so few people are talking about it openly.

One of the things I love about openly sharing information is that, while many of us understand a technology’s constraints, the wider industry continues to assume that its security controls are effective. Then, when a new piece of research lands, a more open discussion can suddenly be held.

Learn more about the SpecterOps Ghostworks initiative: https://specterops.io/blog/2026/06/09/introducing-ghostworks-ai-cybersecurity-research/

So this post is my attempt to share what I’ve observed recently, and hopefully it will encourage others to do the same.

What Are We Seeing?

For the past several months, we have been seeing clear signs that endpoint security products, specifically the “big 5” that we encounter during assessments, are susceptible to reverse engineering and evasion by LLMs. What’s more, the harness required to achieve a complete teardown of an EDR’s local detections is surprisingly simple.
It even resulted in an internal thread collecting EDR rules from various vendors that were clearly aimed at stopping several of SpecterOps’ toolsets. As EDR after EDR fell, rules were analysed, and the testing harness produced automated reports highlighting Mythic agent detections, SCCMHunter rules, and even variants of LDAP traffic monitoring aimed at identifying BloodHound collection.

In this blog post, we’re going to go through Palo Alto’s Cortex XDR product. I chose to focus on Cortex because it does some cool things that were fun to see and play around with. To be clear, every major EDR vendor has gone through the exact same process, and we now have their extracted rules, signatures, and models sat on an internal server as I type this.

Additionally, I want to state up front that this post is not going to be a full teardown of Cortex’s rules or behavioural detections. I will include just enough to demonstrate the impact for the wider audience, but I will not be releasing any decryption keys or rule dumps here. Instead, we will focus on the output of the LLM harness, with enough detail to demonstrate how effective state-of-the-art (SOTA) models are at producing actionable evasions.

To put it bluntly, this post is not a criticism of any single EDR vendor; it is a reality check of where we now are as an industry when it comes to evading endpoint security.

The Rigging

First up, let’s discuss the harness and model used to perform the analysis. For this review, I started with OpenAI’s GPT-5.4-Cyber model and migrated to the GPT-5.5-Cyber model when it became available.

In a previous blog post (https://specterops.io/blog/2026/05/06/dev-tunnels-the-accidental-c2/), I discussed the “Bishop” rig, which is one of my dedicated hosts for running LLMs 24/7. One of the harnesses on that host is something I’ve been calling “Day Shift” (which, of course, implies there is also a “Night Shift” harness, but that’s for another post).
Day Shift is essentially a Ralph Wiggum loop, a pattern that developers popularised to work around LLMs stopping before a task’s objective was complete. As it is just a loop, it can of course also be adapted for general research tasks.

At its core, the Day Shift harness consists of a few markdown files:

- REPORT.md – A markdown file used by the running agents to surface key findings for human review.
- STATE.md – A state file that each agent uses to track key events during analysis.
- CODEMAP.md – Allows each agent to store references to areas of disassembly that are interesting or critical for analysis, increasing velocity during later agent iterations.
- AGENTS.md – A set of instructions telling the model how to use the above files.

GPT-5.5-Cyber is set up to execute within Codex-CLI, which itself runs inside a Docker container. To provide persistence between loops, a workspace is mounted into the container, giving each loop a shared scratch space. Visualised, it looks something like this:

I then added the Cortex product to the workspace, and a small shell script triggers execution:

```bash
#!/usr/bin/env zsh
# codex-docker.sh provides the codex-dind wrapper called below
source ./codex-docker.sh

# Restart the agent with a fresh context window on every iteration.
# Creating a file named STOP in the working directory ends the loop.
while true; do
  [[ -f ./STOP ]] && break
  codex-dind exec --yolo "First review your AGENTS.md file. Your task is to understand what detections, hooks, mitigations, alerts, rules and models are implemented by Cortex. You should focus on understanding how they are loaded, how they are used, any obfuscation/encryption/compression added to raw files, and ultimately provide a method for extracting the raw content for human/redteam review. Additionally, if ML models are loaded by Cortex, document how the model is loaded, how the model works, what risk-ratings/features the model reviews, and potential test-code for interacting with the model in an isolated environment for testing. The Cortex product can be found in the ProgramFiles directory, and the ProgramData directory contains a copy of data taken from a running Cortex host. Your output must be added to REPORT.md, and STATE.md must be used for your state output. Note: You are restricted from accessing any external servers, only use local files for analysis."
  sleep 5
done
```

To provide tooling to Codex, I created a basic server that exposes Binary Ninja over MCP:

And that is it. No multi-agent workflow. No attempt to have a triage agent and a specialised agent ordered to whip several minions into shape. Nope, just Binary Ninja, Codex, and a while loop.

The advantage of running a loop like this is that it helps overcome a limitation current-generation LLMs show when given an open-ended goal. I have found that LLMs commonly quit a task prematurely, or stop before making use of all the data they have acquired. By updating a shared state and continually restarting the agent, you clear out the context window on each iteration. Then, because the LLM reviews the markdown from previous iterations, the agents often find new leads to work on until, hopefully, findings surface in the REPORT.md file.

What LLM-Driven EDR Evasion Looks Like in Practice

With our deceptively simple harness running, we can move on to the interesting bit: the findings.

In-Process DLL Review

Most EDRs have a user-mode DLL that they load into processes. The exact purpose varies, but typically these DLLs are used to enrich telemetry sourced from within a user-mode process, or to hook and monitor functions associated with suspicious activity in that process. In the generated output, this was the first thing the LLM reported, correctly identifying the first user-mode DLL that Cortex loads into a process:

1. cyinjct.dll is the shared user-mode hook writer and process-start gate

Locations:
- ProgramFiles/Traps/cyinjct.dll
- CyInject InitializeRuntime 0x180006140
- CyInject InstallInlineHook 0x18000b520
- CyInject NtContinueShim 0x180005ed0
- CyInject NtTestAlertShim 0x1800032c0

Finding:
- Cortex installs early inline hooks on LdrInitializeThunk, NtContinue, NtTestAlert, and KiUserApcDispatcher and uses them to gate bootstrap and later feature shims.

Attacker value:
- Clean ntdll remapping, direct syscalls, manual mapping, or restoring patched prologues are the highest-value user-mode bypass points because they avoid the shared interception layer instead of fighting each down stream feature separately.

The list of findings related to user-mode DLL injection and monitoring goes on. It clearly details how each hook works and which DLLs are responsible, and it includes annotated Binary Ninja databases. It also identifies which regions of memory are protected and which devices are used to communicate with the Cortex drivers.

YARA Rules

Next up are the YARA rules embedded within the Cortex product. Although EDR products perform enrichment and behavioural pattern matching in the cloud, some EDRs also ship a set of YARA rules to the endpoint for local detection of static signatures. Cortex is one of these EDRs, and GPT-5.5-Cyber found this immediately, providing a good overview:

7. YARA rules are fully recoverable offline from local files

Locations:
- ProgramData/Cyvera/LocalSystem/Download/contents/1776941162100/yara plugin config.lua
- yara plugin.dll
- ProgramData/Cyvera/LocalSystem/Python/scripts/yara data.json
- ProgramData/Cyvera/LocalSystem/YaraRulesetsCache/yara rulesets cache.bin

Finding:
- YaraSignatures .yara files are stored locally with an ENCY wrapper, decrypted with AES-128-ECB using embedded key prefix