# Devirtualization and Static Polymorphism

This article explains that virtual dispatch in C++ incurs performance costs due to pointer indirection, larger object sizes, and inhibited inlining. It describes how compilers can sometimes devirtualize calls automatically, helped by the `final` keyword, whole-program optimization (`-fwhole-program`), or link-time optimization (`-flto`), and how developers can manually replace dynamic polymorphism with static polymorphism so that calls are resolved at compile time.

Ever wondered why your "clean" polymorphic design underperforms in benchmarks? Virtual dispatch enables polymorphism, but it comes with hidden overhead: pointer indirection, larger object layouts, and fewer inlining opportunities.

Compilers do their best to devirtualize these calls, but it isn't always possible. On latency-sensitive paths, it can pay off to manually replace dynamic dispatch with static polymorphism, so calls are resolved at compile time and the abstraction has effectively zero runtime cost.

## Virtual dispatch

Runtime polymorphism occurs when a base interface exposes a virtual method that derived classes override. Calls made through a `Base&` are then dispatched to the appropriate override at runtime. Under the hood, a virtual table (`vtable`) is created for each polymorphic class, and a pointer (`vptr`) to that table is added to each instance.

On a virtual call, the compiled code loads the `vptr`, selects the right slot in the `vtable`, and performs an indirect call through that function pointer.

The drawback is twofold: the extra `vptr` increases object size, and the indirection through the `vtable` hides the call target from the compiler. A call whose target is unknown at compile time cannot be inlined and is more prone to branch mispredictions, while the larger objects reduce cache efficiency.
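As a rough illustration of the size cost, the sketch below compares two otherwise identical types, one with a virtual method and one without. The names `Plain`, `Polymorphic`, and `vptr_overhead` are hypothetical and not from the article, and the exact sizes are implementation-defined; on mainstream 64-bit ABIs the polymorphic type typically grows by the size of one pointer plus any alignment padding.

```cpp
#include <cstddef>

// Hypothetical types for illustration only.
struct Plain {
  int value = 0;
  auto get() const -> int { return value; }  // non-virtual: no vptr needed
};

struct Polymorphic {
  int value = 0;
  virtual auto get() const -> int { return value; }  // virtual: adds a hidden vptr
  virtual ~Polymorphic() = default;
};

// Extra bytes per instance attributable to the hidden vptr and its padding.
constexpr auto vptr_overhead() -> std::size_t {
  return sizeof(Polymorphic) - sizeof(Plain);
}

// Holds on common ABIs (Itanium, MSVC); the standard itself does not guarantee it.
static_assert(sizeof(Polymorphic) > sizeof(Plain),
              "expected the polymorphic type to carry a vptr");
```

Printing `sizeof(Plain)`, `sizeof(Polymorphic)`, and `vptr_overhead()` on your own toolchain is a quick way to see how much each instance pays before any call is made.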
The best way to observe the indirect call is to inspect the assembly [1] emitted by the compiler for a minimal example:

```cpp
class Base {
public:
  auto foo() -> int;
};

auto bar(Base* base) -> int {
  return base->foo() + 77;
}
```

For a non-virtual member function `foo`, as in the example above, the free function `bar` issues a direct call:

```
bar(Base*):
        sub     rsp, 8
        call    Base::foo()             // Direct call
        add     rsp, 8
        add     eax, 77
        ret
```

However, declaring `foo` as `virtual` changes the assembly of `bar` into an indirect, vtable-based call:

```
bar(Base*):
        sub     rsp, 8
        mov     rax, QWORD PTR [rdi]    // vptr (pointer to vtable)
        call    QWORD PTR [rax]         // Virtual call
        add     rsp, 8
        add     eax, 77
        ret
```

## Devirtualization

Sometimes the compiler can statically deduce which override a virtual call will hit. In those cases, it devirtualizes the call and emits a direct call instead, skipping the `vtable`. For example, devirtualization is straightforward [2] when the runtime type is clearly fixed:

```cpp
struct Base {
  virtual auto foo() -> int = 0;
};

struct Derived : Base {
  auto foo() -> int override { return 77; }
};

auto bar() -> int {
  Derived derived;
  return derived.foo();  // compiler knows this is Derived::foo
}
```

The compiler is able to devirtualize even through a base pointer, as long as it can track the allocation and prove there is only one possible concrete type.

The problem is that with traditional compilation, object files are created per translation unit (TU), each compiled and optimized in isolation. The linker simply stitches those objects together, so cross-TU optimizations are inherently limited. That's where compiler flags are useful:

- `-fwhole-program`: tells the compiler "this translation unit is the entire program." If no class derives from `Base` in this TU, the compiler is free to assume nothing ever does, and can devirtualize calls on `Base`.
- `-flto`: link-time optimization. Keeps an intermediate representation in the object files and optimizes across all of them at link time, effectively treating multiple source files as a single TU.
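To make the contrast concrete, here is a minimal sketch of the two situations: one where the allocation is visible and the dynamic type can be proven, and one where only a reference is visible. The names `Shape`, `Triangle`, `known_type`, and `unknown_type` are hypothetical, and whether a given call is actually devirtualized depends on the compiler, its version, and the optimization level, so treat the comments as expectations to check in the emitted assembly rather than guarantees.

```cpp
#include <memory>

// Hypothetical interface and implementation, for illustration only.
struct Shape {
  virtual ~Shape() = default;
  virtual auto sides() const -> int = 0;
};

struct Triangle : Shape {
  auto sides() const -> int override { return 3; }
};

// The allocation is visible in this function, so an optimizer can prove the
// dynamic type is Triangle and may replace the virtual call with a direct one.
auto known_type() -> int {
  std::unique_ptr<Shape> shape = std::make_unique<Triangle>();
  return shape->sides();
}

// Only a reference is visible here. Compiled in isolation, the dynamic type is
// unknown, so this normally stays an indirect call. With GCC, building with
//   g++ -O2 -fwhole-program main.cpp      (single-TU program)
//   g++ -O2 -flto a.cpp b.cpp             (multi-TU program)
// gives the optimizer a view of every class derived from Shape, which may
// allow it to devirtualize this call as well.
auto unknown_type(const Shape& shape) -> int {
  return shape.sides() + 77;
}
```

Both functions return the same values regardless of whether the call is devirtualized; the difference only shows up in the generated code and its performance.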
On the language side, `final` is a lightweight way to give the compiler the same guarantee for specific methods:

```cpp
class Base {
public:
  virtual auto foo() -> int;
  virtual auto bar() -> int;
};

class Derived : public Base {
public:
  auto foo() -> int override;  // override
  auto bar() -> int final;     // final
};

auto test(Derived* derived) -> int {
  return derived->foo() + derived->bar();
}
```

Here, `foo` can still be overridden, so `derived->foo()` remains a virtual call. However, `bar` is marked as `final`, so the compiler emits a direct call even though it's declared `virtual` in the base:

```
test(Derived*):
        push    rbx
        sub     rsp, 16
        mov     rax, QWORD PTR [rdi]
        mov     QWORD PTR [rsp+8], rdi
        call    QWORD PTR [rax]         // Virtual call
        mov     rdi, QWORD PTR [rsp+8]
        mov     ebx, eax
        call    Derived::bar()          // Direct call
        add     rsp, 16
        add     eax, ebx
        pop     rbx
        ret
```

## Static polymorphism

When the compiler can't devirtualize, one option is to use static polymorphism instead. The canonical tool for this is the Curiously Recurring Template Pattern [3] (CRTP). With CRTP, the base class is templated on the derived class, and invokes methods on it via `static_cast`, with no `virtual` keyword involved:

php template