Neoclassical C++: segmented iterators revisited The article revisits Matt Austern's 2000 paper on segmented iterators, which proposed a two-level iterator abstraction to improve performance on data structures like `std::deque` by allowing algorithms to operate efficiently on contiguous segments. While the ideas were never adopted into the C++ standard, they have been implemented internally in libraries such as libc++ (e.g., optimized `std::for_each` around 2023) and Boost.PolyCollection. The concept involves decomposing iterators into segment and local iterators, enabling hierarchical algorithms that reduce abstraction penalties and enable optimizations like auto-vectorization. Introduction to segmented iterators The legendary STL library https://stl.boost.org/ has been an inspiration in the development of the C++ language and libraries. The understanding of the concepts and code developed by Stepanov, Lee, Musser, Austern, etc. is a treasure trove for any C++ programmer. Stepanov’s core claim was that abstraction should have near-zero overhead. He argued that generic programming, done correctly, should not impose a noticeable runtime cost abstraction penalty compared to hand-written specialized code. This was not just a hope but a design principle of the STL. Following this philosophy, in 2000 Matt Austern published the paper Segmented Iterators and Hierarchical Algorithms https://lafstern.org/matt/segmented.pdf . The paper’s core argument was that many data structures are naturally segmented e.g. deque , and generic algorithms that ignore that feature — treating every data structure as a uniform range of elements — are unnecessarily inefficient. A new kind of iterator abstraction, in which segmentation is explicit, could make possible to write hierarchical algorithms that exploit segmentation and reduce the abstraction penalty associated with standard iterators. The classic motivating example is std::deque , which is internally a sequence of fixed-size arrays. Every ++it on a deque iterator may cross a block boundary, so inplicitly std::find or std::fill has to evaluate, on every step, whether the current pointer has reached the end of the current block, an indirection that makes deque iteration measurably slower and also defeats most compilers’ auto-vectorisers. A segmented iterator could be a two-level structure, allowing an algorithm to operate on each contiguous segment efficiently for instance, calling memset or a tight inner loop on each chunk and only handle the segment-to-segment transitions at the outer level. This paper remained influential but its ideas were never adopted into the C++ standard. On the other hand, some libraries have internally developed those core ideas. Some examples:: - Around 2023, libc++ ‘s std::for each was optimized for segmented iterators,and which lead to high performance improvements there is an ongoing effort to add more algorithms https://github.com/llvm/llvm-project/issues/102817 . - Some Boost libraries, like Boost.PolyCollection https://www.boost.org/doc/libs/latest/doc/html/poly collection.html , have i nternal algorithm implementations https://github.com/boostorg/poly collection/blob/develop/include/boost/poly collection/algorithm.hpp specially tuned for their internal segmented nature. Austern’s paper explained A traditional iterator presents a flat view: increment, decrement, dereference. A segmented iterator additionally admits being decomposed into two coordinates: - a segment iterator that walks the outer sequence of segments the blocks of a deque, the chunks of a chunk-list, the buckets of a chained hash table, etc. . This iterator is not dereferenceable . - a local iterator that walks the inner range inside one segment. This iterator is dereferenceable . A hierarchical algorithm is then a generic algorithm that, when given a segmented iterator, dispatches to a two-level loop: an outer loop over the segments and, inside each segment, a simplified algorithm over a higher performance, less segmented, local iterator that could be further segmented into new, lower-level, segment iterator / local iterator pairs. Ultimately, this recursive segmentation reaches to a non-segmentable, probably highly efficient, local iterator . For a deque