June 25, 2026
Todd Lipcon, Distinguished Engineer, Google Cloud, and Manish Purohit, Research Scientist, Google Research
Linear elastic caching minimizes total cache cost by framing page eviction as a ski rental problem, using lightweight machine learning to optimize the trade-off between memory footprint and cache misses.
Modern high-performance database systems and cloud services rely on in-memory caching to keep frequently accessed data in RAM to bypass slow disk operations and deliver the lightning-fast response times users expect. But this performance comes with a cost (literally): high-speed memory is expensive, and some serverless cloud providers charge up to $3 per day for just 1 GiB of memory.
Historically, cache management has been treated as a fixed-resource problem. With regular, fixed-sized caching, engineers allocate a specific amount of memory for the cache and the system uses eviction policies like least recently used (LRU) replacement to decide which data to keep when that space runs out. This leads to a classic “Goldilocks” problem: size the cache too small and performance plummets; size it too large for peak demand, and you waste thousands of dollars on idle memory.
In a paper published at the Conference on Innovative Data Systems Research (CIDR), we introduced linear elastic caching, a new approach designed to minimize the total cost of ownership (TCO) of cache management by dynamically adjusting cache size in response to real-time workloads. Instead of treating memory as a fixed, pre-allocated resource, we treat it as an utility whose cost is linear in both the size of the cached data and the duration for which it is held in the cache. By treating memory footprint as a variable cost that integrates over time, we showed that we can significantly reduce expenses without compromising system performance.
To solve the challenge of dynamic cache sizing, let’s use the classic ski rental problem. Imagine you’re on a ski trip of unknown length. Each day, you face a choice: rent skis for a small daily fee or buy them for a larger upfront cost and ski for free thereafter. If you knew exactly how many days you would ski, the choice would be easy. But without that knowledge, you need an algorithm to minimize your total spend.
Similarly, in a linear elastic cache, every piece of data faces a comparable dilemma. When a piece of data is accessed, the system must decide between two alternatives:
At the same time, the system cannot optimize for each piece of data independently since the cache has a maximum allocated size (think of a large group of people at a ski resort, where the resort only has a limited number of skis to offer). Our core theoretical contribution proves that we can optimize these two factors — the eviction policy and the "rental" duration — separately. This separation lends itself nicely into a clean practical implementation. We can use a ski rental algorithm to determine the time-to-live (TTL) of a page (analogous to the rental duration). If a page isn’t accessed again before its TTL expires, it is automatically evicted. But if the cache ever becomes physically full, a traditional eviction policy like least recently used (LRU) steps in to manage the space.
Traditional online algorithm design focuses on providing worst-case performance guarantees. For the ski rental problem, the classic “break-even” algorithm is to rent until the accumulated cost equals the purchase price, and then buying the skis. While this approach (and its randomized counterpart) provide solid worst-case guarantees, production workloads are mostly predictable. Data access in systems like Spanner — our globally distributed database — often follows discernible patterns that can be exploited to make better renting decisions.
To ensure our theory holds up in the real world, we conducted extensive experiments using two primary sources:
We developed a practical algorithm that assigns a time-to-live (TTL) to the cached page on each page request based on the page’s access patterns and costs. Because Spanner handles billions of requests per second, this TTL prediction model has to be incredibly lightweight. We opted for a shallow decision tree that can be translated into a few lines of C++ code. The resulting code is also easily interpretable and provides valuable insights on the workload characteristics. This model considers features such as the size of the data, the cost of a cache miss (when data isn’t in the cache and the system needs to retrieve it from some other, slower system like a disk), and the type of database operation being performed to predict the optimal TTL for each page.
We integrated the elastic caching policy into Spanner's production servers over several months. Compared to a standard fixed-size cache, the results were substantial:
Crucially, because the algorithm is "cost-aware," the small increase in cache misses was concentrated on data that is cheap to fetch from storage, meaning the impact on actual I/O costs was a negligible 0.5%.
We also evaluated our elastic caching approach using several publicly available cache traces. We used an optimized implementation of the greedy dual size frequency (GDSF) eviction algorithm — a generalization of the well-known LRU policy that allows for pages of different sizes — as a fixed cache size baseline policy.
We considered four variants of elastic caching depending on which ski rental algorithm we used and whether or not we used a machine learned model. Since the available public traces don't have application-level features available for training, we didn’t implement decision trees for prediction. Instead, we developed a simple learning strategy that splits each trace in half and uses the first half for training. For each individual page in the training trace, we computed the best TTL for the page that minimizes the cost over the training trace.
Since the behavior of the cache changes depending on what's initially in the cache, a common practice, known as “warming up”, is to use some prefix of the cache trace to populate the cache but not actually measure performance on it. We warmed up all caches with one day’s worth of requests from the second half of the trace and used the rest for testing and measurements. During the test trace, if we encountered a page that was seen during training, we set the TTL to be the best precomputed TTL for that page. Otherwise, we set the TTL using either the breakeven or randomized policies.
We found that the elastic approach consistently outperformed fixed-size caches across diverse workloads. As the cost of memory increases relative to the cost of a cache miss, the savings provided by elastic caching become even more pronounced.
As the following figures demonstrate, elastic caching policies incur significantly lower total cost by dynamically adapting the cache size to the workload. We also observe that the elastic policies incur a much lower cache miss rate when compared to a fixed size policy at a comparable size.
Linear elastic caching represents a shift in how we think about cloud infrastructure. By moving away from static peak-load provisioning and toward a dynamic, cost-aware model, we can build systems that are both high-performing and economically efficient.
Our evaluation of these learned ski rental policies across Spanner workloads demonstrates that even small, lightweight ML models can have a massive impact when applied to core infrastructure. As cloud environments continue to offer more granular, pay-as-you-go pricing for resources, elastic strategies will become essential for any large-scale service looking to optimize its global footprint.
This represents joint work with Tamas Sarlos (Google) and Ravi Kumar (Google) and was presented at the Conference on Innovative Data Systems Research (CIDR) 2025.