Scalable Treasure Hunts Are a Myth, But We Almost Made One The article describes a team's failed attempt to scale a server infrastructure for a treasure hunt game by using a complex modular microservices architecture, which actually hindered performance. After struggling with high latency and CPU usage, they simplified the system by switching to a single-threaded, in-memory cache with asynchronous message queues, which drastically improved performance and reduced resource consumption. The author concludes that a simpler design and earlier performance testing would have been more effective, and that Python was a poor choice for the system's concurrency needs. As it turns out, we weren't really solving the problem of scaling our server infrastructure, but rather the problem of our own internal architecture. Our team had designed the system to be modular, with various components communicating with each other over a complex network of message queues and HTTP requests. It was an attempt to apply the microservices pattern to a very specific problem, but ultimately, it just wasn't working out as planned. The more we tried to scale, the more our performance metrics started to slip. We saw our average 95th percentile latency creep up to 400ms, and our CPU utilization on the web server started to top out at 80%. Our first instinct was to try and optimize the database queries on our server. We knew that the database was a major bottleneck, so we went in and started tweaking query plans, indexes, and caching strategies. We even went so far as to rewrite some of the queries from scratch, but it didn't really make a dent. Our latency started to come down, but only by a few milliseconds, and our CPU utilization kept creeping up. It was clear that the database was just the tip of the iceberg, and that our real problem lay deeper in the system. It was then that we took a step back and re-evaluated our architecture. We realized that our modular system was actually a hindrance to performance, rather than a help. We were creating unnecessary overhead with all the inter-component communication, and our system was starting to resemble a distributed monolith. We decided to take a different approach - we chose to use a single-threaded, in-memory cache to store our game state, and then use a message queue to handle updates asynchronously. It was a much simpler design, but it turned out to be the key to our performance problem. After implementing the new design, we saw a huge improvement in performance. Our average 95th percentile latency dropped to 50ms, and our CPU utilization on the web server came down to 30%. We were able to handle thousands of concurrent users with ease, and our system became much more responsive. We also saw a significant reduction in our allocation count, which was a major pain point for us. Our application was no longer constantly thrashing the CPU and memory, and we were able to remove a bunch of unnecessary caching and queuing code. In hindsight, I would have taken a more radical approach to our system design from the start. We were trying to reinvent the wheel with our modular system, when in fact, a simpler design would have been much more effective. I would have also focused on performance testing much earlier in the development cycle, rather than waiting until the last minute. Finally, I would have been more careful about our choice of programming language and runtime. We chose to use Python, which, while a great language, was not the best choice for a system that needed to be highly concurrent. We ended up constantly fighting memory safety issues and performance bottlenecks, which made our life much harder than it needed to be.