BugBash'26 Afternoon of Day 1

The article summarizes the first afternoon of BugBash'26, featuring talks from engineers at OpenAI, Jane Street, TigerBeetle, and Adaptive. A key theme from Ben Eggers of OpenAI was that while LLMs have made code generation cheap and fast, the hard parts of software development—design, discovery, integration, and correctness—remain unchanged, requiring developers to still do the "thinking" and upfront specification work. Other highlights included Matt Barrett discussing Adaptive's high-speed, low-latency Raft implementation for trading systems, which was praised by Antithesis for its reliability.

These are my notes from the afternoon sessions of BugBash'26. We had a 75-minute lunch break. Nice lunch, but there were no vegetarian entrées, which made Peter Alvaro hangry. I don't blame him, I would be too.

Ben Eggers, Member of Technical Staff @ OpenAI

This was a fun and also thought-provoking talk. The premise is "Nothing has changed about software development". Really? After LLMs have been eating software like wildfire, and particularly rocking at code generation in the last 3 months?? And this is coming from an OpenAI infrastructure engineer, who was once the 8th-highest 7-day token user. How come? The talk has two parts:

Ok, now it makes more sense. Both of these are sensible statements. Ben followed with a couple of disclaimers: he is talking about deep, narrow systems, not broad, high-surface-area systems, because he has experience in the former and didn't want to make claims about the latter.

Ben is a funny guy, and refreshingly honest. He poked fun at LLM mistakes in code generation with 3 pop quizzes throughout his talk.

Ben harkened back to Will Wilson's claim that before LLMs, 50% of the time was spent writing code and 50% testing. Ben asked, hey, what happened to design, discovery, integration, and correctness? He made the case that these were the hard parts, and that writing code forced people to address them. This is a long, arduous process. The slowness of writing code was load-bearing. When you noticed your code starting to cross boundaries, that exposed bad interfaces in your components. Wiring paths end-to-end helped expose missing cases. Writing tests (yes, this was the first thing we lost to the LLM wildfire) forced expected behavior to become explicit and made you test your interfaces.

Before LLMs, how fast code appeared matched how fast humans could reason about it. And yes, LLMs broke this part. But maybe not so drastically.
Tech leads already managed stochastic work: they knew how to break down the problem and manage junior engineers and interns writing code. The job was always narrowing the distribution, turning a large spectrum of possible outcomes into a tighter, reliable band.

Models crossed a usefulness threshold ~3 months ago. This point kept coming up in many of the talks. Some people claimed it happened in November-December, but everyone (except Gary Marcus) agreed a corner was turned. But the code you get back is proportional to the legwork you do. Models write better code when you do the leading: tell them what success means, give them the shape of the solution, determine the behavior up front. You need to be incredibly specific in your prompt, and in the limit prompts become math-like. (You mean TLA+ specs?)

So Ben recommends: Unit testing is kinda dead; LLMs do a great job of that. But always implement tests in a different context. Write interfaces, write tests, tell the LLM not to touch the tests, and ask it to write the code. This is... exactly like managing an army of interns.

So, Ben claims, nothing changed overall. Code got cheap, but correctness did not. You can outsource your coding, but you can't outsource your thinking/understanding.

Ron Minsky, Co-head of Technology @ Jane Street

I didn't take many notes in this talk, and took some headspace time in the second part.

Chaitanya Bhandari, Distributed Systems Engineer @ TigerBeetle

Chaitanya is really smart and gave a decent talk. Again I didn't take many notes, and hit the hallway track. Waking up at 4am to fly in the same day from Buffalo to DC took its toll on me.

Matt Barrett, Founder & CEO @ Adaptive

Adaptive helps move a lot of cash around the globe. Stressful work. Matt talked about what he claims is the world's fastest (lowest-latency) Raft implementation. This was built 8 years ago. Antithesis tested it a couple of months ago and said it is one of the most reliable Raft implementations they have seen.
It's used for trading systems/infrastructure. It supports 100K transactions/sec, and provides low double-digit microsecond latency with low variance. Aeron Cluster, their fast and fault-tolerant Raft, builds on open-source Aeron as a low-latency, high-throughput messaging layer. It is based on individual byte replication, not message replication. They moved from a message index to a byte index, with natural batching at all levels. Matt said business logic runs in the cluster for low latency. I don't know what he means exactly by that.

But, why didn't we hear about this Raft implementation before? Also, the talk did not mention any protocol innovations. It looks like there isn't much innovation at the distributed systems protocol level or the algorithmic level, and the innovation may be at the lower layers, in the data handling and networking implementation. I still don't have a good idea of their Raft implementation after the talk.

Corwin Coburn, Uber Tech Lead, Parallel File Systems @ Google

The point is you want to keep storage boring. Storage is about writes and reads. It is a utility, and, hence, is boring. Nobody calls the plumber when things are fine.

They built at Google the fastest Lustre filesystem in the world, with 10 TB/sec. Parallel file systems in the cloud require capacity, performance, availability, security, and ease of use. They put a lot of effort into keeping the storage boring by building on reliable infrastructure, proven software (Lustre), strict tenancy isolation, limited configurations, and achievable SLOs. This part is important: do not overpromise, and do not overdeliver. If you overdeliver and customers get accustomed to it (Hyrum's law), then going back to normal breaks the customers.

Since this conference cares a lot about testing, what about testing at scale?

Innovation << Reliability << Inertia

This is also a big tenet of keeping it boring. Nobody rewrites their applications to use your uber/super API.
It has to be boring. Remember, utility is boring.

A tip for the developers: when you don't use storage properly, it performs badly. Most developers don't know how to use storage properly. One important thing is: don't use filesystem metadata for query-intensive operations.
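
To make that last tip concrete, here is a minimal sketch of my own; nothing like it was shown in the talk. The idea is that instead of answering every query by walking the directory tree and calling stat on each file, you record the metadata once in a separate index and query that. The use of SQLite, the `files` table, and the function names `build_index`, `files_larger_than`, and `demo` are all assumptions made for illustration. In a real system the index would more likely be updated at write time than rebuilt by a scan.

```python
import os
import sqlite3
import tempfile
from typing import List


def build_index(root: str, db_path: str) -> None:
    """Walk the tree once and record (path, size, mtime) in SQLite.

    Hypothetical helper: this is the only place that touches
    filesystem metadata.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rows.append((full, st.st_size, st.st_mtime))

    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
        )
        con.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", rows)
        con.execute("CREATE INDEX IF NOT EXISTS files_by_size ON files(size)")
        con.commit()
    finally:
        con.close()


def files_larger_than(db_path: str, min_bytes: int) -> List[str]:
    """Answer the query from the index, with no stat calls on the filesystem."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT path FROM files WHERE size > ? ORDER BY path", (min_bytes,)
        )
        return [row[0] for row in cur]
    finally:
        con.close()


def demo() -> List[str]:
    """Create a few files, index them once, then query the index."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.mkdir(data_dir)
        for name, size in [("a.bin", 10), ("b.bin", 2000), ("c.bin", 5000)]:
            with open(os.path.join(data_dir, name), "wb") as f:
                f.write(b"x" * size)
        # Keep the index outside the indexed directory.
        db_path = os.path.join(root, "index.sqlite")
        build_index(data_dir, db_path)
        return [os.path.basename(p) for p in files_larger_than(db_path, 1000)]
```

Calling `demo()` returns the names of the two files larger than 1000 bytes. The repeated "which files match X?" question then costs one indexed lookup instead of a metadata operation per file.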