Introducing Neptune: Direct3D virtualization for QEMU

A developer has created Neptune, a Direct3D virtualization extension for QEMU's virglrenderer that transports Direct3D APIs over Virtio GPU, enabling para-virtualized graphics for Windows applications on Linux hosts and guests. The project currently supports DirectX 11 through DXVK on Linux systems and has demonstrated performance comparable to native DXVK in benchmarks including 3DMark Fire Strike and Civilization VI. The developer used AI tools to overcome the complex technical challenges spanning GPU virtualization, Windows kernel, and graphics APIs that had previously prevented implementation.

For many years, I had wanted to bring Direct3D virtualization to QEMU. I tried and gave up multiple times because the problem felt intractable. A proper solution required expertise in many niche areas of system design, including virtualization, GPUs, the Windows kernel, and graphics APIs. Each of these topics is deep enough on its own to build an entire career on. Now, with AI agents getting better each week, I decided to give it another shot.

I know using AI is seen as an albatross in some circles. There are loud https://www.entrepreneur.com/business-news/ai-ceo-says-software-engineers-could-be-replaced-in-months/502087 rich https://www.techradar.com/pro/nvidia-ceo-predicts-the-death-of-coding-jensen-huang-says-ai-will-do-the-work-so-kids-dont-need-to-learn AI CEOs https://www.windowscentral.com/software-apps/openai-sam-altman-ai-will-gradually-replace-software-engineers who say that programmers will be completely replaced by AI. Meanwhile, concerned https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/ open source maintainers https://www.jeffgeerling.com/blog/2026/ai-is-destroying-open-source/ talk about low quality slop being spammed at projects, creating extra work for maintainers who are already stretched thin.
In fact, we have experienced this increased stream of low quality submissions to UTM, with one egregious example of a “security researcher” submitting a “bug report” https://github.com/utmapp/UTM/issues/7695 as a publicity stunt for their new company without taking any time to understand the project’s threat model or trying to communicate with the maintainers before publishing.

However, my belief is that AI tools should not replace human thought but instead be used to amplify creativity. AI can fill the gaps in my knowledge and I can direct it with my systems experience. I want this post to be both a technical document describing the history that led up to Neptune and a case study on how AI was used to bring that idea to life.

Introducing Neptune

Neptune is an extension of virglrenderer https://gitlab.freedesktop.org/virgl/virglrenderer for transporting Direct3D APIs over Virtio GPU, the device used by QEMU to provide para-virtualized graphics. It joins https://www.collabora.com/news-and-blog/blog/2025/01/15/the-state-of-gfx-virtualization-using-virglrenderer/ vrend (OpenGL), vDRM (Linux DRM), and Venus (Vulkan) as a protocol which virglrenderer can speak.

Currently, Neptune only works with a Linux host (DirectX 11 through DXVK) and a Linux guest. This was a choice made to simplify the bring-up process and to get early feedback from the community. The next phase is to add macOS host support and Windows guest support. After that, Neptune will be extended to support DirectX 12.

Results

I want to start with the results to highlight what is already working.
The following benchmarks were selected:

- 3DMark Fire Strike: a classic D3D11 benchmark which does optimised rendering
- Unigine Heaven: an older benchmark that is one of the few games that feature heavy tessellation
- Final Fantasy XIV Dawntrail: features a submit-bound workload
- Civilization VI: one of the few games that feature multi-threaded rendering

The main reason these games were selected, though, is that they are all free (and I already own Civ 6) and each has a clear score which can be compared to native DXVK running on Venus. This is important to make sure the performance is within expectation. While correctness can be checked visually, it is more difficult to check performance. Thanks to Venus, though, we have a lower bound on what is “acceptable” performance.

When DXVK is used as the Wine D3D back-end, DXVK does state tracking and creates a combined Vulkan command buffer. The Venus driver sends the command buffer to the host to execute on the GPU. Neptune does not do D3D11 state tracking and passes the API calls directly to the host. This means that there is a lot more traffic over the ring buffer shared by the guest and host. Therefore, I had expected a small but non-negligible drop in performance when running Neptune vs DXVK+Venus. However, I was surprised to see that in each case the performance increased. Here is Claude’s theory as to why:

Neptune moves DXVK from competing for 4 guest vCPUs next to the game to the host’s many cores running against native radv with zero Vulkan-side virtualization, and that win dwarfs the extra ring chatter whenever GPU work or DXVK CPU work (not app-thread work) is the actual ceiling.

DXVK is CPU-heavy. Hazard tracking, render-pass building, descriptor-set diffing: all hot. Moving that work out of the guest VM is a big deal even before you count the Venus saving.

[Benchmark charts: 3DMark Fire Strike, Unigine Heaven, Final Fantasy XIV, Civilization VI]

The test machine used was a 2018 Intel NUC Hades Canyon running AMD Polaris graphics.
QEMU is run on Ubuntu 24.04 with -accel kvm -cpu host -smp 4 -m 16G. For reasons explained above, the performance differences should not be extrapolated to mean that Neptune will outperform Venus in general. Instead, the takeaway should be that the performance is no worse than Venus in typical workloads, which is an important data point to have once we start porting Neptune to platforms where side-by-side comparisons will not be as easy to get.

Alternatives

Before going into the details of Neptune’s design, I want to detail the history of my previous attempts at bringing accelerated graphics to QEMU. This highlights the difficulty of the problem at hand.

VirtualBox

The first idea was to port the SVGA device acceleration from VirtualBox https://github.com/VirtualBox/virtualbox/tree/main/src/VBox/Devices/Graphics to QEMU. VirtualBox has DirectX 11 https://www.phoronix.com/news/VirtualBox-7.0-Released support through DXVK and QEMU has base support for the VMware device that VirtualBox emulates. Both the device and Windows driver are open source. The big win here is the Windows WDDM driver https://github.com/VirtualBox/virtualbox/tree/main/src/VBox/Additions/win/Graphics/Video/disp/wddm which in theory we can take as-is. This is one of the few open source implementations of Direct3D that exist and may still be good reference material in the future. However, fitting the device into QEMU required a lot of work (back when I first looked at it, there were no AI coding assistants to help). There may also be unforeseen issues bringing the device to ARM64 guests, and I did not want to maintain a large amount of video device code in QEMU.

gfxstream

gfxstream https://github.com/google/gfxstream (previously: Vulkan Cereal) is the Vulkan backend for Google’s Android Emulator and some other (maybe abandoned) projects.
In theory it can be paired with DXVK to get DirectX support, and I was informed by a very smart individual that they had an easy time porting guest support to Windows (they sadly cannot release the source code because of legal issues). This seemed like an attractive path because Android Emulator is based off of QEMU (forked from a much older version) and more recently, support was added to upstream QEMU as well https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg03792.html . The remaining work was to port the host side to macOS, which I worked on https://github.com/utmapp/gfxstream .

I got pretty far with this idea: I was able to get kmscube to render on a Linux guest with a macOS host. But in the end there were a couple of blocking issues. First, although Android Emulator supported macOS hosts, the Android graphics stack is completely different from Linux (I am not even thinking about Windows yet). The Linux Mesa driver was pretty new and still unstable. Second, the Vulkan driver on macOS, MoltenVK https://github.com/KhronosGroup/MoltenVK , does not have all the features required by Zink https://docs.mesa3d.org/drivers/zink.html , the OpenGL implementation over Vulkan. Why do we need Zink? Because the most popular Linux distros (e.g. Ubuntu LTS) use the GNOME desktop, which uses Mutter for compositing, and the accelerated path uses Cogl… which is OpenGL based.

Did you follow all that? The short version is that the current gfxstream guest driver paired with MoltenVK running on a macOS host does not support hardware accelerated window compositing for Ubuntu LTS.

I also looked at forgoing Zink and using a different OpenGL driver, but that is not currently supported and would require work both on the guest driver side (something I wanted to avoid in the first place) and on the macOS host side. For example, if I used the existing OpenGL forwarding path in gfxstream (supported by Android guests), I would have to bring up a Gallium driver for it.
If I tried using the VirGL driver (which is already supported by QEMU on macOS), then I would need to plumb the compositing of gfxstream and VirGL contexts together through virglrenderer, which would also require QEMU-level changes. Overall, the complexity of all the changes needed quickly added up (meaning maintenance costs as well), and this is all before even considering what to do on the Windows side.

Gallium for Windows

Let’s put aside getting a modern graphics stack (Direct3D 11 or Vulkan) working on Windows and focus instead on getting any accelerated graphics on Windows. For more than a decade, there have been multiple https://github.com/Keenuts/virtio-gpu-win-icd attempts https://studiopixl.com/2017-08-27/3d-acceleration-using-virtio to get a VirGL guest working on Windows. The idea here is that virglrenderer already provides a mature protocol for transferring Gallium between guest and host, so we just need to implement the guest driver for Windows. Once the virtio driver for Gallium works, OpenGL acceleration is possible. But not just that: Direct3D 10 can also be supported thanks to an existing frontend (currently only for software rendering).

Quick detour on terminology for those who are less familiar with the Linux graphics stack:

- Mesa: started as an open source implementation of OpenGL APIs and then expanded to GLES, EGL, then Vulkan, and even D3D9/D3D10 (swrast only). Mesa handles the complex state tracking required by different graphics APIs and translates them to Gallium.
- Gallium: various legacy graphics APIs (OpenGL) are translated into the Mesa internal API, Gallium. Then various downstream drivers implement Gallium for different hardware. The advantage is that driver developers can target one API set instead of half a dozen. virtio is one such driver, which serialises Gallium calls across the virtualization boundary.
- virglrenderer: a library for deserialising Gallium commands and implementing them with OpenGL. In a way you can think of it as an inverse of Mesa.
Why do this instead of just transferring the OpenGL calls directly over the wire? I can only speculate, but my guess is that you do not need a separate OpenGL driver in the Linux guest (avoiding multiple OpenGL libraries is one of the reasons why Mesa exists) and the guest can also be agnostic to what APIs are actually supported on the host. In fact we take advantage of this second point with UTM on iOS. We use the Google project ANGLE https://github.com/google/angle , a GLES implementation on top of Metal (the Apple native graphics API). virglrenderer supports rendering with GLES and therefore we can support desktop OpenGL 2.1 on Linux guests without ever needing to implement it on iOS.

So in order to bring OpenGL to Windows, we just need to implement a Gallium driver on Windows. Luckily, software rendering already exists for Windows, so the only work is to communicate with a kernel mode virtio driver in order to efficiently transfer buffers from guest to host. Indeed, max8rr8 https://github.com/virtio-win/kvm-guest-drivers-windows/pull/943 started work on this three years back and was able to get it to a proof-of-concept state. This was very impressive work because, as stated earlier, people with knowledge of Windows WDDM drivers are few and far between. Having looked extensively into this topic myself, I am disappointed by the opaque nature of the MSDN documentation and the dearth of public discussion in places like community.osr.com. It feels like most knowledge is siloed between Microsoft and the three large chip vendors.

However, as monumental an achievement as max8rr8’s driver was, there was little guidance from the corporate backed maintainers of the project to help the new contributor upstream their work, and the effort fizzled out. I expressed my disappointment https://github.com/virtio-win/kvm-guest-drivers-windows/pull/943#issuecomment-4065367940 but unfortunately this is a common theme in the big tech takeover of open source.
Not all hope is lost, though, because work has recently continued from two other contributors. anonymix007 https://github.com/anonymix007/kvm-guest-drivers-windows-venus and arehnman https://github.com/arehnman/kvm-guest-drivers-windows have both independently picked up where max8rr8 left off to bring not only Gallium support to Windows but Vulkan as well. Which brings us to…

Venus

Venus is another project maintained by Google and is part of virglrenderer. Vulkan support on Linux is also handled by Mesa but not through Gallium. That means virglrenderer needs to have a separate back-end for transferring Vulkan commands over the virtualization boundary. Whereas Gallium commands are transferred directly over the virtio-gpu device through the virtio transport with the VIRTIO_GPU_CMD_SUBMIT_3D command, Venus is handled “out of band” by a shared buffer of memory between the guest and host which acts as a ring buffer for serialised commands to be written to. This improves latency and allows for more efficient batching of commands. This is very similar to how gfxstream works as well, but one advantage of Venus is that there is already support for import/export of Vulkan resources into the vrend OpenGL context. This means that a Cogl based window compositor on Linux can display a Vulkan window without any CPU side buffer copying.

Ultimately, we were able to port Venus to macOS and iOS https://github.com/utmapp/virglrenderer in the latest UTM beta https://github.com/utmapp/UTM/releases . This brings Vulkan acceleration to Linux guests without needing any guest side driver patches. Everything should “just work”. Mission accomplished, right?

Not quite. Having modern graphics working on a Linux guest was a major milestone for us, but our north star is full graphics acceleration on Windows. A stepping stone to that is for DirectX to work on Linux through DXVK (DirectX 11 implemented on top of Vulkan), and we ran into some roadblocks there.
Again, many of the issues come back to MoltenVK, the macOS/iOS implementation of Vulkan APIs over Apple’s Metal APIs. MoltenVK does not support all the Vulkan features that DXVK requires. CrossOver https://www.codeweavers.com/support/forums/announce/?t=24;msg=322440 (a commercial distribution of Wine for macOS) has a fork https://www.codeweavers.com/crossover/source of MoltenVK and DXVK that is tuned for higher game compatibility on Mac. We tried to integrate their fork into UTM but ran into various issues running Linux guests (their fork was based on older versions of the projects that lack support for features we need for virtualization). After merging some of their changes together with upstream MoltenVK, some games started to boot. However, there are still graphical issues in-game.

Can we do better?

The biggest hurdle is the fact that when we stack multiple layers of API translation, compatibility is limited to the least supported component. This is what the stack currently looks like:

[Diagram: current graphics stack]

and for macOS hosts:

[Diagram: graphics stack for macOS hosts]

Windows guests would have a similar picture. On macOS the weak link is currently MoltenVK, which limits the full graphics capability of the system. In the time since Venus started working on macOS, KosmicKrisp https://www.lunarg.com/a-vulkan-on-metal-mesa-3d-graphics-driver/ came out and is steadily improving. However, it is still not at feature parity https://gitlab.freedesktop.org/mesa/mesa/-/work_items/14209 with MoltenVK and does not currently run DXVK.

To recap, for Linux hosts, DXVK + Venus + host Vulkan is a fine combination with good compatibility. However, on macOS, Vulkan is not a native citizen. Apple’s own graphics API is Metal and recently, Apple introduced the Game Porting Toolkit https://developer.apple.com/games/game-porting-toolkit/ which comes with a framework called D3DMetal.
D3DMetal translates D3D11 and D3D12 directly to Metal and is already used by CrossOver https://www.codeweavers.com/blog/mjohnson/2023/9/27/crossover-235-is-a-real-game-changer to gain support for games previously unplayable even on their tuned DXVK. If we can integrate D3DMetal with virglrenderer, remove DXVK on the guest side, and transfer D3D commands directly over VirtIO GPU, then we can skip Vulkan/MoltenVK altogether.

You can imagine other useful combinations: Windows guests on Linux hosts (DXVK on host side), Windows guests on Windows hosts (native D3D), or even Linux guests on Windows hosts (Wine on the guest with no DXVK). However, the scope is large enough as it is, so none of this is currently planned.

In this first phase, the emphasis is to get the least useful pairing working: Linux guest (no DXVK) on Linux host (DXVK). Why this pairing? Because this is the most mature and fleshed out starting configuration. It also provides a quick way of checking results by comparing directly to DXVK + Venus (as seen in the results above).

Implementation

With the “why” out of the way, the next part will detail the “how.” This is the first large project I’ve worked on where I heavily depended on AI tools (specifically Claude Code). To give more insight into how we worked together, I asked Claude to analyse our chat transcripts.

Neptune is a GPU virtualization back end that lets Windows D3D11 applications run inside a guest VM and render on the host's GPU. It is parallel to Venus (which does the same for Vulkan): the guest serialises COM method calls into a shared-memory ring buffer; the host deserialises them and runs them through dxvk (D3D11→Vulkan); rendered frames come back to the guest as dma-bufs and are presented through X11 DRI3.
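To make the "serialise into a ring, deserialise on the other side" idea concrete, here is a minimal single-process sketch in Python. It is only an illustration: the real code is C in mesa and virglrenderer, and the header layout, the `Ring` class, and the `DRAW` method id are all hypothetical, not Neptune's actual wire format. It also ignores the memory ordering and notification that a real shared-memory ring needs.

```python
import struct

# Hypothetical wire header: 32-bit method id, 32-bit payload length.
HEADER = struct.Struct("<II")


class Ring:
    """Fixed-size byte ring with free-running head/tail counters."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # total bytes consumed by the host side
        self.tail = 0  # total bytes produced by the guest side

    def _write(self, data):
        # Copy byte by byte so records can wrap around the end of the buffer.
        for i, byte in enumerate(data):
            self.buf[(self.tail + i) % self.size] = byte
        self.tail += len(data)

    def _read(self, count):
        data = bytes(self.buf[(self.head + i) % self.size] for i in range(count))
        self.head += count
        return data

    def submit(self, method_id, payload):
        """Guest side: serialise one call. Returns False if the ring is full."""
        record = HEADER.pack(method_id, len(payload)) + payload
        if len(record) > self.size - (self.tail - self.head):
            return False
        self._write(record)
        return True

    def drain(self):
        """Host side: yield every (method_id, payload) currently queued."""
        while self.tail - self.head >= HEADER.size:
            method_id, length = HEADER.unpack(self._read(HEADER.size))
            yield method_id, self._read(length)


DRAW = 7  # hypothetical method id
ring = Ring(64)
ring.submit(DRAW, struct.pack("<II", 3, 0))  # e.g. vertex count, start vertex
decoded = list(ring.drain())
```

In this sketch the guest never blocks: `submit` simply reports a full ring, and the caller decides whether to wait or retry. A real transport would also need a way for the host to reply, which is left out here.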
The work spanned four tightly coupled repos:

| Repo | Role | Hand-written code touched | Commits |
|---|---|---|---|
| mesa/src/virtio/neptune/ | | | |
| virglrenderer/src/neptune/ | | | |
| neptune-protocol/ | | | |
| dxvk/ (forked https://github.com/osy/dxvk) | | | |

Neptune is not a "rewrite the world" project; it's deliberately a Venus clone with D3D11 substituted for Vulkan. That structural similarity is load-bearing for everything that follows: most of the time, the right answer to "how should we handle X?" was "look at Venus."

I had a general design direction in mind, but the amount of code that needed to be brought up, tested, and optimised was daunting. My own estimate for the amount of work without any AI tools was 6-8 months (this includes time needed to learn new components). From my previous experiences bringing up VirGL on ANGLE, gfxstream, and Venus, I knew that the bulk of the work would be in debugging. The debugging task is painful because you are often crossing both user/kernel boundaries and guest/host boundaries. Issues with incorrect drawing are especially difficult because they involve the GPU as well. In the past, I’ve had to attach GDB in the guest and LLDB in the host to step through a single draw command. Combine all this with the multiple layers of API translation, cache coherency, and race conditions that disappear upon any profiling, and you find yourself spending weeks upon weeks tracking down bugs.

I wanted to put the AI in the best possible position to debug issues without my intervention, so I came up with the following rules.

- The project needs to be broken down into multiple parts. Each part needs to be broken down into smaller goals. Each goal should have a clear pass condition and that condition needs to reflect the complexity of the next goal.
- The overall design needs to be anchored to an existing design that is already proven to work. In this case, that is Venus.
  Anchoring to a working design allows the design space to be more constrained and helps avoid issues where the AI picks a sub-optimal implementation choice due to not considering a better option.
- The AI should be able to get feedback on what the “correct” behaviour is without consulting the human. Knowing that DXVK is in a good working condition on the test machine is crucial for this.

The numbers

| Metric | Value |
|---|---|
| Tokens read from cache (≈ 7× The Lord of the Rings per minute, sustained over 32 days) | 22.5B |
| Tokens written by Claude | 53.6M |
| Tool calls (Bash, Read, Edit, …) | 34,023 |
| Sessions (80 primary + 228 subagent) | 308 |
| Hand-written-code commits across 4 repos | 657 |
| Lines of hand-written runtime C code (mesa + virglrenderer) | 22.9K |

The timeline

I discovered with great dismay https://simonwillison.net/2025/Oct/22/claude-code-logs/ that Claude Code automatically deletes transcripts older than 30 days. Therefore this analysis does not include the first month of work (starting March 8). As such, I will give a brief summary here.

I spent the first day (a whole day) crafting this initial prompt https://gist.github.com/osy/e0072beec74dfb3bbf1ed6ebc689ddce . I wanted to give a high level structure of the work without being too prescriptive. I have noticed that Claude Code’s plans are usually best if you accurately describe WHAT you want without too many details on HOW you want it. If you don’t like the plan, you can always discard it and try again, but I have found that Claude’s plans are usually pretty solid.

That being said, a major mistake in this first iteration was under-specifying how I wanted the command serialisation to be done. I had a vague idea that “gfxreconstruct has some way of doing it so you can steal its homework.” However, that ultimately did not work because it differed so much from how Venus handled serialisation that it made everything else an uphill battle.
It was Claude who came up with the idea of first parsing the SDK header files into a JSON database and then generating serialisation functions from that database. This decoupled the “header parsing” from the “code generation.” I liked this idea but thought it could be improved by looking at the .idl files that Microsoft provides rather than the .h files, which themselves are generated from the MIDL. Claude wrote a Python script that parsed the MIDL files into JSON, but upon testing, there were mistakes all over the place. It was worth taking yet another step backwards and focusing on just parsing MIDL in isolation.

The end result was midl-classic https://pypi.org/project/midl-classic/ , a Python parser for MIDL which converts the MIDL into an AST. I gave Claude a copy of the MIDL specification documents that Microsoft publishes and a TypeScript syntax highlighter that Microsoft open sourced, and asked Claude to implement the parser fully to the specs.

With the MIDL parser in hand, the next step was to design neptune-protocol https://github.com/osy/neptune-protocol which uses midl-classic to convert the SDK MIDL files to a JSON registry, combines it with a manual overlay, and then generates code from it. Claude was asked to use venus-protocol (Vulkan already has its APIs in a machine-friendly XML format) as inspiration to create the protocol generator for serialisation.

One issue from the first failed attempt was that it was difficult to check the correctness of the generator because there is no easy way to say “do the generated structs match the layout in the SDK headers?” You can ask the AI for that, but you will never know whether it looked at every structure or whether it made a mistake. Instead you need to ask it to create tests that exercise all the functions and collect coverage data on them.

Once the protocol generator was working, Claude was asked to re-design the virglrenderer code to use the new generator.
This was a massive rewrite and really demonstrates one benefit of AI coding: large rewrites and refactors are now cheap, which means the opportunity cost of trying one way, figuring out you went in the wrong direction, and starting over is no longer days of work but just hours of work.

Next, we extended DXVK-native to support a headless WSI that exports dmabufs instead of rendering the frames directly to screen. The advantage of doing this is that virglrenderer already understands dmabufs, as they are used in Venus and other back-ends. dmabufs are also a cheap way to move data from host to guest without needing to copy data from GPU to CPU each time.

Initial smoke tests were also created: a single static triangle and a spinning cube. No matter how hard I tried, I was not able to get Claude to understand what a cube looks like. It gave me some kind of spinning geometry with 6 sides, but it definitely was not how a cube would look in 3D space. My own lack of background in 3D graphics meant that I was not even able to describe to Claude what the issue was, and when I attempted to give it screenshots of frames it was also unable to glean any insights. In the end, I gave up because it doesn’t matter for the future tasks whether the 3D cube was indeed a cube. It just had to have geometry and animation.

The guest side was mostly uneventful. There was some initial struggle trying to get Unix functionality like SCM_RIGHTS to work through Wine. Claude kept coming up with progressively more complicated and fragile hacks until I gave it the Wine source code, at which point it was able to figure out both how Unix libraries are loaded and how to call into them.

At that point, we were able to get the triangle and cube to render through vtest (a test back-end that talks directly with virglrenderer on the same host without QEMU or KVM). Then we set up a VM, copied the built libraries to it, and got the smoke tests to work across the VM boundary as well.
All of this was straightforward thanks to existing Venus code, which can be used in Neptune mostly unmodified because the transport layer is essentially the same.

A condensed narrative of what actually happened, drawn from the first user message of each major session:

| Date | Phase | What happened |
|---|---|---|
| Apr 14 | bringup | “We finished implementing virtio transport for virglrenderer and mesa Neptune. Now it's time to test it on a real game.” First session of the test era; Crash Bandicoot N. Sane Trilogy is the target. |
| Apr 15 | bringup | Apitrace setup, Wine integration, win32 handles for events/fences. First real crash debugging. B/R channel swap mystery. |
| Apr 16 | review | First “code reviewing the newly implemented Neptune backend” pass: Venus is the reference, every divergence is suspect. |
| Apr 17 | perf | First gameplay-aware perf analysis. Discovery: encode + reply waits dominate. Plan for “Venus-style per-thread encode batching.” |
| Apr 18 | perf | The TLS-ring saga. Multi-ring stalls. UAF bugs surface only when multi-ring is on. “Keep going and don't stop until you are able to run the game 10 times, each for 5 minutes without any hang/freeze/crash/deadlock.” By end of day: 2.37× throughput on Crash. |
| Apr 19 | perf | Texture-map fast path (P4 → P1 → P2 → P3). Heuristics for multi-ring default-on. Wider games analysis. |
| Apr 20–21 | review | COM-type cleanup. Wrapper consolidation. Override macro work. |
| May 1–2 | review | Wine-only consolidation (drop native-Linux paths). Future-fence feedback. Comments cleanup: “all the comments in the Neptune backend has been written by Claude. Much of it is too verbose, duplicated, or useless.” |
| May 3 | debug | dmabuf WSI rearchitecture and 5-hour freeze hunt. Root cause: 32-bit seqno wrap in npt ring seqno status. The biggest single session in tool calls (4,059). |
| May 4–5 | perf | Native-DXVK vs Neptune deep comparison. WC + ring-ordering puzzle. memcpy attribution drama (it was in game.exe all along, not in Neptune). |
| May 7 | review | Big code review pass with Venus parity checks. /loop-driven iteration. Profiling Wine library code. |
| May 8–9 | perf | Apitrace integration crash fix. Frame-pacing rewrite. xcap custom capture tool. vtest-vs-wine perf parity achieved. |
| May 10–11 | debug | Lockless seqno fast path. FFXIV Dawntrail and 3DMark Fire Strike bringup. New games expose latent bugs immediately. |
| May 12–13 | debug | Out-of-order present FIFO bug (the “blink” visible to the eye but not in dmabuf capture). Variant analysis on protocol generator bugs. |
| May 14–15 | debug | 3DMark termination stall (sc wsi stop's INFINITE wait). “Stop hook”–driven autonomous fix loop. 588 s → 421 s on a single test. |
| May 16 | cleanup | Squash to upstream branch. This report. |

Three things are worth pointing out about the shape of this timeline:

It is not a feature-build curve. Most of Neptune's runtime code existed before Apr 14 (from earlier sessions on a different machine). What we see here is the much harder phase: turning a thing that compiles into a thing that actually runs a 3D game, and then a thing that runs well.

Review and debug dominate. Of 39 major sessions, 15 were code-review/refactor, 9 were performance work, 6 were bug-hunts, 1 was the bringup, and the rest mixed. Bringing the code up to quality took more iterations than writing the code.

Each new game broke something. Crash worked first; then FFXIV exposed protocol-generator NULL-derefs; then 3DMark exposed the WSI-thread shutdown stall. The bug rate didn't trend toward zero; it shifted into rarer corners each time.

Where Claude excels

Several capabilities showed up over and over in the transcripts and are easy to point at concretely.

1. Long-running autonomous debug loops with verification

Given a clear pass condition, Claude can stay on a problem for hours: form a hypothesis, instrument, run, read logs, revise, repeat.
The single best example is the multi-ring stall hunt on April 18. Here is the actual prompt: That kicked off ~10 hours of autonomous work. Claude implemented per-instance rings, hit a UAF, debugged it via gdb attached to the renderserver, rebuilt, ran ten validation runs in a row using the Monitor tool, and reported back: 10/10 Crash Bandicoot 5-minute runs all passed — every run reached LOADING LEVEL: 'Crash1/C1 StartScreen/C1 StartScreen' with zero watchdog, ring fatal, decoder fatal, reply mismatch, or hang. … And another, after gdb work on the same day: npt d3d11 buffer rotate slot waited on dev- ring for a seqno that was recorded on dc sc ring — classic wait-on-wrong-ring bug. Winedbg backtrace once I got it working via the VM's sudo access resolved to ctx Map override → rotate slot → wait seqno → Sleep , making the bug obvious. Fix: each slot now remembers the ring its Unmap went out on. Same pattern fired again a month later when a /goal stop hook was used to chase the 3DMark termination stall. The user set a verifiable pass condition “recover ≥ 90% of the 160s gap” and Claude ran through three hypotheses, instrumenting npt device destroy step-by-step until it found the right one: Achieved 168 s recovery 105 % . Timing-instrumented npt device destroy step-by-step: drain=29.5 s , everything else <70 ms … Root cause the third hypothesis was right : sc wsi stop 's WaitForSingleObject wsi thread, INFINITE was the call sitting there. The WSI worker is inside npt renderer wsi present → xcb present pixmap blocked on an X11 round-trip that won't return. The shape that makes loops work, every time: - A clear, machine-checkable termination condition “10 runs of 5 min”, “recover ≥ 90 %”, “match DXVK frame rate within 1 %” . - Permission to instrument freely add logging, attach gdb, modify scripts . - An external feedback signal that's faster than the cycle time — Monitor tailing a log, ScheduleWakeup after a perf run, PushNotification on hang. 2. 
Big-picture analysis from raw metrics

When given dump files or perf output, Claude is consistently good at building the table-of-metrics-and-paragraph-of-prose that explains where the time goes, and crucially, it will tell the user when its own prior hypothesis was wrong. Two examples:

“Guest game thread (232 s wallclock): 20.7% in protocol encode, 16.2% blocked on sync replies, 63.1% outside neptune. Host ring thread: 8.8% actual dispatch work, <1% idle wait, the rest pulling from ring queues…”

“call_us, the guest's view of how long the host's IDXGISwapChain::Present takes: call_us median: vtest 13,329 µs vs virtio 23 µs (579×). And there's a stale-comment issue layered on top…”

The “tell the user when prior hypothesis was wrong” part shows up nicely in:

“What I was conflating: I said ‘the host has to do N D3D11 method calls per draw and that's the floor.’ That's wrong as a bottleneck argument because, exactly as you point out, the wine path eventually does the same D3D11 work too…”

This kind of in-conversation walking-back of a prior model, with an explicit why, was very common and very valuable. Where the human gave good pushback (next section), Claude was usually willing and able to re-think.

3. Mechanical cross-codebase refactoring

Several of the largest sessions (200–700 messages each) were not bug-hunts but mass refactors: renaming a function and chasing every call site across two repos and the generator; collapsing duplicate header structs; extracting subgroups out of a 3000-line dispatcher into per-family files; bulk comment cleanup.

Two stats anchor this. On May 9, a single virglrenderer session landed 22 commits renaming and restructuring the host's dispatch layer (one commit per subgroup: COM, WSI, RING, RESOURCE, FEEDBACK). On May 14, a single mesa session landed 11 commits mechanically replacing npt_sizeof(T) & (const T){0} patterns with explicit max-sizing for unions only, after the user spotted the over-allocation.
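A rename that spans several repositories is only finished when nothing still refers to the old name. As a hedged illustration of that check (not the tooling used in these sessions), the sketch below scans an in-memory mapping of path to file text; the function name `leftover_references` and the sample paths are hypothetical.

```python
import re
from typing import List, Mapping, Tuple


def leftover_references(files: Mapping[str, str], old_name: str) -> List[Tuple[str, int, str]]:
    """List (path, line number, line) for every whole-word use of old_name.

    files maps a path to that file's text, so sources from several
    repositories can be merged into one mapping before the check.
    """
    # \b keeps old_fn from matching inside longer names such as old_fn_ex.
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    hits = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

An empty result is the machine-checkable "done" signal for the rename. Note that this sketch also reports mentions in comments, which is usually what you want during a cleanup pass.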
This is the kind of work where the “keep going” loop pays off without philosophical risk: each step compiles, each step is verifiable, and the failure mode is “build broke”, which is easy to detect and easy to fix.

Constantly circling back to simplify, de-duplicate, and refactor is a necessary part of working with large amounts of AI-written code. Just like with human-written code, as you fix bugs, handle edge cases, and add new features, the nice pristine structure of the original design rots away. As I’ve said in a previous section, refactors are technically cheap, so they should be done every so often as changes add up and you realise there is a better structure to be had.

4. Subagent fan-out for review and search

Of 34K tool calls, 171 were Agent launches, almost always to parallelise file-by-file scrutiny during code reviews. The pattern looked like: “there are 50 generated dispatcher files; spawn five subagents to audit them in parallel, then merge the findings.” This kept the parent context from drowning in file contents and surfaced more issues per hour than serial reading.

228 subagent transcripts averaged ~50 messages each, totalling 58 MB. They were used for: variant analysis (“find all the places that do X and check for Y”), independent code-review (“here's the diff, what do you think?”), and bounded research (“how does Venus handle this?”).

5. Memory as institutional knowledge

Over the course of the project, 22 long-lived memory files accumulated across the four projects' memory folders. The most valuable were the “feedback” entries: codified lessons from a single hard-won moment that prevented the same mistake later. A representative one (saved after the user caught Claude using pgrep -f twice in one session and getting bogus results both times):

pgrep -f