Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

Developer Simon Willison ported the Moebius 0.2B image inpainting model to run in a browser using WebGPU, bypassing the original requirement for PyTorch and NVIDIA CUDA. The project, completed with assistance from Claude Code, is available as a demo at simonw.github.io/moebius-web/. This demonstrates the feasibility of running lightweight AI models directly in the browser without specialized hardware.

This morning on Hacker News https://news.ycombinator.com/item?id=48630171 I saw Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance https://hustvl.github.io/Moebius/ , describing a small but effective inpainting model - a model where you can mark regions of an image to remove and the model imagines what should fill the space. The released model required PyTorch and NVIDIA CUDA https://github.com/hustvl/Moebius/blob/9310b76e368f5f7a8ecdf06493231af279c9973b/requirements.txt L1 , but since it described itself as 0.2B I decided to try and get it running using WebGPU in a browser. TL;DR: I got it working, and you can try the demo at simonw.github.io/moebius-web/ https://simonw.github.io/moebius-web/ . Read on for the details. Here's a video demo of the finished tool: You can open any image in it non-square images get letterboxed , highlight areas to remove, click the "Run inpaint" button and wait for the model to do its magic. My main project for today was landing a major feature in Datasette: a UI for creating and altering tables, as a follow-up to the insert and edit rows feature https://simonwillison.net/2026/Jun/16/datasette/ I released last week. I was working on that in Codex Desktop here's the PR https://github.com/simonw/datasette/pull/2789 and often found myself spending 5-10 minutes spinning my fingers waiting for it to complete a mid-sized refactor or add the finishing touches to a change to the UI. An amusing thing about coding agents is that the harder a problem is the more time you have to get distracted while you wait for them to finish crunching So I decided to spin up Claude Code in a terminal window and see how far I could get at porting Moebius to the web. My first step was to ask regular Claude about the feasibility of this project. In Claude.ai https://claude.ai/ , which has the ability to clone repos from GitHub: Clone https://github.com/hustvl/Moebius/ and tell me if they published the code and weights to run this model anywhere I hadn't spotted the link to the weights yet, that's tucked away in the "News" section. Then: For Moebius what are the options for running it right now - Python and NVIDIA CUDA only or other options too? And: Muse on the feasibility of porting it to Transformers.js or similar and running it in a browser I like telling models to "muse on X", it's the shortest way I've found of expressing that I want them to contemplate a problem for me without providing them with a concrete goal. Here's that chat transcript https://claude.ai/share/551c3dc8-17ce-4a4b-a0c9-8cbded6c7bf1 . I copied out the last answer and saved it as research.md https://github.com/simonw/moebius-web/blob/main/research.md for Claude Code to read later. Claude suggested using ONNX Runtime Web on the WebGPU backend - the layer below the Transformers.js https://huggingface.co/docs/transformers.js/en/index library I had suggested. That was enough to convince me it was worth setting Claude Code loose and seeing how far it could get. I usually start projects like this by gathering as much information as the coding agent might need as possible. Since I didn't expect this project to actually work I did everything in my /tmp folder: cd /tmp mkdir Moebius cd Moebius Grab the Moebius python code git clone https://github.com/hustvl/Moebius And the model weights Claude figured this out : GIT LFS SKIP SMUDGE=0 git clone \ https://huggingface.co/hustvl/Moebius Moebius-weights Finally a couple of libraries we might use: git clone https://github.com/huggingface/transformers.js git clone https://github.com/microsoft/onnxruntime I created a directory for the rest of the project and ran git init in that so Claude could start committing code notes: mkdir /tmp/Moebius/moebius-web cd /tmp/Moebius/moebius-web git init Copy in that research.md from earlier git add research.md git commit -m "Initial research by Claude Opus 4.8" I fired up a claude instance in the /tmp/Moebius folder, the level above all of the research materials I had prepared for it. I prompted: Read ./moebius-web/research.md - your goal is to port this model to ONNX and WebGPU so we can run it directly in a browser, with a simple UI As it started to work I dropped in this follow-up typos included : Bulid this in /tmp/Moebius/moebius-web and commit early and often, also maintain a notes.md file in there with notes about what you figure out along the way - also start by writing out a plan.md in there and update that plan as oy work too I often ask agents to keep notes like this - the end result is often interesting, both for myself and for the next agent session that touches the same project. Here's what that notes.md file https://github.com/simonw/moebius-web/blob/main/notes.md looked like at the end of the project. I kicked it off and went back to my main project, checking in occasionally to see how Claude was doing. When it looked like it might have something that worked I prompted: Tell me what URL I can visit in my own browser to try this Then I tried it out in Chrome and pasted some errors and screenshots of errors back into Claude Code. After a few rounds of this we had something that appeared to work Time to put it on the internet so other people could use it. How would we publish this to Hugging Face such that the model weights were on there and the HTML demo would show up in Hugging Face spaces? Claude Code knows how to use the hf CLI tool, so I created a model repo on Hugging Face https://huggingface.co/ , then created a token https://huggingface.co/settings/tokens that could write to that repo and dropped it into a /tmp/Moebius/token.txt file so Claude could use it. It published the 1.24GB of converted ONNX weights to huggingface.co/simonw/Moebius-ONNX https://huggingface.co/simonw/Moebius-ONNX for me. I'd seen other demos load weights into the browser from Hugging Face before, so I knew it was possible. I decided to host my own frontend code on GitHub Pages, so I said: I want to publish the moebius-web folder to GitHub, minus the large files so maybe minus the models/ folder , such that when I turn on GitHub Pages for that repo navigating to https://simonw.github.io/moebius-web/ serves the UI Telling it the final URL was important in case it needed to fix the URLs in the demos that it was building so they would work when deployed to production. After a few more rounds of iteration, in between working on my main project, we got to a working, deployed version Except... each time I reloaded the page it seemed to download ~1.3GB of model weights. Browser caching seemed pretty important for this anything clever we can do with serviceworkers or similar to help cache this stuff? It seems to reload every time, I am concerned that there might be something weird about the way HF redirects work that mean we don't benefit from browser caching I knew that Transformers.js projects could handle this properly, so I grabbed a copy of the Whisper Web https://huggingface.co/spaces/Xenova/whisper-web demo, dropped it into /tmp/Moebius/whisper-web and said: look in /tmp/Moebius/whisper-web with a subagent and see how they do this That project was entirely obfuscated, built JavaScript files so I figured using a subagent would avoid spending the rest of my top-level token context deciphering those files. Claude figured out that it was using caches.open "transformers-cache" - the CacheStorage API https://developer.mozilla.org/en-US/docs/Web/API/CacheStorage/open - and added that to our project https://github.com/simonw/moebius-web/commit/05c1cbc4894460a70a8bc1718ac6d152219e0f28 diff-fb89c342dfa36f544a2d16a885b0f3d1d49f436a7d0eaeb80505f80a1f922603 . I've shared the full Claude Code transcript https://gisthost.github.io/?58039ba5c1ca3ed177e8659168996ee4 for this project published using my claude-code-transcripts https://github.com/simonw/claude-code-transcripts tool . This definitely counts as vibe coding: I didn't look at a single line of code from the project, restricting my input to testing, suggesting small feature improvements like a progress bar for the large file downloads and pointing the model in the direction of examples of how I wanted things to work. Since I didn't write any code the amount I learned about the underlying technologies - WebGPU, ONNX, and the Moebius model itself - was very limited. As is usually the case with this kind of project the most important things I learned concerned what was possible : I felt like I should probably try and learn a little more about my project. I fired up Claude.ai https://claude.ai/ and prompted: Clone https://github.com/simonw/moebius-web/ and use it to teach me all about the model and ONNX and the process of converting a model to ONNX and WebGPU and basically everything I'd need to know in order to fully understand this repo Here's the transcript https://claude.ai/share/d11b8f2b-a52d-4ca2-be75-a710eaf18572 and the understanding.md https://github.com/simonw/moebius-web/blob/main/understanding.md Markdown file it created, which I've now added to the GitHub repo. I found the explanation of ONNX particularly enlightening: ONNX Open Neural Network Exchange is a portable, framework-neutral file format for neural networks. An .onnx file is essentially two things bundled together: A computation graph— a directed graph ofnodes, where each node is anoperator Conv , MatMul , Add , Einsum , Softmax , Gather , Resize , … wired together by named tensors flowing between them. This is the "recipe" for the forward pass.The weights— the learned parameter tensors the convolution kernels, the embedding table, etc. , stored as initializers in that same graph.Crucially, ONNX describes what to compute, abstractly, without sayinghoworon what hardware. The operator set is versioned by anopsetnumber this repo usesopset 18 , which pins down exactly which operators exist and what their semantics are. It turns out PyTorch has built in mechanisms for exporting to ONNX, as seen here in export onnx.py https://github.com/simonw/moebius-web/blob/080be6e737ec976130e260d34707d7d9b7f63d5b/python/export onnx.py L91 : torch.onnx.export dec, lat, , dec path, opset version=args.opset, input names= "latent" , output names= "image" , dynamic axes={"latent": {0: "B"}, "image": {0: "B"}}, Claude also included a handy glossary https://github.com/simonw/moebius-web/blob/main/understanding.md 12-mini-glossary and an only-slightly-broken ASCII-art diagram https://github.com/simonw/moebius-web/blob/main/understanding.md 10-putting-the-whole-pipeline-in-one-picture showing how the model pipeline fits together. Tags: browsers https://simonwillison.net/tags/browsers , transformers-js https://simonwillison.net/tags/transformers-js , webgl https://simonwillison.net/tags/webgl , vibe-coding https://simonwillison.net/tags/vibe-coding , coding-agents https://simonwillison.net/tags/coding-agents , claude-code https://simonwillison.net/tags/claude-code , onnx https://simonwillison.net/tags/onnx