I Open-Sourced a Browser-Based AI Background Remover — Here's the Full Architecture

This post walks through an open-source, browser-based AI background removal tool that processes images entirely on the client side, so photos never leave the user's device. The system uses the `@imgly/background-removal` library with an ONNX model and WebAssembly inference, and runs through five stages from image upload to transparent PNG download. A key feature is the editable grayscale mask, which lets users manually correct AI imperfections on hair edges or transparent objects.

Most background removal tools work like this: upload your photo to a server, wait for an AI model to process it, download the result. Your image sits on someone else's infrastructure. You hope they delete it.

I built one that works differently. The AI model runs in your browser tab. Your image never leaves your device. And I just open-sourced [the core logic](https://github.com/2645149786-dotcom/toolknit/tree/main/open-source/background-remover-standalone): two files, zero dependencies beyond a CDN import.

Here's how it works under the hood.

## The Pipeline

The full flow from "user drops an image" to "transparent PNG download" goes through five stages:

Upload → ONNX Model Load → WebAssembly Inference → Mask Generation → Canvas Compositing

Each stage runs entirely client-side. Let me walk through them.

## Stage 1: Loading the AI Model in the Browser

The backbone is [`@imgly/background-removal`](https://img.ly/blog/background-removal-js/), an open-source library that bundles an ONNX segmentation model with ONNX Runtime Web (WebAssembly backend).

```js
const LIB_CDN = 'https://cdn.jsdelivr.net/npm/@imgly/background-removal@1.5.5';

// Filled in once the library has been imported
let removeBackgroundFn;

async function loadLibrary() {
  // jsDelivr's "/+esm" endpoint serves the package as an ES module
  const module = await import(LIB_CDN + '/+esm');
  removeBackgroundFn = module.removeBackground;
}
```

The first call downloads ~40MB of model weights. That sounds heavy, but:

- The browser caches it automatically
- Subsequent uses load from the cache instead of re-downloading
- No server round-trip on any future use

This is the same trade-off FFmpeg.wasm makes: a big initial download, but then your browser becomes a local processing powerhouse.
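Because the first load is large, it helps to make sure the import is only ever started once, even if the user triggers it twice before it finishes. Below is a minimal sketch of that idea. `createLazyLoader` is a hypothetical helper, not something from the repository, and the loader function is passed in so the logic can be exercised without a network.

```javascript
// Hypothetical helper: wraps an async loader so it runs at most once.
// Concurrent callers share the same in-flight promise.
function createLazyLoader(importer) {
  let pending = null; // promise for the in-flight or completed load

  return function load() {
    if (pending === null) {
      pending = Promise.resolve()
        .then(importer)
        .catch((err) => {
          pending = null; // forget the failure so a later call can retry
          throw err;
        });
    }
    return pending;
  };
}

// Possible browser usage (assumption: same CDN URL as in Stage 1):
// const loadRemoveBackground = createLazyLoader(async () => {
//   const mod = await import(
//     'https://cdn.jsdelivr.net/npm/@imgly/background-removal@1.5.5/+esm'
//   );
//   return mod.removeBackground;
// });
```

Resetting `pending` on failure is a design choice: without it, one flaky download would leave the tool permanently broken until the page is reloaded.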
## Stage 2: Running AI Inference Locally

Once the model is loaded, inference is straightforward:

```js
const imageBlob = await new Promise(r => canvas.toBlob(r, 'image/png'));

const resultBlob = await removeBackgroundFn(imageBlob, {
  model: 'medium',
  output: { format: 'image/png' },
  progress: (key, current, total) => {
    // Update loading UI
  }
});
```

What's happening behind the scenes:

- The library resizes your image to the model's input dimensions
- Pixel data is converted to a tensor
- ONNX Runtime Web runs the segmentation model via WebAssembly
- The output tensor (a per-pixel foreground probability map) is converted back to an image with a transparent background

The `medium` model balances quality and speed. On a decent laptop, inference takes 2-5 seconds for a typical photo. On a phone, maybe 8-15 seconds. Acceptable for a free, private tool.

## Stage 3: Building the Editable Mask

Here's where it gets interesting. The AI output isn't final; it's a starting point. I extract the alpha channel from the AI result and build an editable grayscale mask:

```js
// originalImage, resultImg, maskCanvas and maskCtx are module-level variables
async function buildMaskFromResult() {
  const w = originalImage.naturalWidth;
  const h = originalImage.naturalHeight;

  // Draw AI result to a temporary canvas
  const resultCanvas = document.createElement('canvas');
  resultCanvas.width = w;
  resultCanvas.height = h;
  const rCtx = resultCanvas.getContext('2d');
  rCtx.drawImage(resultImg, 0, 0);
  const resultData = rCtx.getImageData(0, 0, w, h);

  // Extract alpha channel → grayscale mask
  // White = foreground (keep), Black = background (remove)
  maskCanvas = document.createElement('canvas');
  maskCanvas.width = w;
  maskCanvas.height = h;
  maskCtx = maskCanvas.getContext('2d');
  const maskData = maskCtx.createImageData(w, h);

  for (let i = 0; i < resultData.data.length; i += 4) {
    const alpha = resultData.data[i + 3];
    maskData.data[i] = alpha;     // R
    maskData.data[i + 1] = alpha; // G
    maskData.data[i + 2] = alpha; // B
    maskData.data[i + 3] = 255;   // A (mask itself is always opaque)
  }
  maskCtx.putImageData(maskData, 0, 0);
}
```
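The alpha-to-grayscale loop in Stage 3 does not depend on the canvas at all, so it can be pulled out as a pure function on a raw RGBA byte array. The sketch below is one way to do that. `alphaToGrayscaleMask` is a hypothetical name, not a function from the repository, and the point is only that the conversion becomes testable outside the browser.

```javascript
// Hypothetical pure version of the alpha → grayscale mask conversion.
// `rgba` is a flat RGBA byte array, e.g. the `.data` of an ImageData object.
function alphaToGrayscaleMask(rgba) {
  if (rgba.length % 4 !== 0) {
    throw new RangeError('RGBA buffer length must be a multiple of 4');
  }
  const mask = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    const alpha = rgba[i + 3];
    mask[i] = alpha;     // R
    mask[i + 1] = alpha; // G
    mask[i + 2] = alpha; // B
    mask[i + 3] = 255;   // the mask itself stays fully opaque
  }
  return mask;
}
```

In the browser, the returned array could be wrapped with `new ImageData(mask, w, h)` and drawn with `putImageData`, in place of the inline loop.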
Why a separate mask canvas? Because users need to fix the AI's mistakes. Hair edges, transparent objects, similar-colored backgrounds: no AI gets these right 100% of the time. The mask canvas becomes a paintable surface.

## Stage 4: Manual Refinement with Brush & Eraser

This is the feature that separates a toy demo from a usable tool. Users can:

- **Brush** (paint white on the mask) → restore foreground areas the AI removed
- **Eraser** (paint black on the mask) → remove background areas the AI missed

```js
function paintOnMask(e) {
  // Map the pointer from display (CSS) coordinates to mask pixel coordinates
  const rect = editCanvas.getBoundingClientRect();
  const x = (e.clientX - rect.left) / rect.width * maskCanvas.width;
  const y = (e.clientY - rect.top) / rect.height * maskCanvas.height;

  const brushSize = parseInt(brushSizeEl.value, 10);
  const softness = parseInt(brushSoftEl.value, 10) / 100;

  maskCtx.lineCap = 'round';
  maskCtx.lineWidth = brushSize;

  // Softness = CSS filter blur on the mask canvas context
  if (softness > 0) {
    maskCtx.filter = `blur(${Math.round(brushSize * softness * 0.3)}px)`;
  } else {
    maskCtx.filter = 'none'; // clear any blur left over from a softer stroke
  }

  if (currentTool === 'brush') {
    maskCtx.globalCompositeOperation = 'lighter';
    maskCtx.strokeStyle = '#ffffff';
  } else {
    maskCtx.globalCompositeOperation = 'source-over';
    maskCtx.strokeStyle = '#000000';
  }

  maskCtx.beginPath();
  maskCtx.moveTo(lastX, lastY);
  maskCtx.lineTo(x, y);
  maskCtx.stroke();
}
```

Key details:

- **Coordinate mapping**: The edit canvas is CSS-scaled to fit the viewport, but the mask operates at full image resolution. Every mouse position gets mapped from display coordinates to mask coordinates.
- **Edge softness**: Uses the Canvas 2D `filter` property with `blur()` on the stroke, which creates feathered edges instead of hard cuts.
- **Undo stack**: Each `mousedown` saves a full `ImageData` snapshot of the mask. Up to 20 undo levels.

The brush cursor is a `position: fixed` div that follows the mouse, sized to match the display-scaled brush diameter. The actual canvas cursor is set to `none`.
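Two of the details from Stage 4, the display-to-mask coordinate mapping and the 20-level undo limit, can be sketched as small standalone helpers. The names `displayToMask` and `createUndoStack` are hypothetical and not taken from the repository. Dropping the oldest snapshot when the stack is full is also an assumption about how the limit is enforced.

```javascript
// Map a pointer position from CSS display space to mask pixel space.
// `rect` has the shape returned by getBoundingClientRect().
function displayToMask(clientX, clientY, rect, maskWidth, maskHeight) {
  return {
    x: ((clientX - rect.left) / rect.width) * maskWidth,
    y: ((clientY - rect.top) / rect.height) * maskHeight,
  };
}

// Hypothetical bounded undo stack: keeps at most `limit` snapshots
// and discards the oldest one when a new push would exceed the limit.
function createUndoStack(limit = 20) {
  const snapshots = [];
  return {
    push(snapshot) {
      snapshots.push(snapshot);
      if (snapshots.length > limit) snapshots.shift();
    },
    pop() {
      return snapshots.pop(); // undefined when there is nothing to undo
    },
    get size() {
      return snapshots.length;
    },
  };
}
```

In the browser, a snapshot would be the result of `maskCtx.getImageData(0, 0, maskCanvas.width, maskCanvas.height)`, restored with `putImageData`. Each full-resolution snapshot of a 4000×3000 mask is about 48MB (4000 × 3000 × 4 bytes), so 20 of them approach 1GB; a smaller limit may be worth considering for large images.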
## Stage 5: Compositing the Final Output

To generate the downloadable PNG, the mask is applied to the original image:

```js
// w, h, origCtx, maskCtx, oCtx and outCanvas are module-level variables
function applyMaskToOriginal() {
  const origData = origCtx.getImageData(0, 0, w, h);
  const mData = maskCtx.getImageData(0, 0, w, h);
  const outData = oCtx.createImageData(w, h);

  for (let i = 0; i < origData.data.length; i += 4) {
    outData.data[i] = origData.data[i];         // R: original
    outData.data[i + 1] = origData.data[i + 1]; // G: original
    outData.data[i + 2] = origData.data[i + 2]; // B: original
    outData.data[i + 3] = mData.data[i];        // A: from mask R channel
  }
  oCtx.putImageData(outData, 0, 0);
  return outCanvas;
}
```

The mask's R channel (which equals G and B, since it's grayscale) becomes the alpha channel of the output. White mask pixels → fully opaque. Black → fully transparent. Gray → semi-transparent (useful for hair and soft edges).

## The Refine Mode Overlay

In refine mode, users see the original image with a semi-transparent red overlay on removed areas:

```js
function renderMaskOverlay() {
  editCtx.drawImage(maskCanvas, 0, 0, dw, dh);
  const overlayData = editCtx.getImageData(0, 0, dw, dh);

  for (let i = 0; i < overlayData.data.length; i += 4) {
    const maskVal = overlayData.data[i];
    if (maskVal < 128) {
      // Removed area → semi-transparent red
      overlayData.data[i] = 220;     // R
      overlayData.data[i + 1] = 50;  // G
      overlayData.data[i + 2] = 50;  // B
      overlayData.data[i + 3] = 120; // A
    } else {
      // Kept area → fully transparent (show original underneath)
      overlayData.data[i + 3] = 0;
    }
  }
  editCtx.putImageData(overlayData, 0, 0);
}
```

This gives immediate visual feedback: you can see exactly what the AI removed and paint corrections in real time.

## Performance Considerations

- **Memory**: Three full-resolution canvases live in memory (original, mask, output). For a 4000×3000 photo, that's ~144MB of pixel data (4000 × 3000 pixels × 4 bytes × 3 canvases). Mobile devices with <4GB RAM may struggle.
- **Real-time rendering**: Every brush stroke triggers `renderPreview` via `requestAnimationFrame`. This redraws the preview canvas and overlay from the mask. On large images, there's a noticeable lag.
- **Touch support**: Full touch event handling with `passive: false` listeners to prevent scroll interference.

## What I Stripped for the Open-Source Version

The production version on [ToolKnit](https://toolknit.com/tools/background-remover.html) includes:

- Daily usage limits (fair-use throttling)
- Analytics tracking
- Self-hosted model weights (faster loading from our CDN)
- Sound effects on completion
- Site navigation and SEO shell

The [open-source version](https://github.com/2645149786-dotcom/toolknit/tree/main/open-source/background-remover-standalone) strips all of that down to two files:

- `index.html`: standalone UI (~250 lines)
- `app.js`: core logic (~380 lines)

You can clone it, run `npx serve .`, and have a working background remover in 30 seconds.

## What's Next

Some ideas for anyone who wants to fork and extend:

- **Background replacement**: solid color or custom image behind the subject
- **Batch processing**: drop multiple images, process all sequentially
- **WebGPU acceleration**: ONNX Runtime Web supports WebGPU; inference could be 3-5x faster
- **Edge feathering controls**: post-process the mask with adjustable blur radius
- **Before/after slider**: drag to compare original and result

## Try It

- **Live tool**: [toolknit.com/tools/background-remover.html](https://toolknit.com/tools/background-remover.html)
- **Open source**: [github.com/2645149786-dotcom/toolknit](https://github.com/2645149786-dotcom/toolknit/tree/main/open-source/background-remover-standalone)
- **All 61 tools**: [toolknit.com](https://toolknit.com)

If you've ever needed to remove a background without uploading your photo to a random website, this is it. Clone it, use it, break it, improve it.

Built by Zihang Dong. Building browser-first tools at ToolKnit.