# I Open-Sourced a Browser-Based AI Background Remover — Here's the Full Architecture

> Source: <https://dev.to/dngzihng114379/i-open-sourced-a-browser-based-ai-background-remover-heres-the-full-architecture-1olb>
> Published: 2026-05-20 03:30:59+00:00

Most background removal tools work like this: upload your photo to a server, wait for an AI model to process it, download the result. Your image sits on someone else's infrastructure. You hope they delete it.

I built one that works differently. The AI model runs **in your browser tab**. Your image never leaves your device. And I just [open-sourced the core logic](https://github.com/2645149786-dotcom/toolknit/tree/main/open-source/background-remover-standalone) — two files, zero dependencies beyond a CDN import.

Here's how it works under the hood.

## The Pipeline

The full flow from "user drops an image" to "transparent PNG download" goes through five stages:

```
Upload → ONNX Model Load → WebAssembly Inference → Mask Generation → Canvas Compositing
```

Each stage runs entirely client-side. Let me walk through them.

## Stage 1: Loading the AI Model in the Browser

The backbone is [@imgly/background-removal](https://img.ly/blog/background-removal-js/), an open-source library that bundles an ONNX segmentation model with ONNX Runtime Web (WebAssembly backend).

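``` js
const LIB_CDN = 'https://cdn.jsdelivr.net/npm/@imgly/background-removal@1.5.5';

let removeBackgroundFn = null; // set once the module has been imported

async function loadLibrary() {
  // jsDelivr's /+esm endpoint serves the package as a browser-ready ES module
  const module = await import(LIB_CDN + '/+esm');
  removeBackgroundFn = module.removeBackground;
}
```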

The first call downloads ~40MB of model weights. That sounds heavy, but:

- The browser caches it automatically
- Subsequent uses load it from the cache instead of the network
- No server round-trip on any future use

This is the same trade-off FFmpeg.wasm makes — big initial download, but then your browser becomes a local processing powerhouse.
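One detail the loader above leaves implicit: if a second image is dropped while the first `import()` is still in flight, `loadLibrary()` runs twice. Below is a minimal sketch of one way to guard against that, assuming you cache the in-flight promise. `createOnceLoader` is a hypothetical helper, not part of the repo.

``` js
// Hypothetical helper: wraps an async loader so that concurrent and repeated
// calls share one in-flight promise instead of importing the module again.
function createOnceLoader(load) {
  let pending = null;
  return function loadOnce() {
    if (!pending) {
      pending = load().catch((err) => {
        // Forget the failed attempt so a later call can retry (e.g. after a network error)
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Browser usage (assumes the LIB_CDN constant from above):
// const loadLibrary = createOnceLoader(async () => {
//   const module = await import(LIB_CDN + '/+esm');
//   return module.removeBackground;
// });
// const removeBackgroundFn = await loadLibrary();
```

Caching the promise rather than the resolved function means callers that arrive mid-download simply await the same import.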

## Stage 2: Running AI Inference Locally

Once the model is loaded, inference is straightforward:

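``` js
// `canvas` holds the uploaded image drawn at its natural resolution
const imageBlob = await new Promise(r => canvas.toBlob(r, 'image/png'));

const resultBlob = await removeBackgroundFn(imageBlob, {
  model: 'medium',                 // quality/speed trade-off (see below)
  output: { format: 'image/png' }, // PNG keeps the alpha channel
  progress: (key, current, total) => {
    // Reports progress for the asset identified by `key`; update loading UI here
  }
});
```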
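The `progress` callback reports progress per `key`. If the UI needs a single percentage, one option is to remember the latest `current`/`total` for each key and sum them. This is a sketch under the assumption that each key reports cumulative counts for one asset. I haven't verified exactly which keys the library emits, and `createProgressTracker` is hypothetical.

``` js
// Hypothetical helper: folds per-key progress callbacks into one overall fraction.
// Assumes each key reports cumulative `current` out of `total` for one asset.
function createProgressTracker(onUpdate) {
  const perKey = new Map(); // key -> { current, total }

  return function progress(key, current, total) {
    perKey.set(key, { current, total });

    let done = 0;
    let all = 0;
    for (const entry of perKey.values()) {
      done += entry.current;
      all += entry.total;
    }
    // Guard against division by zero before any totals are known
    onUpdate(all > 0 ? done / all : 0);
  };
}

// Usage inside the removeBackgroundFn options (progressBar is hypothetical):
// progress: createProgressTracker((fraction) => {
//   progressBar.style.width = `${Math.round(fraction * 100)}%`;
// })
```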

What's happening behind the scenes:

- The library resizes your image to the model's input dimensions
- Pixel data is converted to a tensor
- ONNX Runtime Web runs the segmentation model via WebAssembly
- The output tensor (a per-pixel foreground probability map) is converted back to an image with transparent background
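The sketch below makes the second and fourth bullets concrete. It shows what "converted to a tensor" and "converted back" typically mean for a segmentation model. Interleaved RGBA bytes become a planar (channels-first) `Float32Array` in [0, 1], and the model's per-pixel probabilities become 0-255 alpha values. The library's internal resizing and normalization may differ; many models also subtract a mean and divide by a standard deviation. Both function names are hypothetical.

``` js
// Hypothetical sketch: RGBA bytes (as in ImageData.data) -> planar float tensor [3, h, w].
// Values are scaled to [0, 1]; real preprocessing may add mean/std normalization.
function rgbaToChwTensor(rgba, width, height) {
  const plane = width * height;
  const tensor = new Float32Array(3 * plane);
  for (let p = 0; p < plane; p++) {
    const i = p * 4;                           // offset of pixel p in the RGBA array
    tensor[p] = rgba[i] / 255;                 // R plane
    tensor[plane + p] = rgba[i + 1] / 255;     // G plane
    tensor[2 * plane + p] = rgba[i + 2] / 255; // B plane (alpha is dropped)
  }
  return tensor;
}

// Hypothetical sketch: per-pixel foreground probabilities in [0, 1] -> alpha bytes.
function probabilitiesToAlpha(probs) {
  const alpha = new Uint8ClampedArray(probs.length);
  for (let p = 0; p < probs.length; p++) {
    alpha[p] = Math.round(probs[p] * 255); // Uint8ClampedArray also clamps out-of-range values
  }
  return alpha;
}
```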

The `medium` model balances quality and speed. On a decent laptop, inference takes 2-5 seconds for a typical photo. On a phone, maybe 8-15 seconds. Acceptable for a free, private tool.

## Stage 3: Building the Editable Mask

Here's where it gets interesting. The AI output isn't final — it's a starting point. I extract the alpha channel from the AI result and build an editable grayscale mask:

``` js
let maskCanvas, maskCtx; // full-resolution editable mask, shared with the brush code

async function buildMaskFromResult(originalImage, resultBlob) {
  const w = originalImage.naturalWidth;
  const h = originalImage.naturalHeight;

  // Decode the PNG returned by removeBackgroundFn and draw it at full resolution
  const resultImg = await createImageBitmap(resultBlob);
  const resultCanvas = document.createElement('canvas');
  resultCanvas.width = w;
  resultCanvas.height = h;
  const rCtx = resultCanvas.getContext('2d');
  rCtx.drawImage(resultImg, 0, 0, w, h);
  const resultData = rCtx.getImageData(0, 0, w, h);

  // Extract alpha channel → grayscale mask
  // White = foreground (keep), Black = background (remove)
  maskCanvas = document.createElement('canvas');
  maskCanvas.width = w;
  maskCanvas.height = h;
  // The mask is read back often (undo snapshots, compositing), hence willReadFrequently
  maskCtx = maskCanvas.getContext('2d', { willReadFrequently: true });
  const maskData = maskCtx.createImageData(w, h);

  for (let i = 0; i < resultData.data.length; i += 4) {
    const alpha = resultData.data[i + 3];
    maskData.data[i] = alpha;     // R
    maskData.data[i + 1] = alpha; // G
    maskData.data[i + 2] = alpha; // B
    maskData.data[i + 3] = 255;   // A (mask itself is always opaque)
  }
  maskCtx.putImageData(maskData, 0, 0);
}
```

**Why a separate mask canvas?**

Because users need to fix the AI's mistakes. Hair edges, transparent objects, similar-colored backgrounds — no AI gets these perfect 100% of the time. The mask canvas becomes a paintable surface.

## Stage 4: Manual Refinement with Brush & Eraser

This is the feature that separates a toy demo from a usable tool. Users can:

- **Brush** (paint white on mask) → restore foreground areas the AI removed
- **Eraser** (paint black on mask) → remove background areas the AI missed

``` js
function paintOnMask(e) {
  // Map the pointer from CSS (display) pixels to full-resolution mask pixels
  const rect = editCanvas.getBoundingClientRect();
  const x = (e.clientX - rect.left) / rect.width * maskCanvas.width;
  const y = (e.clientY - rect.top) / rect.height * maskCanvas.height;

  const brushSize = parseInt(brushSizeEl.value, 10); // diameter in mask pixels
  const softness = parseInt(brushSoftEl.value, 10) / 100;

  maskCtx.lineCap = 'round';
  maskCtx.lineWidth = brushSize;

  // Softness = Canvas 2D filter blur on the stroke. Reset to 'none' when off,
  // otherwise a blur set by an earlier stroke would stick to later ones
  maskCtx.filter = softness > 0
    ? `blur(${Math.round(brushSize * softness * 0.3)}px)`
    : 'none';

  if (currentTool === 'brush') {
    maskCtx.globalCompositeOperation = 'lighter'; // additive: can only brighten the mask
    maskCtx.strokeStyle = '#ffffff';
  } else {
    maskCtx.globalCompositeOperation = 'source-over';
    maskCtx.strokeStyle = '#000000';
  }

  // Draw a segment from the previous pointer position to the current one
  maskCtx.beginPath();
  maskCtx.moveTo(lastX, lastY);
  maskCtx.lineTo(x, y);
  maskCtx.stroke();

  // Remember this point so the next segment continues the stroke
  lastX = x;
  lastY = y;
}
```
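The pointer mapping at the top of `paintOnMask` is easy to test without a DOM if you pull it out as a pure function. `clientToMask` is a hypothetical name. `rect` only needs the `left`, `top`, `width`, and `height` fields that `getBoundingClientRect()` returns.

``` js
// Hypothetical helper: maps a pointer position in CSS pixels to mask (image) pixels.
// `rect` has the shape of getBoundingClientRect(); maskW/maskH are the mask canvas size.
function clientToMask(clientX, clientY, rect, maskW, maskH) {
  return {
    x: ((clientX - rect.left) / rect.width) * maskW,
    y: ((clientY - rect.top) / rect.height) * maskH,
  };
}
```

For a 4000×3000 image shown at 800×600 CSS pixels, each displayed pixel covers 5 mask pixels. A click 100 px right of the canvas's left edge therefore lands at mask x = 500.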

**Key details:**

- **Coordinate mapping**: The edit canvas is CSS-scaled to fit the viewport, but the mask operates at full image resolution. Every mouse position gets mapped from display coordinates to mask coordinates.
- **Edge softness**: Uses the Canvas 2D `filter: blur()` on the stroke, which creates feathered edges instead of hard cuts.
- **Undo stack**: Each mousedown saves a full `ImageData` snapshot of the mask. Up to 20 undo levels.
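A capped undo stack like this can be sketched as an array that drops its oldest entry once it exceeds the limit. The `UndoStack` class is hypothetical, and its default of 20 matches the cap described above. Capping matters because every snapshot is a full-resolution `ImageData`. At 4000×3000 that is 4000 × 3000 × 4 bytes = 48MB per snapshot, so 20 levels can approach 960MB.

``` js
// Hypothetical capped undo stack. In the browser each snapshot would be
// maskCtx.getImageData(0, 0, w, h), taken on mousedown before the stroke starts.
class UndoStack {
  constructor(limit = 20) {
    this.limit = limit;
    this.snapshots = [];
  }

  push(snapshot) {
    this.snapshots.push(snapshot);
    if (this.snapshots.length > this.limit) {
      this.snapshots.shift(); // discard the oldest snapshot to bound memory
    }
  }

  // Returns the most recent snapshot, or null when there is nothing to undo
  pop() {
    return this.snapshots.length > 0 ? this.snapshots.pop() : null;
  }

  get size() {
    return this.snapshots.length;
  }
}

// Browser usage:
// editCanvas.addEventListener('mousedown', () => undo.push(maskCtx.getImageData(0, 0, w, h)));
// undoButton.onclick = () => { const s = undo.pop(); if (s) maskCtx.putImageData(s, 0, 0); };
```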

The brush cursor is a `position: fixed` div that follows the mouse, sized to match the display-scaled brush diameter. The actual canvas cursor is set to `none`.
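Sizing that cursor is the inverse of the coordinate mapping. The brush diameter is set in mask pixels, so it is multiplied by the display scale (`rect.width / maskCanvas.width`). Here is a minimal sketch using a hypothetical `brushCursorStyle` helper, assuming the div is centered on the pointer.

``` js
// Hypothetical helper: computes CSS for a position: fixed cursor div centered on the pointer.
// brushSize is in mask pixels; rect is the edit canvas's getBoundingClientRect().
function brushCursorStyle(clientX, clientY, brushSize, rect, maskW) {
  // Multiply before dividing: display diameter = brushSize * (CSS px per mask px)
  const diameter = (brushSize * rect.width) / maskW;
  return {
    width: `${diameter}px`,
    height: `${diameter}px`,
    left: `${clientX - diameter / 2}px`,
    top: `${clientY - diameter / 2}px`,
  };
}

// Browser usage:
// Object.assign(cursorEl.style, brushCursorStyle(e.clientX, e.clientY, size, rect, maskCanvas.width));
```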

## Stage 5: Compositing the Final Output

To generate the downloadable PNG, the mask is applied to the original image:

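``` js
function applyMaskToOriginal(origCtx, maskCtx, w, h) {
  // origCtx: canvas holding the original image; maskCtx: the edited mask (both w × h)
  const origData = origCtx.getImageData(0, 0, w, h);
  const mData = maskCtx.getImageData(0, 0, w, h);

  const outCanvas = document.createElement('canvas');
  outCanvas.width = w;
  outCanvas.height = h;
  const oCtx = outCanvas.getContext('2d');
  const outData = oCtx.createImageData(w, h);

  for (let i = 0; i < origData.data.length; i += 4) {
    outData.data[i] = origData.data[i];         // R — original
    outData.data[i + 1] = origData.data[i + 1]; // G — original
    outData.data[i + 2] = origData.data[i + 2]; // B — original
    outData.data[i + 3] = mData.data[i];        // A — from mask R channel
  }

  oCtx.putImageData(outData, 0, 0);
  return outCanvas; // e.g. outCanvas.toBlob(cb, 'image/png') for the download
}
```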

The mask's R channel (which equals G and B since it's grayscale) becomes the alpha channel of the output. White mask pixels → fully opaque. Black → fully transparent. Gray → semi-transparent (useful for hair and soft edges).

## The Refine Mode Overlay

In refine mode, users see the original image with a semi-transparent red overlay on removed areas:

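``` js
// dw, dh: display size of the edit canvas. putImageData writes pixels as-is
// (no blending), so this assumes the original image is drawn on a layer
// beneath the edit canvas and shows through wherever the overlay is transparent.
function renderMaskOverlay() {
  // Downscale the full-resolution mask to display size, then read it back
  editCtx.drawImage(maskCanvas, 0, 0, dw, dh);
  const overlayData = editCtx.getImageData(0, 0, dw, dh);

  for (let i = 0; i < overlayData.data.length; i += 4) {
    const maskVal = overlayData.data[i]; // R channel = mask value (grayscale)
    if (maskVal < 128) {
      // Removed area → semi-transparent red
      overlayData.data[i] = 220;     // R
      overlayData.data[i + 1] = 50;  // G
      overlayData.data[i + 2] = 50;  // B
      overlayData.data[i + 3] = 120; // A
    } else {
      // Kept area → fully transparent (original shows through)
      overlayData.data[i + 3] = 0;
    }
  }
  editCtx.putImageData(overlayData, 0, 0);
}
```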

This gives immediate visual feedback — you can see exactly what the AI removed and paint corrections in real time.

## Performance Considerations

- **Memory**: Three full-resolution canvases live in memory (original, mask, output). For a 4000×3000 photo, that's ~144MB of pixel data (4000 × 3000 pixels × 4 bytes × 3 canvases). Mobile devices with <4GB RAM may struggle.
- **Real-time rendering**: Every brush stroke triggers `renderPreview()` via `requestAnimationFrame`, which redraws the preview canvas and overlay from the mask. On large images, there's a noticeable lag.
- **Touch support**: Touch listeners are registered with `{ passive: false }` so they can call `preventDefault()` and stop the page from scrolling while you paint.
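One common way to keep that redraw to at most once per frame is a dirty flag that schedules a single callback, even when several pointer events arrive between frames. This sketch assumes `renderPreview()` redraws everything from the mask. `createFrameScheduler` is hypothetical, and the scheduler is injectable so the logic can be tested outside a browser.

``` js
// Hypothetical helper: coalesces many render requests into at most one per frame.
// `schedule` defaults to requestAnimationFrame in the browser.
function createFrameScheduler(render, schedule = (cb) => requestAnimationFrame(cb)) {
  let queued = false;
  return function requestRender() {
    if (queued) return; // a frame is already pending; it will draw the latest mask
    queued = true;
    schedule(() => {
      queued = false;
      render();
    });
  };
}

// Browser usage:
// const requestPreview = createFrameScheduler(renderPreview);
// in the pointermove handler: paintOnMask(e); requestPreview();
```

Painting still happens on every event, but the expensive full-canvas redraw is paid once per frame.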

## What I Stripped for the Open-Source Version

The production version on [ToolKnit](https://toolknit.com/tools/background-remover.html) includes:

- Daily usage limits (fair-use throttling)
- Analytics tracking
- Self-hosted model weights (faster loading from our CDN)
- Sound effects on completion
- Site navigation and SEO shell

The [open-source version](https://github.com/2645149786-dotcom/toolknit/tree/main/open-source/background-remover-standalone) strips all of that down to two files:

- `index.html`: standalone UI (~250 lines)
- `app.js`: core logic (~380 lines)

You can clone it, run `npx serve .`, and have a working background remover in 30 seconds.

## What's Next

Some ideas for anyone who wants to fork and extend:

- **Background replacement**: solid color or custom image behind the subject
- **Batch processing**: drop multiple images, process all sequentially
- **WebGPU acceleration**: ONNX Runtime Web supports WebGPU; inference could be 3-5x faster
- **Edge feathering controls**: post-process the mask with adjustable blur radius
- **Before/after slider**: drag to compare original and result
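For the first idea, solid-color background replacement, the standard "over" operation blends each original pixel with the color, using the mask as the weight: out = fg × a + bg × (1 − a), where a = mask / 255. Here is a minimal sketch on raw pixel arrays. `replaceBackground` is hypothetical, and its arrays use the same layout as `ImageData.data`.

``` js
// Hypothetical sketch: composites the subject over a solid color using the grayscale mask.
// rgba and mask are RGBA byte arrays of equal length (mask R channel = coverage, as in Stage 5).
function replaceBackground(rgba, mask, [bgR, bgG, bgB]) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    const a = mask[i] / 255; // 1 = keep original pixel, 0 = use background color
    out[i] = rgba[i] * a + bgR * (1 - a);         // Uint8ClampedArray rounds to an integer
    out[i + 1] = rgba[i + 1] * a + bgG * (1 - a);
    out[i + 2] = rgba[i + 2] * a + bgB * (1 - a);
    out[i + 3] = 255; // the result is fully opaque
  }
  return out;
}

// Browser usage:
// oCtx.putImageData(new ImageData(replaceBackground(origData.data, mData.data, [255, 255, 255]), w, h), 0, 0);
```

Gray mask values blend smoothly into the new color, so soft hair edges carry over from the mask.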

## Try It

- **Live tool**: [toolknit.com/tools/background-remover.html](https://toolknit.com/tools/background-remover.html)
- **Open source**: [github.com/2645149786-dotcom/toolknit](https://github.com/2645149786-dotcom/toolknit/tree/main/open-source/background-remover-standalone)
- **All 61 tools**: [toolknit.com](https://toolknit.com)

If you've ever needed to remove a background without uploading your photo to a random website — this is it. Clone it, use it, break it, improve it.

*Built by Zihang Dong. Building browser-first tools at ToolKnit.*
