I built a $0.0005 screenshot cropper that saves AI agents 95% on vision LLM costs

Source: dev.to · Published: 2026-06-24T21:20Z · Topic: artificial-intelligence

A developer built a stateless pay-per-use API that crops browser screenshots to specific elements, reducing vision LLM costs by 95%. The API uses the x402 payment protocol, charging $0.0005 USDC per crop on Base L2, and eliminates the need for API keys or subscriptions.

If you're building AI agents that work with browser screenshots, you already know the pain.

You take a full 1920×1080 screenshot, pass it to GPT-4o or Claude, and watch your token bill climb — while the model downscales the image anyway and blurs the exact text you needed it to read.

There's a better way.

Feeding full screenshots to vision LLMs costs you twice: image tokens scale with the size of the image, and the model downscales large images, which blurs small text.

But your agent already knows where to look. Browser automation tools like Playwright and Puppeteer give you access to getBoundingClientRect(), which returns the exact pixel coordinates of any element on screen.

So why are you sending the whole screenshot?

I built a stateless pay-per-use API that takes a screenshot and pixel coordinates, and returns just the cropped element as a lossless PNG — ready to pass directly to your vision LLM.

POST /crop
{
  "image":  "<base64 screenshot>",
  "x":      120,
  "y":      45,
  "width":  640,
  "height": 80
}

Returns:

{
  "success": true,
  "data": {
    "base64": "iVBORw0KGgo...",
    "mime":   "image/png",
    "width":  640,
    "height": 80,
    "bytes":  4821
  }
}
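The x, y, width, and height in the request above are whole pixels, while getBoundingClientRect() returns fractional CSS pixels. Here is a minimal sketch of the conversion, assuming a hypothetical helper named toCropBox and a screenshot taken at a known device scale factor (neither is part of the API). It floors the origin, rounds the far edge up so the element is not clipped, and clamps the result to the image bounds.

```javascript
// Hypothetical helper, not part of the cropper API.
// rect:  { x, y, width, height } in CSS pixels (e.g. from getBoundingClientRect()).
// image: { width, height } of the screenshot in device pixels.
// scale: device scale factor of the screenshot (1 when CSS and device pixels match).
function toCropBox(rect, image, scale = 1) {
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

  // Floor the top-left corner and ceil the bottom-right corner so that
  // rounding never shaves pixels off the element.
  const left   = clamp(Math.floor(rect.x * scale), 0, image.width);
  const top    = clamp(Math.floor(rect.y * scale), 0, image.height);
  const right  = clamp(Math.ceil((rect.x + rect.width) * scale), 0, image.width);
  const bottom = clamp(Math.ceil((rect.y + rect.height) * scale), 0, image.height);

  const width = right - left;
  const height = bottom - top;
  if (width <= 0 || height <= 0) {
    throw new RangeError('Element lies outside the screenshot');
  }
  return { x: left, y: top, width, height };
}

const box = toCropBox(
  { x: 120.4, y: 45.2, width: 639.5, height: 79.9 },
  { width: 1920, height: 1080 }
);
console.log(box);
```

Playwright uses a device scale factor of 1 unless you configure otherwise, so the scale argument only matters if you changed it.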

A crop under 5 KB instead of a 2 MB screenshot. The region the agent needs is intact, at roughly 95% fewer tokens for a crop of this size.
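How large the saving is depends on how a given model counts image tokens. As a rough worked estimate, the sketch below assumes the approximation Anthropic documents for Claude (tokens ≈ width × height / 750) and ignores any provider-side resizing of large images. Other models use different formulas, so treat the numbers as illustrative rather than as a measurement.

```javascript
// Approximate image token count, assuming tokens ≈ (width × height) / 750.
function estimateImageTokens(width, height) {
  return Math.round((width * height) / 750);
}

const full = estimateImageTokens(1920, 1080); // full-screen screenshot
const crop = estimateImageTokens(640, 80);    // the cropped element from the example
const saved = 1 - crop / full;

console.log({ full, crop, savedPercent: (saved * 100).toFixed(1) });
```

Under this assumption the full screenshot comes to about 2,765 tokens and the crop to about 68. A larger crop saves proportionally less.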

Here's where it gets interesting. The API uses the x402 payment protocol — HTTP's long-dormant 402 Payment Required status code, finally put to use.

There are no API keys. No accounts. No subscriptions. The agent pays $0.0005 USDC per crop on Base L2 automatically.

The flow:

1. Agent POSTs to /crop (no payment header)
   ← 402 with payment instructions in headers

2. Agent transfers 0.0005 USDC to recipient wallet on Base
   (near-zero gas, ~2 second settlement)

3. Agent POSTs again with x-payment-tx-hash header
   ← 200 with cropped PNG

The entire exchange takes two HTTP requests and one on-chain transfer. No human intervention. No billing dashboard. The money lands directly in the operator's wallet on-chain.
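Step 2 is an ordinary ERC-20 transfer, and token contracts take amounts as integer base units rather than decimals. Here is a small sketch, assuming USDC's 6 decimal places and a hypothetical helper name, that converts a decimal price string without floating-point rounding. The format of the price the API advertises is an assumption here.

```javascript
// Hypothetical helper: convert a decimal USDC string (e.g. "0.0005")
// into integer base units as a BigInt. USDC uses 6 decimal places.
const USDC_DECIMALS = 6;

function usdcToBaseUnits(amount) {
  if (!/^\d+(\.\d+)?$/.test(amount)) {
    throw new TypeError(`Not a decimal amount: ${amount}`);
  }
  const [whole, fraction = ''] = amount.split('.');
  if (fraction.length > USDC_DECIMALS) {
    throw new RangeError('More precision than USDC supports');
  }
  // Pad the fractional part to exactly 6 digits, then join and parse.
  return BigInt(whole + fraction.padEnd(USDC_DECIMALS, '0'));
}

console.log(usdcToBaseUnits('0.0005')); // 500n
```

So one crop costs 500 base units of USDC, which is the value a transfer call would carry.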

Here's what using it looks like in a Playwright agent:

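import { chromium } from 'playwright';
import { readFileSync } from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment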

const browser = await chromium.launch();
const page    = await browser.newPage();
await page.goto('https://example.com/dashboard');

// Take screenshot
await page.screenshot({ path: 'screen.png' });
const imageB64 = readFileSync('screen.png').toString('base64');

// Get element coordinates
const rect = await page.$eval('.price-display', el => el.getBoundingClientRect().toJSON());

// Probe the API for payment instructions
const probe = await fetch('https://x402-vision-cropper.onrender.com/crop', {
  method:  'POST',
  headers: { 'Content-Type': 'application/json' },
  body:    JSON.stringify({
    image:  imageB64,
    x:      Math.floor(rect.x),
    y:      Math.floor(rect.y),
    width:  Math.floor(rect.width),
    height: Math.floor(rect.height),
  }),
});

// → 402 response with payment details in headers
const recipient = probe.headers.get('x-payment-recipient');
const amount    = probe.headers.get('x-payment-price-usdc');

// Pay on Base L2 (for example with viem). sendUsdc is a placeholder for your own wallet logic.
const txHash = await sendUsdc({ recipient, amount });

// Resubmit with payment proof
const result = await fetch('https://x402-vision-cropper.onrender.com/crop', {
  method:  'POST',
  headers: {
    'Content-Type':       'application/json',
    'x-payment-tx-hash':  txHash,
  },
  body: JSON.stringify({
    image:  imageB64,
    x:      Math.floor(rect.x),
    y:      Math.floor(rect.y),
    width:  Math.floor(rect.width),
    height: Math.floor(rect.height),
  }),
});

const { data } = await result.json();

// Pass the tiny crop to your vision LLM instead of the full screenshot
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: `data:${data.mime};base64,${data.base64}` } },
      { type: 'text', text: 'What is the price shown?' }
    ]
  }]
});
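The example above builds the same request twice and never checks a status code. One way to tidy that up is to fold the probe, pay, and retry steps into a single function. This is a sketch under assumptions: cropWithPayment and its pay callback are hypothetical names, and only the three header names come from the example above. Taking fetch as a parameter makes the function easy to exercise without touching the network.

```javascript
// Sketch of the probe → pay → retry sequence as one reusable function.
// `pay` is a caller-supplied async function (hypothetical) that sends the
// USDC transfer and resolves to the transaction hash.
// `fetchImpl` defaults to the global fetch (Node 18+ or a browser).
async function cropWithPayment(url, cropRequest, pay, fetchImpl = fetch) {
  const body = JSON.stringify(cropRequest);
  const post = (extraHeaders = {}) =>
    fetchImpl(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', ...extraHeaders },
      body,
    });

  // 1. Probe without a payment header.
  const probe = await post();
  if (probe.status !== 402) {
    throw new Error(`Expected 402 from probe, got ${probe.status}`);
  }

  // 2. Read the payment instructions and pay.
  const recipient = probe.headers.get('x-payment-recipient');
  const amount = probe.headers.get('x-payment-price-usdc');
  if (!recipient || !amount) {
    throw new Error('402 response is missing payment headers');
  }
  const txHash = await pay({ recipient, amount });

  // 3. Retry with the transaction hash as proof of payment.
  const paid = await post({ 'x-payment-tx-hash': txHash });
  if (!paid.ok) {
    throw new Error(`Crop failed with status ${paid.status}`);
  }
  const { data } = await paid.json();
  return data;
}
```

A call would then look like `const data = await cropWithPayment(url, { image, x, y, width, height }, sendUsdc)`, with sendUsdc standing in for your wallet logic as before.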

The server is intentionally minimal. The entire codebase is about 400 lines across 7 files. No database. No session state. No auth layer beyond the payment itself.
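To illustrate what "no auth layer beyond the payment itself" can look like, here is a framework-free sketch of the request decision. It is not the author's code: handleCrop, verifyPayment, and cropImage are hypothetical names, the recipient address is a placeholder, the error body shape is invented, and only the three header names come from the article.

```javascript
const PRICE_USDC = '0.0005';
const RECIPIENT = '0x0000000000000000000000000000000000000000'; // placeholder address

// headers: plain object with lower-cased header names.
// verifyPayment(txHash, recipient, price) → Promise<boolean> (hypothetical on-chain check).
// cropImage(body) → Promise<object> (hypothetical image crop).
async function handleCrop(headers, body, verifyPayment, cropImage) {
  const txHash = headers['x-payment-tx-hash'];

  // No proof of payment: answer 402 with instructions in the headers.
  if (!txHash) {
    return {
      status: 402,
      headers: {
        'x-payment-recipient': RECIPIENT,
        'x-payment-price-usdc': PRICE_USDC,
      },
      body: { success: false, error: 'Payment required' }, // assumed error shape
    };
  }

  // Proof supplied: verify it, then do the work. Nothing is stored.
  if (!(await verifyPayment(txHash, RECIPIENT, PRICE_USDC))) {
    return { status: 402, headers: {}, body: { success: false, error: 'Payment not verified' } };
  }
  return { status: 200, headers: {}, body: { success: true, data: await cropImage(body) } };
}
```

Every input arrives with the request and every output leaves with the response, which is what keeps a handler like this stateless.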

The API is live now:

curl https://x402-vision-cropper.onrender.com/health

curl -X POST https://x402-vision-cropper.onrender.com/crop \
  -H "Content-Type: application/json" \
  -d '{"image":"'"$(python3 -c "print('A'*200)")"'","x":0,"y":0,"width":10,"height":10}'

Machine-readable docs for agents: https://x402-vision-cropper.onrender.com/llms.txt

x402 is genuinely exciting but very early. The protocol works cleanly — payment instructions in headers, proof in the retry, settlement on-chain. But the agent ecosystem is still catching up. Most frameworks don't have native wallet support yet.

Stateless by design is underrated. No database means no stored data to breach, no GDPR headache, no backup strategy, no connection pooling. Every request lives and dies in RAM. For a high-throughput API that processes sensitive screenshot data, this is the right architecture.

The unit economics make sense at scale. At $0.0005 per crop the service costs less than a rounding error compared to what it saves on vision tokens. The challenge isn't pricing — it's volume.
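The per-crop arithmetic can be made explicit. The sketch below takes the $0.0005 fee from the article. The other inputs are placeholders chosen for illustration: about 2,700 tokens saved per crop and an input price of $2.50 per million tokens. Substitute your own model's figures.

```javascript
// Net saving per crop in USD, given tokens saved and the model's input price.
// All inputs are caller-supplied; the figures used below are illustrative only.
const CROP_FEE_USD = 0.0005; // per-crop price from the article

function netSavingPerCrop(tokensSaved, usdPerMillionTokens) {
  return (tokensSaved * usdPerMillionTokens) / 1_000_000 - CROP_FEE_USD;
}

// Input price (USD per million tokens) at which the crop fee pays for itself.
function breakEvenPrice(tokensSaved) {
  return (CROP_FEE_USD / tokensSaved) * 1_000_000;
}

console.log(netSavingPerCrop(2700, 2.5).toFixed(5));
console.log(breakEvenPrice(2700).toFixed(3));
```

With those placeholder inputs the net saving is about $0.006 per crop, and the fee pays for itself at any input price above roughly $0.19 per million tokens.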

If you're building browser agents or anything else that feeds screenshots to vision models, give it a try. And if you're building in the x402 / agentic payments space, I'd love to hear what you're working on.


Read original on dev.to → dev.to/aaroncarlisle94/i-built-a-00005-screensho…
