I built a $0.0005 screenshot cropper that saves AI agents 95% on vision LLM costs

A developer built a stateless pay-per-use API that crops browser screenshots to specific elements, reducing vision LLM costs by 95%. The API uses the x402 payment protocol, charging $0.0005 USDC per crop on Base L2, and eliminates the need for API keys or subscriptions.

If you're building AI agents that work with browser screenshots, you already know the pain. You take a full 1920×1080 screenshot, pass it to GPT-4o or Claude, and watch your token bill climb, while the model downscales the image anyway and blurs the exact text you needed it to read.

There's a better way.

Full screenshots make vision LLMs expensive in two ways. You pay image tokens for the entire page when you only care about one element. And the downscaling applied to large images blurs the small text you actually need.

But your agent already knows where to look. Browser automation tools like Playwright and Puppeteer can give you the exact pixel coordinates of any element on screen, either by evaluating `getBoundingClientRect()` in the page or through their built-in `boundingBox()` helpers. So why are you sending the whole screenshot?

I built a stateless pay-per-use API that takes a screenshot and pixel coordinates and returns just the cropped element as a lossless PNG, ready to pass directly to your vision LLM.

```
POST /crop
{ "image": "<base64 screenshot>", "x": 120, "y": 45, "width": 640, "height": 80 }
```

Returns:

```json
{
  "success": true,
  "data": {
    "base64": "iVBORw0KGgo...",
    "mime": "image/png",
    "width": 640,
    "height": 80,
    "bytes": 4821
  }
}
```

A ~5KB crop instead of a 2MB screenshot: the information you needed, with 95% fewer tokens.

Here's where it gets interesting. The API uses the x402 payment protocol, which finally puts HTTP's long-dormant `402 Payment Required` status code to use. There are no API keys, no accounts, and no subscriptions: the agent pays $0.0005 USDC per crop on Base L2 automatically.

The flow:

1. The agent POSTs to `/crop` with no payment header.
   ← 402 with payment instructions in the headers
2. The agent transfers 0.0005 USDC to the recipient wallet on Base (near-zero gas, ~2-second settlement).
3. The agent POSTs again with an `x-payment-tx-hash` header.
   ← 200 with the cropped PNG

The entire exchange happens inside the HTTP request cycle, with no human intervention and no billing dashboard. The money lands directly in the operator's wallet on-chain.
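The three-step flow above is generic enough to wrap in a small client helper. This is a minimal sketch, not the service's official client: it assumes the 402 response carries the recipient and price in `x-payment-recipient` and `x-payment-price-usdc` headers and that the retry carries `x-payment-tx-hash`. `postWithX402` and the `pay` callback are hypothetical names; `pay` is where your own wallet logic would send the USDC and return the transaction hash.

```javascript
// Sketch of the x402 probe -> pay -> retry cycle (Node 18+, global fetch).
// Assumed header names: x-payment-recipient, x-payment-price-usdc (on the 402),
// x-payment-tx-hash (on the retry). `pay` is a hypothetical callback you supply.
async function postWithX402(url, body, pay, fetchImpl = fetch) {
  const init = {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };

  // Step 1: unpaid request. Anything other than 402 is returned as-is.
  const probe = await fetchImpl(url, init);
  if (probe.status !== 402) return probe;

  const recipient = probe.headers.get('x-payment-recipient');
  const amount = probe.headers.get('x-payment-price-usdc');
  if (!recipient || !amount) {
    throw new Error('402 response is missing payment headers');
  }

  // Step 2: settle on-chain; the callback resolves to a transaction hash.
  const txHash = await pay({ recipient, amount });

  // Step 3: resend the identical body, now with the payment proof attached.
  return fetchImpl(url, {
    ...init,
    headers: { ...init.headers, 'x-payment-tx-hash': txHash },
  });
}
```

Injecting `fetchImpl` keeps the helper testable with a mock, and resending the exact same body matters because the server has no session in which to remember the first request.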
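Step 2 needs the price as an on-chain integer. USDC uses 6 decimals, so a decimal price string such as `"0.0005"` (assuming that is how the price header is formatted) must be scaled by 10^6 before it goes into an ERC-20 `transfer`. Doing that with floating point can introduce rounding errors for some values, so this sketch converts the string exactly with `BigInt`; `usdcToBaseUnits` is a hypothetical helper name.

```javascript
// USDC has 6 decimals: 1 USDC = 1_000_000 base units.
const USDC_DECIMALS = 6;

// Convert a decimal string like "0.0005" into integer base units (BigInt)
// without going through floating point.
function usdcToBaseUnits(amount, decimals = USDC_DECIMALS) {
  const match = /^(\d+)(?:\.(\d+))?$/.exec(amount.trim());
  if (!match) throw new Error(`Invalid USDC amount: ${amount}`);

  const [, whole, frac = ''] = match;
  if (frac.length > decimals) {
    throw new Error(`More than ${decimals} decimal places: ${amount}`);
  }

  // Pad the fractional part to exactly `decimals` digits, then combine.
  return BigInt(whole) * 10n ** BigInt(decimals) + BigInt(frac.padEnd(decimals, '0'));
}
```

For the $0.0005 price this yields 500 base units. If your wallet code already uses viem, its `parseUnits(amount, 6)` performs the same conversion.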
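The `/crop` request takes integer pixel coordinates, while `getBoundingClientRect()` returns fractional CSS pixels. Flooring `x` and `width` independently can shave a column off the element's right edge, and on a high-DPI page the screenshot is in device pixels (Playwright's `page.screenshot()` defaults to `scale: 'device'`). This sketch rounds outward, applies a scale factor, and clamps to the image; `rectToCropBox` and its options are hypothetical names, and it assumes a viewport (not full-page) screenshot so the rect and image share the same origin.

```javascript
// Convert a CSS-pixel rect ({ x, y, width, height }) into an integer crop box
// in screenshot pixels. `scale` is screenshot pixels per CSS pixel
// (devicePixelRatio for scale: 'device', 1 for scale: 'css').
function rectToCropBox(rect, { scale = 1, imageWidth, imageHeight }) {
  if (!Number.isFinite(imageWidth) || !Number.isFinite(imageHeight)) {
    throw new Error('imageWidth and imageHeight are required');
  }

  // Round the left/top edges down and the right/bottom edges up so partial
  // pixels at the element's border are kept, then clamp to the image.
  const left = Math.max(0, Math.floor(rect.x * scale));
  const top = Math.max(0, Math.floor(rect.y * scale));
  const right = Math.min(imageWidth, Math.ceil((rect.x + rect.width) * scale));
  const bottom = Math.min(imageHeight, Math.ceil((rect.y + rect.height) * scale));

  if (right <= left || bottom <= top) {
    throw new Error('Element lies outside the screenshot');
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

In a Playwright script, `scale` could come from `await page.evaluate(() => window.devicePixelRatio)` and the image size from `page.viewportSize()` multiplied by that scale.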
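The token saving depends on pixel dimensions rather than file size: vision APIs bill images by width and height, so the 5KB-versus-2MB comparison is only a proxy. A rough estimate, assuming Anthropic's documented approximation for Claude of tokens ≈ width × height / 750 and ignoring any downscaling the provider applies to large images; other models, such as GPT-4o with its tile-based accounting, will give different numbers.

```javascript
// Rough image-token estimate, assuming tokens scale with pixel area
// (Claude's published approximation: tokens ≈ width * height / 750).
// Provider-side downscaling of large images is ignored here.
const PIXELS_PER_TOKEN = 750;

function estimateImageTokens(width, height) {
  return Math.ceil((width * height) / PIXELS_PER_TOKEN);
}

function tokenSavings(full, crop) {
  const fullTokens = estimateImageTokens(full.width, full.height);
  const cropTokens = estimateImageTokens(crop.width, crop.height);
  return { fullTokens, cropTokens, savedFraction: 1 - cropTokens / fullTokens };
}

// Full 1920x1080 screenshot vs. the 640x80 crop from the /crop example.
const { fullTokens, cropTokens, savedFraction } = tokenSavings(
  { width: 1920, height: 1080 },
  { width: 640, height: 80 },
);
console.log(fullTokens, cropTokens, (savedFraction * 100).toFixed(1) + '%');
// prints: 2765 69 97.5%
```

Under these assumptions the crop costs about 2.5% of the full screenshot, consistent with the 95% figure; downscaling of the full image on the provider side would narrow the gap somewhat.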
Here's what using it looks like in a Playwright agent:

```js
import { readFileSync } from 'fs';
import { chromium } from 'playwright';
import OpenAI from 'openai';

const CROP_URL = 'https://x402-vision-cropper.onrender.com/crop';

// Placeholder: send `amount` USDC to `recipient` on Base (e.g. with viem)
// and return the transaction hash. Replace with your wallet logic.
async function sendUsdc({ recipient, amount }) {
  throw new Error('sendUsdc is not implemented');
}

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com/dashboard');

// Take screenshot
await page.screenshot({ path: 'screen.png' });
const imageB64 = readFileSync('screen.png').toString('base64');

// Get element coordinates (CSS pixels, relative to the viewport)
const rect = await page.$eval('.price-display', (el) => el.getBoundingClientRect().toJSON());
await browser.close();

// The same body is sent on the probe and on the paid retry
const body = JSON.stringify({
  image: imageB64,
  x: Math.floor(rect.x),
  y: Math.floor(rect.y),
  width: Math.floor(rect.width),
  height: Math.floor(rect.height),
});

// Probe the API for payment instructions
const probe = await fetch(CROP_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body,
});
if (probe.status !== 402) throw new Error(`Expected 402, got ${probe.status}`);

// → 402 response with payment details in headers
const recipient = probe.headers.get('x-payment-recipient');
const amount = probe.headers.get('x-payment-price-usdc');

// Pay on Base L2
const txHash = await sendUsdc({ recipient, amount });

// Resubmit with payment proof
const result = await fetch(CROP_URL, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-payment-tx-hash': txHash,
  },
  body,
});
if (!result.ok) throw new Error(`Crop failed with status ${result.status}`);
const { data } = await result.json();

// Pass the tiny crop to your vision LLM instead of the full screenshot
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'image_url', image_url: { url: `data:${data.mime};base64,${data.base64}` } },
        { type: 'text', text: 'What is the price shown?' },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```

The server is intentionally minimal. The entire codebase is about 400 lines across 7 files, with no database and no session state.
No auth layer beyond the payment itself.

The API is live now:

```bash
# Check it's running
curl https://x402-vision-cropper.onrender.com/health

# Trigger the payment challenge
curl -X POST https://x402-vision-cropper.onrender.com/crop \
  -H "Content-Type: application/json" \
  -d '{"image":"'"$(python3 -c "print('A'*200)")"'","x":0,"y":0,"width":10,"height":10}'
```

Machine-readable docs for agents: https://x402-vision-cropper.onrender.com/llms.txt

x402 is genuinely exciting but very early. The protocol works cleanly: payment instructions in the headers, proof in the retry, settlement on-chain. But the agent ecosystem is still catching up, and most frameworks don't have native wallet support yet.

Stateless by design is underrated. No database means no stored data to breach, no GDPR headache, no backup strategy, and no connection pooling. Every request lives and dies in RAM. For a high-throughput API that processes potentially sensitive screenshot data, this is the right architecture.

The unit economics make sense at scale. At $0.0005 per crop, the fee is a rounding error next to what it saves in vision tokens. The challenge isn't pricing; it's volume.

If you're building browser agents or anything that feeds screenshots to vision models, give it a try. And if you're building in the x402 / agentic payments space, I'd love to hear what you're working on.