{"slug": "i-built-a-0-0005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs", "title": "I built a $0.0005 screenshot cropper that saves AI agents 95% on vision LLM costs", "summary": "A developer built a stateless pay-per-use API that crops browser screenshots to specific elements, reducing vision LLM costs by 95%. The API uses the x402 payment protocol, charging $0.0005 USDC per crop on Base L2, and eliminates the need for API keys or subscriptions.", "body_md": "If you're building AI agents that work with browser screenshots, you already know the pain.\n\nYou take a full 1920×1080 screenshot, pass it to GPT-4o or Claude, and watch your token bill climb — while the model downscales the image anyway and blurs the exact text you needed it to read.\n\nThere's a better way.\n\nVision LLMs are expensive for two reasons when you feed them full screenshots:\n\nBut your agent already knows *where* to look. Browser automation tools like Playwright and Puppeteer give you `getBoundingClientRect()`\n\n— the exact pixel coordinates of any element on screen.\n\nSo why are you sending the whole screenshot?\n\nI built a stateless pay-per-use API that takes a screenshot and pixel coordinates, and returns just the cropped element as a lossless PNG — ready to pass directly to your vision LLM.\n\n```\nPOST /crop\n{\n  \"image\":  \"<base64 screenshot>\",\n  \"x\":      120,\n  \"y\":      45,\n  \"width\":  640,\n  \"height\": 80\n}\n```\n\nReturns:\n\n```\n{\n  \"success\": true,\n  \"data\": {\n    \"base64\": \"iVBORw0KGgo...\",\n    \"mime\":   \"image/png\",\n    \"width\":  640,\n    \"height\": 80,\n    \"bytes\":  4821\n  }\n}\n```\n\nA 4KB crop instead of a 2MB screenshot. Same information. 95% fewer tokens.\n\nHere's where it gets interesting. The API uses the **x402 payment protocol** — HTTP's long-dormant 402 Payment Required status code, finally put to use.\n\nThere are no API keys. No accounts. No subscriptions. 
The agent pays $0.0005 USDC per crop on Base L2 automatically.\n\nThe flow:\n\n```\n1. Agent POSTs to /crop (no payment header)\n   ← 402 with payment instructions in headers\n\n2. Agent transfers 0.0005 USDC to recipient wallet on Base\n   (near-zero gas, ~2 second settlement)\n\n3. Agent POSTs again with x-payment-tx-hash header\n   ← 200 with cropped PNG\n```\n\nThe entire exchange happens inside the HTTP request cycle. No human intervention. No billing dashboard. The money lands directly in the operator's wallet on-chain.\n\nHere's what using it looks like in a Playwright agent:\n\n``` js\nimport { chromium } from 'playwright';\nimport { readFileSync } from 'fs';\n\nconst browser = await chromium.launch();\nconst page    = await browser.newPage();\nawait page.goto('https://example.com/dashboard');\n\n// Take screenshot\nawait page.screenshot({ path: 'screen.png' });\nconst imageB64 = readFileSync('screen.png').toString('base64');\n\n// Get element coordinates\nconst rect = await page.$eval('.price-display', el => el.getBoundingClientRect().toJSON());\n\n// Probe the API for payment instructions\nconst probe = await fetch('https://x402-vision-cropper.onrender.com/crop', {\n  method:  'POST',\n  headers: { 'Content-Type': 'application/json' },\n  body:    JSON.stringify({\n    image:  imageB64,\n    x:      Math.floor(rect.x),\n    y:      Math.floor(rect.y),\n    width:  Math.floor(rect.width),\n    height: Math.floor(rect.height),\n  }),\n});\n\n// → 402 response with payment details in headers\nconst recipient = probe.headers.get('x-payment-recipient');\nconst amount    = probe.headers.get('x-payment-price-usdc');\n\n// Pay on Base L2 using viem\nconst txHash = await sendUsdc({ recipient, amount }); // your wallet logic here\n\n// Resubmit with payment proof\nconst result = await fetch('https://x402-vision-cropper.onrender.com/crop', {\n  method:  'POST',\n  headers: {\n    'Content-Type':       'application/json',\n    'x-payment-tx-hash':  txHash,\n  },\n  
body: JSON.stringify({\n    image:  imageB64,\n    x:      Math.floor(rect.x),\n    y:      Math.floor(rect.y),\n    width:  Math.floor(rect.width),\n    height: Math.floor(rect.height),\n  }),\n});\n\n// Stop here if the payment proof was rejected or the crop failed\nif (!result.ok) throw new Error(`Crop request failed: ${result.status}`);\n\nconst { data } = await result.json();\n\n// Pass the tiny crop to your vision LLM instead of the full screenshot\n// (assumes `openai` is an already-initialized OpenAI SDK client)\nconst response = await openai.chat.completions.create({\n  model: 'gpt-4o',\n  messages: [{\n    role: 'user',\n    content: [\n      { type: 'image_url', image_url: { url: `data:${data.mime};base64,${data.base64}` } },\n      { type: 'text', text: 'What is the price shown?' }\n    ]\n  }]\n});\n```\n\nThe server is intentionally minimal.\n\nThe entire codebase is about 400 lines across 7 files. No database. No session state. No auth layer beyond the payment itself.\n\nThe API is live now:\n\n```\n# Check it's running\ncurl https://x402-vision-cropper.onrender.com/health\n\n# Trigger the payment challenge\ncurl -X POST https://x402-vision-cropper.onrender.com/crop \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"image\":\"'\"$(python3 -c \"print('A'*200)\")\"'\",\"x\":0,\"y\":0,\"width\":10,\"height\":10}'\n```\n\nMachine-readable docs for agents: [https://x402-vision-cropper.onrender.com/llms.txt](https://x402-vision-cropper.onrender.com/llms.txt)\n\n**x402 is genuinely exciting but very early.** The protocol works cleanly: payment instructions in headers, proof in the retry, settlement on-chain. But the agent ecosystem is still catching up. Most frameworks don't have native wallet support yet.\n\n**Stateless by design is underrated.** No database means no stored data to breach, no GDPR headache, no backup strategy, no connection pooling. Every request lives and dies in RAM. For a high-throughput API that processes sensitive screenshot data, this is the right architecture.\n\n**The unit economics make sense at scale.** At $0.0005 per crop, the service costs less than a rounding error compared to what it saves on vision tokens. 
The challenge isn't pricing; it's volume.\n\nIf you're building browser agents or anything that feeds screenshots to vision models, give it a try. And if you're building in the x402 / agentic payments space, I'd love to hear what you're working on.", "url": "https://wpnews.pro/news/i-built-a-0-0005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs", "canonical_source": "https://dev.to/aaroncarlisle94/i-built-a-00005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs-2c41", "published_at": "2026-06-24 21:20:23+00:00", "updated_at": "2026-06-24 21:43:02.576041+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools"], "entities": ["Playwright", "Puppeteer", "GPT-4o", "Claude", "Base L2", "USDC", "OpenAI", "x402"], "alternates": {"html": "https://wpnews.pro/news/i-built-a-0-0005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs", "markdown": "https://wpnews.pro/news/i-built-a-0-0005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs.md", "text": "https://wpnews.pro/news/i-built-a-0-0005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs.txt", "jsonld": "https://wpnews.pro/news/i-built-a-0-0005-screenshot-cropper-that-saves-ai-agents-95-on-vision-llm-costs.jsonld"}}