Image Optimization vs Alt Text: What AI Agents Actually Read on Your Page

AI agents like Claude and ChatGPT read a page's HTML rather than its rendered pixels, so alt text is usually all they get from an image. For agent-driven traffic, that makes text-level image optimization more critical than byte-level optimization. With ~50% of images shipping empty or sub-10-character alt text, sites risk silent retrieval failures on every agent query. Developers should prioritize descriptive alt text and structured metadata while keeping byte optimization for human visitors.

The Decision

Half the web's bytes are images (Source 2), but the agents now hitting your pages (Claude, ChatGPT, agentic shoppers, coding assistants) consume tokens, not pixels (Source 9). The choice between optimizing image bytes and optimizing image text is no longer about accessibility versus performance; it's about who your traffic actually is.

The Table

| Dimension | A: Byte-level optimization (next/image, WebP/AVIF, CDN loaders) | B: Text-level optimization (alt text, captions, structured metadata) |
|---|---|---|
| Latency | Cuts LCP: next/image auto-serves WebP, lazy-loads, sets width/height to prevent CLS | Zero render impact; agents read HTML, not pixels |
| Memory | sharp on glibc Linux can balloon without tuning | alt … |
| … | … next start; cloud loaders (Cloudinary, Imgix, Akamai) for static export (Source 7) (Source 17) | … ai image alt text module (Source 5) |
| … | … dangerouslyAllowSVG is blocked (Source 4); v16 caps qualities to 75 by default (Source 18) | … (Source 10); 8.5% end in .jpg / .png filenames (Source 5) |

I'd pick B as the default in 2026, and bolt A on top. Agents are the fastest-growing consumer of your HTML (Source 11), and they cannot see your AVIF.

The Mechanism

Why A (byte-level) wins when humans on bad networks dominate. The next/image component serves device-correct WebP, prevents layout shift via intrinsic width/height, and lazy-loads off-screen images natively (Source 3). On a flaky link, this matters: Kornel's observation that mobile bandwidth arrives in "laggy bursts rather than slowly" (Source 20) means a 155 kB hero is a real LCP hit. Byte savings compound: Lara Hogan's point that images are "arguably the easiest big win" for page load time (Source 2) still holds, and the v16 default of minimumCacheTTL: 14400 (4 hours, up from 60 s) reflects that revalidation cost was real money (Source 18).

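
A minimal sketch of the byte-level settings this section names, assuming a Next.js 16 project configured through a next.config.ts file. The option names (formats, minimumCacheTTL) are real next/image config keys; the values shown are illustrative, and in v16 the 14400 value simply restates the default so that it is visible in the file.

```typescript
// next.config.ts (sketch, not a recommended production config)
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  images: {
    // Offer AVIF first and fall back to WebP based on the Accept header.
    formats: ["image/avif", "image/webp"],
    // Seconds an optimized image stays cached before revalidation.
    // 14400 (4 hours) is the v16 default cited above; set it explicitly
    // only if you want the choice documented or need a different value.
    minimumCacheTTL: 14400,
  },
};

export default nextConfig;
```

Keeping these values in the config file makes the caching trade-off reviewable alongside the rest of the image pipeline.
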
Why B (text-level) wins when AI agents are reading your site. LLMs are next-token predictors over text (Source 15). Even multimodal models tokenize images through a vision encoder + projector into the same latent space as text (Source 1), and IBM's own teams admit "text-ify everything" loses visual context (Source 12), which is why hybrid multimodal RAG keeps text captions as the retrieval index even when the LLM can see the image (Source 12). Translation: when an agent or RAG pipeline crawls your page, the alt attribute is the image as far as retrieval is concerned. Docling's whole pitch for AI ingestion is converting unstructured assets into "clean, structured text that large language models can actually use" (Source 13) (Source 14). The Web Almanac is blunt that ~50% of images ship with empty or sub-10-character alt text (Source 10). That's a silent retrieval failure on every agent-driven query.

Pick B as the default.

The Migration Path

If you optimized for bytes and now need agents to actually understand your pages:

1. Audit alt coverage. Grep your codebase for <Image and <img and flag any whose alt is empty, missing, or ends in .jpg / .png (the 8.5% filename-as-alt anti-pattern) (Source 5).
2. Replace filename alts with descriptive text. Target 20–30 characters, the band the Almanac flags as balancing brevity and signal (Source 5). For decorative-only images, alt="" is correct; don't pad.
3. Co-locate machine-readable context. Add opengraph-image.tsx per route for agent crawlers that follow OG metadata (Source 16) (Source 19), and emit a figcaption near content images so RAG chunking captures the caption with the surrounding paragraph (Source 13).
4. Keep byte optimization, tighten its config. Stay on next/image with remotePatterns locked down (Source 6).
If you're on Next 16, explicitly set qualities and imageSizes if you need more than the new 75 default or the dropped 16w size (Source 18).
5. For SVG, use it. SVG carries semantic structure agents can parse (Source 10), unlike raster, but if you serve user-uploaded SVG through next/image, you must set dangerouslyAllowSVG with a strict CSP and contentDispositionType: 'attachment' (Source 4).
6. For RAG-targeted content, consider Docling. Convert PDFs/decks to structured Markdown so the text representation of every embedded image survives ingestion (Source 14).

CEMENT Brick

If you ship a page tuned only for byte-level image optimization in 2026, then your fastest-growing class of visitors (AI agents and RAG crawlers) will retrieve a blank where your image was, because every LLM-backed reader still resolves images through their textual representation (alt, caption, surrounding chunk) before any vision encoder is consulted (Source 1) (Source 12), and a missing or filename-shaped alt collapses to zero signal in the embedding space (Source 5).

Sources

1. What Are Vision Language Models? How AI Sees & Understands Images https://www.youtube.com/watch?v=lOD EE96jhM
2. Optimizing Images | Designing for Performance https://designingforperformance.com/optimizing-images/ mentor-other-image-creators
3. Image Optimization https://nextjs.org/docs/app/getting-started/images
4. Image Legacy https://nextjs.org/docs/pages/api-reference/components/image-legacy
5. Engineering Docs
6. Image https://nextjs.org/docs/pages/api-reference/components/image
7. How to create a static export of your Next.js application https://nextjs.org/docs/app/guides/static-exports
8. How to self-host your Next.js application https://nextjs.org/docs/app/guides/self-hosting
9. Engineering Docs
10. Engineering Docs
11. AI agents in 2025: Why agentic commerce isn't ready for Black Friday yet https://www.youtube.com/watch?v=SdNRWJ-oqjY
12. What is Multimodal RAG?
Unlocking LLMs with Vector Databases https://www.youtube.com/watch?v=anLahYrEFiQ
13. Unlock Better RAG & AI Agents with Docling https://www.youtube.com/watch?v=rrQHnibpXX8
14. What Is Docling? Transforming Unstructured Data for RAG and AI https://www.youtube.com/watch?v=zSA7ylHP6AY
15. AI vs Human Thinking: How Large Language Models Really Work https://www.youtube.com/watch?v=-ovM0daP6bw
16. Metadata and OG images https://nextjs.org/docs/app/getting-started/metadata-and-og-images
17. images https://nextjs.org/docs/pages/api-reference/config/next-config-js/images
18. How to upgrade to version 16 https://nextjs.org/docs/app/guides/upgrading/version-16
19. opengraph-image and twitter-image https://nextjs.org/docs/app/api-reference/file-conventions/metadata/opengraph-image
20. The present and potential future of progressive image rendering https://jakearchibald.com/2025/present-and-future-of-progressive-image-rendering/
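
Appendix: the alt-coverage audit from the migration path could be sketched roughly as below. This is an illustration under stated assumptions, not a finished tool: it scans source text with regular expressions rather than parsing HTML or JSX, the names (classifyAlt, auditAlts, AltFinding) are hypothetical, the 10-character threshold mirrors the "sub-10-character" figure quoted in the article, and the list of file extensions beyond .jpg / .png is an assumption.

```typescript
// Sketch only: a regex scan, not an HTML/JSX parser. All names are hypothetical.
type AltIssue = "missing" | "empty" | "filename" | "short" | "dynamic" | "ok";

interface AltFinding {
  tag: string;        // the matched opening tag, verbatim
  alt: string | null; // literal alt value, or null when absent or not a literal
  issue: AltIssue;
}

// Assumption: treat these suffixes as "filename used as alt text".
const FILENAME_SUFFIX = /\.(jpe?g|png|gif|webp|avif|svg)$/i;
// Assumption: mirrors the "sub-10-character" threshold quoted in the article.
const MIN_ALT_LENGTH = 10;

function classifyAlt(alt: string | null): AltIssue {
  if (alt === null) return "missing";
  const text = alt.trim();
  // Empty alt is correct for decorative images, so treat it as "review", not "error".
  if (text === "") return "empty";
  if (FILENAME_SUFFIX.test(text)) return "filename";
  if (text.length < MIN_ALT_LENGTH) return "short";
  return "ok";
}

function auditAlts(source: string): AltFinding[] {
  const findings: AltFinding[] = [];
  // Matches <img ...> and <Image ...> opening tags up to the first ">".
  const tagPattern = /<(?:img|Image)\b[^>]*>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(source)) !== null) {
    const tag = match[0];
    // JSX expressions such as alt={title} cannot be evaluated statically.
    if (/\salt\s*=\s*\{/.test(tag)) {
      findings.push({ tag, alt: null, issue: "dynamic" });
      continue;
    }
    const literal = /\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/.exec(tag);
    const alt = literal ? (literal[1] ?? literal[2] ?? "") : null;
    findings.push({ tag, alt, issue: classifyAlt(alt) });
  }
  return findings;
}
```

In practice one would read each component or template file, pass its contents to auditAlts, and review every finding whose issue is not "ok". Findings marked "empty" need a human decision about whether the image is decorative, and findings marked "dynamic" need a check of the value supplied at runtime.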