# Image Optimization vs Alt Text: What AI Agents Actually Read on Your Page

> Source: <https://blog.r-lopes.com/posts/2026-06-06-image-optimization-vs-alt-text-what-ai-agents-actually-read>
> Published: 2026-06-06 14:00:00+00:00

## The Decision

Half the web's bytes are images [Source 2](#source-2), but the agents now hitting your pages — Claude, ChatGPT, agentic shoppers, coding assistants — consume tokens, not pixels [Source 9](#source-9). The choice between optimizing image *bytes* and optimizing image *text* is no longer about accessibility versus performance; it's about who your traffic actually is.

## The Table

| Dimension | A: Byte-level optimization (`next/image`, WebP/AVIF, CDN loaders) | B: Text-level optimization (alt text, captions, structured metadata) |
|---|---|---|
| Latency | Cuts LCP: `next/image` auto-serves WebP, lazy-loads, sets width/height to prevent CLS | Zero render impact; agents read HTML, not pixels |
| Memory | sharp on glibc Linux can balloon without tuning | … `alt` |
| … | … `next start`; cloud loaders (Cloudinary, Imgix, Akamai) for static export [Source 7](#source-7)[Source 17](#source-17) | (… `ai_image_alt_text` module) [Source 5](#source-5) |
| … | SVG without `dangerouslyAllowSVG` is blocked [Source 4](#source-4); v16 caps `qualities` to `[75]` by default [Source 18](#source-18) | ~50% of images ship with empty or sub-10-character alt text [Source 10](#source-10); 8.5% end in `.jpg`/`.png` filenames [Source 5](#source-5) |

I'd pick **B** as the default in 2026, and bolt A on top. Agents are the fastest-growing consumer of your HTML [Source 11](#source-11), and they cannot see your AVIF.

## The Mechanism

**Why A (byte-level) wins when humans on bad networks dominate.** The `next/image` component serves device-correct WebP, prevents layout shift via intrinsic width/height, and lazy-loads off-screen images natively [Source 3](#source-3). On a flaky link, this matters: Kornel's observation that mobile bandwidth arrives in "laggy bursts rather than slowly" [Source 20](#source-20) means a 155 kB hero is a real LCP hit. Byte savings compound: Lara Hogan's point that images are "arguably the easiest big win" for page load time [Source 2](#source-2) still holds, and the v16 default of `minimumCacheTTL: 14400` (4 hours, up from 60 s) reflects that revalidation cost was real money [Source 18](#source-18).
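
To put a rough number on that 155 kB figure, here is a back-of-envelope sketch. The link parameters (300 ms round trip, 1.6 Mbit/s), the two-round-trip setup cost, and the 60 kB "after" size are assumptions picked for illustration, not measurements from any of the sources, and `estimateFetchMs` is a hypothetical helper.

```typescript
// Back-of-envelope model: fetch time = round trips + serialization time.
// Every link number here is an illustrative assumption, not a measurement.

interface Link {
  rttMs: number;          // round-trip time in milliseconds (assumed)
  throughputKbps: number; // sustained throughput in kilobits per second (assumed)
}

// Estimate milliseconds to fetch `bytes` over `link`, charging `roundTrips`
// round trips for connection setup and the request itself.
function estimateFetchMs(bytes: number, link: Link, roundTrips = 2): number {
  const transferMs = (bytes * 8) / link.throughputKbps; // bits / (kbit/s) = ms
  return roundTrips * link.rttMs + transferMs;
}

// Hypothetical flaky mobile link.
const flakyLink: Link = { rttMs: 300, throughputKbps: 1600 };

const heroMs = estimateFetchMs(155_000, flakyLink);   // 155 kB hero (1 kB = 1000 bytes)
const trimmedMs = estimateFetchMs(60_000, flakyLink); // hypothetical recompressed hero

console.log({ heroMs, trimmedMs });
```

Under those assumptions the 155 kB hero costs roughly 1.4 s and the trimmed version about 0.9 s. The round-trip term does not shrink with the file, which is one way to read the "laggy bursts" point.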

**Why B (text-level) wins when AI agents are reading your site.** LLMs are next-token predictors over text [Source 15](#source-15). Even multimodal models tokenize images through a vision encoder + projector into the same latent space as text [Source 1](#source-1). IBM's own teams admit "text-ify everything" loses visual context, which is why hybrid multimodal RAG keeps text captions as the retrieval index even when the LLM can see the image [Source 12](#source-12). Translation: when an agent or RAG pipeline crawls your page, the `alt` attribute *is* the image as far as retrieval is concerned. Docling's whole pitch for AI ingestion is converting unstructured assets into "clean, structured text that large language models can actually use" [Source 13](#source-13)[Source 14](#source-14). The Web Almanac is blunt that ~50% of images ship with empty or sub-10-character alt text [Source 10](#source-10). That's a silent retrieval failure on every agent-driven query. Pick B as the default.
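
The failure modes named here (missing, empty, under ten characters) plus the filename-as-alt pattern can be sketched as a small classifier. `classifyAlt` and its verdict names are hypothetical, and treating `.jpeg` the same as `.jpg` is my assumption; the ten-character threshold is the Almanac cutoff quoted above.

```typescript
type AltVerdict = "missing" | "empty" | "filename" | "too-short" | "ok";

// Matches alt values that end in a raster file extension, e.g. "IMG_0042.jpg".
const FILENAME_ALT = /\.(jpe?g|png)$/i;

// Classify one alt value. `undefined` means the attribute is absent entirely.
function classifyAlt(alt: string | undefined, minLength = 10): AltVerdict {
  if (alt === undefined) return "missing";
  const text = alt.trim();
  if (text.length === 0) return "empty"; // fine for decorative images, a gap otherwise
  if (FILENAME_ALT.test(text)) return "filename";
  if (text.length < minLength) return "too-short";
  return "ok";
}

console.log(classifyAlt("hero-final-v2.PNG"));            // "filename"
console.log(classifyAlt("Bar chart of monthly signups")); // "ok"
```

An `"empty"` verdict still needs a human call, since `alt=""` is the right answer for purely decorative images.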

## The Migration Path

If you optimized for bytes and now need agents to actually understand your pages:

1. **Audit alt coverage.** Grep your codebase for `<Image` and `<img` and flag any whose `alt` is empty, missing, or ends in `.jpg`/`.png`, the 8.5% filename-as-alt anti-pattern [Source 5](#source-5).
2. **Replace filename alts with descriptive text.** Target 20–30 characters, the band the Almanac flags as balancing brevity and signal [Source 5](#source-5). For decorative-only images, `alt=""` is correct; don't pad.
3. **Co-locate machine-readable context.** Add `opengraph-image.tsx` per route for agent crawlers that follow OG metadata [Source 16](#source-16)[Source 19](#source-19), and emit a `figcaption` near content images so RAG chunking captures the caption with the surrounding paragraph [Source 13](#source-13).
4. **Keep byte optimization, tighten its config.** Stay on `next/image` with `remotePatterns` locked down [Source 6](#source-6). If you're on Next 16, explicitly set `qualities` and `imageSizes` if you need more than the new `[75]` default or the dropped `16w` size [Source 18](#source-18).
5. **For SVG, use it.** SVG carries semantic structure agents can parse [Source 10](#source-10), unlike raster. But if you serve user-uploaded SVG through `next/image`, you must set `dangerouslyAllowSVG` with a strict CSP and `contentDispositionType: 'attachment'` [Source 4](#source-4).
6. **For RAG-targeted content, consider Docling.** Convert PDFs/decks to structured Markdown so the *text representation* of every embedded image survives ingestion [Source 14](#source-14).
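
The config-tightening and SVG steps above could look roughly like this in `next.config.ts`. Treat it as a sketch: the hostname, path pattern, and quality values are placeholders, and the object is left untyped so the snippet stands alone (in a real project you would annotate it with the `NextConfig` type from `next`).

```typescript
// next.config.ts (sketch; hostname, path, and quality values are placeholders)
const nextConfig = {
  images: {
    // Only optimize remote images from hosts you control.
    remotePatterns: [
      { protocol: "https", hostname: "assets.example.com", pathname: "/images/**" },
    ],
    // Next 16 defaults to [75]; list every value your quality props actually use.
    qualities: [50, 75, 90],
    // Re-add the 16px width that the Next 16 default list dropped, if you rely on it.
    imageSizes: [16, 32, 48, 64, 96, 128, 256, 384],
    // Opt in to SVG only if you must serve it through next/image, and pair it
    // with a restrictive CSP and a forced-download disposition.
    dangerouslyAllowSVG: true,
    contentDispositionType: "attachment",
    contentSecurityPolicy: "default-src 'self'; script-src 'none'; sandbox;",
  },
};

export default nextConfig;
```

If you never route SVG through `next/image`, leave out the last three keys and keep the default behavior.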

## CEMENT Brick

If you ship a page tuned only for byte-level image optimization in 2026, then your fastest-growing class of visitors, AI agents and RAG crawlers, will retrieve a blank where your image was, because every LLM-backed reader still resolves images through their textual representation (alt, caption, surrounding chunk) before any vision encoder is consulted [Source 1](#source-1)[Source 12](#source-12), and a missing or filename-shaped alt collapses to zero signal in the embedding space [Source 5](#source-5).
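
As a toy model of that claim, here is what a text-first ingestion step might store for one image. This is a hypothetical sketch, not how any particular crawler or RAG framework works: `ImageContext`, `imageRetrievalText`, and the `" | "` separator are all invented for illustration.

```typescript
interface ImageContext {
  alt?: string;         // value of the alt attribute, if present
  caption?: string;     // text of an adjacent <figcaption>, if any
  surrounding?: string; // text of the paragraph the image sits in
}

const LOOKS_LIKE_FILENAME = /\.(jpe?g|png)$/i;

// Build the text a retrieval index would store for one image.
// Returns "" when nothing descriptive is available.
function imageRetrievalText(ctx: ImageContext): string {
  const alt = (ctx.alt ?? "").trim();
  // A filename-shaped alt carries no description, so it is dropped.
  const usableAlt = LOOKS_LIKE_FILENAME.test(alt) ? "" : alt;
  return [usableAlt, ctx.caption?.trim() ?? "", ctx.surrounding?.trim() ?? ""]
    .filter((part) => part.length > 0)
    .join(" | ");
}

console.log(imageRetrievalText({ alt: "IMG_1.jpg" })); // ""
console.log(imageRetrievalText({ alt: "Bar chart of Q3 revenue", caption: "Figure 2" }));
// "Bar chart of Q3 revenue | Figure 2"
```

The first call is the blank in question: the image exists on the page, but the index has nothing to match against.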

## Sources

1. [What Are Vision Language Models? How AI Sees & Understands Images](https://www.youtube.com/watch?v=lOD_EE96jhM)
2. [Optimizing Images | Designing for Performance](https://designingforperformance.com/optimizing-images/#mentor-other-image-creators)
3. [Image Optimization](https://nextjs.org/docs/app/getting-started/images)
4. [Image Legacy](https://nextjs.org/docs/pages/api-reference/components/image-legacy)
5. Engineering Docs
6. [Image](https://nextjs.org/docs/pages/api-reference/components/image)
7. [How to create a static export of your Next.js application](https://nextjs.org/docs/app/guides/static-exports)
8. [How to self-host your Next.js application](https://nextjs.org/docs/app/guides/self-hosting)
9. Engineering Docs
10. Engineering Docs
11. [AI agents in 2025: Why agentic commerce isn't ready for Black Friday yet](https://www.youtube.com/watch?v=SdNRWJ-oqjY)
12. [What is Multimodal RAG? Unlocking LLMs with Vector Databases](https://www.youtube.com/watch?v=anLahYrEFiQ)
13. [Unlock Better RAG & AI Agents with Docling](https://www.youtube.com/watch?v=rrQHnibpXX8)
14. [What Is Docling? Transforming Unstructured Data for RAG and AI](https://www.youtube.com/watch?v=zSA7ylHP6AY)
15. [AI vs Human Thinking: How Large Language Models Really Work](https://www.youtube.com/watch?v=-ovM0daP6bw)
16. [Metadata and OG images](https://nextjs.org/docs/app/getting-started/metadata-and-og-images)
17. [images](https://nextjs.org/docs/pages/api-reference/config/next-config-js/images)
18. [How to upgrade to version 16](https://nextjs.org/docs/app/guides/upgrading/version-16)
19. [opengraph-image and twitter-image](https://nextjs.org/docs/app/api-reference/file-conventions/metadata/opengraph-image)
20. [The present and potential future of progressive image rendering](https://jakearchibald.com/2025/present-and-future-of-progressive-image-rendering/)