Your MCP server can't take a file as an argument — here's why, and the fix

A developer building an MCP server for publishing HTML files discovered that large files cannot be passed as tool arguments because the model must emit the entire argument token by token, hitting output token limits. A 1.4 MB dashboard failed where a 5 KB test succeeded, revealing that tool arguments are bounded by the model's maximum output tokens. The fix replaces file contents with a file path reference, reducing the tool call to roughly 50 tokens regardless of file size.

I built an MCP server that publishes HTML files, and I hit a wall I haven't seen documented anywhere: you can't pass a large file as an MCP tool argument. Not "it's slow" or "it's awkward": the model is physically unable to do it. Here's the failure, why it happens, and the one-line design change that fixes it.

My agents (Claude Code, mostly) generate a lot of interactive HTML: dashboards with Chart.js, data-heavy reports, PRDs. I wanted them to publish those files to the web with one tool call, so I did the obvious thing first:

```typescript
import { z } from "zod";

// `server` is the MCP server instance; `upload` is the helper that pushes the HTML and returns its URL.
server.tool(
  "publish_html",
  { html: z.string(), title: z.string() },
  async ({ html, title }) => {
    const url = await upload(html, title);
    return { content: [{ type: "text", text: url }] };
  }
);
```

The agent generates the HTML, passes it as the `html` argument, and the server uploads it. It demoed beautifully on a 5 KB "hello world" page. Then I tried it on a real artifact, a 1.4 MB dashboard with inlined data, and it fell apart.

When a model calls a tool, the arguments aren't a file handle or a pointer. They are text the model emits, token by token, inside its response. A tool call's arguments are part of the model's output, which means they're bounded by the model's maximum output tokens.

Do the math: 1 MB of HTML is roughly 250k–350k tokens. Typical max output is far below that. The model literally cannot finish "saying" the argument. In practice, the call fails in one way or another.

And even when a file fits, you're paying output-token prices (the expensive ones) to make the model retype a file that already exists on disk, byte for byte, with a nonzero chance of it "fixing" something along the way.

This isn't an MCP bug. It's the nature of tool calling: arguments are model output. Any MCP tool designed to receive bulk content as an argument has a built-in size ceiling.

The server runs locally over stdio. It has the same filesystem the agent is working in.
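As a rough sanity check on the token arithmetic above, here is a minimal TypeScript sketch that estimates what a file would cost as a tool argument and compares it with an output budget. The 3 to 4 bytes-per-token range is an assumption, picked to roughly match the 250k–350k figure for 1 MB. The names `estimateTokens` and `fitsInOutputBudget` and the 32,000-token budget are hypothetical illustrations, not values from any SDK or model spec.

```typescript
// Rough heuristic only: real tokenizers vary by model and by content.
const BYTES_PER_TOKEN_LOW = 3; // assumption: dense markup, more tokens per byte
const BYTES_PER_TOKEN_HIGH = 4; // assumption: prose-like text, fewer tokens per byte

interface TokenEstimate {
  minTokens: number;
  maxTokens: number;
}

// Estimate a token range for a payload of the given size in bytes.
function estimateTokens(sizeBytes: number): TokenEstimate {
  return {
    minTokens: Math.ceil(sizeBytes / BYTES_PER_TOKEN_HIGH),
    maxTokens: Math.ceil(sizeBytes / BYTES_PER_TOKEN_LOW),
  };
}

// Conservative check: the pessimistic estimate must fit in the budget.
function fitsInOutputBudget(sizeBytes: number, maxOutputTokens: number): boolean {
  return estimateTokens(sizeBytes).maxTokens <= maxOutputTokens;
}

// Hypothetical output budget, for illustration only.
const ASSUMED_MAX_OUTPUT_TOKENS = 32_000;

console.log(estimateTokens(1_000_000)); // 1 MB payload
console.log(fitsInOutputBudget(5_000, ASSUMED_MAX_OUTPUT_TOKENS)); // 5 KB demo page
console.log(fitsInOutputBudget(1_400_000, ASSUMED_MAX_OUTPUT_TOKENS)); // 1.4 MB dashboard
```

Under those assumptions the 5 KB demo page fits comfortably, while the 1.4 MB dashboard overshoots the budget by more than an order of magnitude, which is consistent with the failure described above.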
Since the server can read the same filesystem as the agent, the tool takes a path instead of the contents:

```typescript
import fs from "node:fs/promises";
import { z } from "zod";

server.tool(
  "publish_file",
  {
    path: z.string().describe("Absolute path to the HTML file to publish"),
    title: z.string(),
  },
  async ({ path, title }) => {
    const html = await fs.readFile(path, "utf8"); // server reads from disk
    const url = await uploadMultipart(html, title); // server does the upload
    return { content: [{ type: "text", text: url }] };
  }
);
```

Now the model's tool call is ~50 tokens regardless of file size:

```json
{ "path": "/Users/me/reports/q2-dashboard.html", "title": "Q2 Dashboard" }
```

The agent never carries the bytes. It writes the file with its normal file tools (which stream and don't have this constraint the same way), then hands the reference to the MCP server, which reads from disk and does a multipart upload itself. 1.4 MB or 14 MB: the model's job is the same size.

This one design decision is the difference between a demo that works on toy files and a tool that's useful on real artifacts.

If you're building an MCP server, audit every tool argument and ask: could this be big? If yes, take a reference instead:

| Instead of accepting… | Accept… |
|---|---|
| file contents | a file path |
| a big dataset | a path, URL, or query the server executes |
| an image/binary blob | a path or URL |
| "the whole document" to edit | a path + edit instructions |

The corollary for outputs is the same: a tool that returns a huge payload fills the model's context window. Return a reference (a URL, a path, a summary + handle) and let the model fetch slices if it needs them.

Local stdio servers are perfect for this pattern because they share a filesystem with the agent. For remote MCP servers you'd reach for the same idea with URLs or pre-signed uploads: anything but the bytes-in-arguments trap.

The server is `stelaspace-mcp` on npm (MIT). It publishes HTML files to [StelaSpace](https://stelaspace.com), which gives each artifact a permanent, sandboxed, access-controlled link (I built it; a free tier exists).
One-line setup with Claude Code:

```shell
claude mcp add stelaspace --scope user \
  --env STELASPACE_API_KEY=ss_sk_... \
  -- npx -y stelaspace-mcp
```

Here's [a live dashboard published this way](https://stelaspace.com/t/demo-team/s/examples/d/sprint-dashboard): 1 MB+ of interactive Chart.js HTML that no model could ever have passed as a tool argument.

If you've hit other walls building MCP servers, I'd genuinely like to hear them. I'm collecting these gotchas.