[ARTICLE · art-14131] src=dev.to pub= topic=large-language-models verified=true sentiment=↑ positive

How to make your AI Agent 111x cheaper and 2.5x faster at data aggregation

A developer benchmarked Google's Gemini 3.5 Flash model on a data aggregation task involving 208 user objects and 72,286 tokens of JSON, finding the bare model delivered mathematically precise results but consumed 72,361 tokens and took 38.89 seconds. Running the same task through the Neonia MCP platform, which used a Wasm-based JQ filter tool to delegate data processing instead of loading raw data into the model's context window, reduced token consumption to 650 tokens (111x cheaper) and execution time to 15.54 seconds (2.5x faster).

3 min read · Published May 26, 2026

Google recently released a very fast new model, Gemini 3.5 Flash. As someone building infrastructure for autonomous agents, I decided to stress-test it on a real-world data aggregation task to see how it handles massive context loads.

The Benchmark Task

The challenge was simple, but computationally heavy. I fed the model a massive JSON array containing 208 user objects and gave it the following prompt:

"Extract the users, find those who are over 30 years old and have green eyes, and calculate the exact mathematical average of their weight."

First, I ran the bare model, forcing it to read the entire JSON payload and do the maths internally.

I have to give credit where it is due: Gemini 3.5 Flash parsed 72,286 tokens of raw JSON without a single arithmetic mistake, which is an impressive achievement. It returned a mathematically precise answer, 78.44684210526316, identical to the result of the tool-based run described next. No hallucinations.

But the cost of this achievement was astronomical:

Execution time: 38.89s
Total consumption: 72,361 tokens for a single request!

Next, I ran the exact same task through the Neonia MCP (Model Context Protocol) platform.

Instead of clogging the agent's context window with raw data, the Neonia infrastructure delegated the heavy lifting to a Wasm-based JQ filter tool. The model never read the data; it only orchestrated the query.
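
The underlying pattern is pass-by-reference: the raw data stays on the tool side, and only a short handle and the final result pass through the model. Below is a toy, in-process sketch of that idea. `ResourceStore`, the `mem://` URI scheme, and both method names are hypothetical and are not Neonia's API.

```rust
use std::collections::HashMap;

/// Toy tool-side storage: payloads live here, never in the model's context.
struct ResourceStore {
    resources: HashMap<String, Vec<f64>>,
}

impl ResourceStore {
    fn new() -> Self {
        ResourceStore { resources: HashMap::new() }
    }

    /// "Fetch" step: store the payload and return only a short handle.
    fn store(&mut self, payload: Vec<f64>) -> String {
        let uri = format!("mem://resource/{}", self.resources.len());
        self.resources.insert(uri.clone(), payload);
        uri
    }

    /// "Filter" step: compute next to the data and return just the result.
    /// Returns None for an unknown handle or an empty payload.
    fn average(&self, uri: &str) -> Option<f64> {
        let data = self.resources.get(uri)?;
        if data.is_empty() {
            return None;
        }
        Some(data.iter().sum::<f64>() / data.len() as f64)
    }
}

fn main() {
    let mut store = ResourceStore::new();
    // Stand-in for a large payload: 100,000 numbers.
    let uri = store.store(vec![70.0; 100_000]);
    // Only the handle and the result would ever need to reach the model.
    println!("handle: {}, result: {:?}", uri, store.average(&uri));
}
```

However large the stored payload is, the caller only ever sees a short string and a single number, which is why the token count stops tracking the data size.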

The results for the exact same request:

Execution time: 15.54s (2.5x faster ⚡)
Total consumption: 650 tokens (111x cheaper)
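
Both headline ratios follow directly from the four measurements above. Here is a quick check that uses only the figures reported in this post; the helper name `ratio` is purely illustrative.

```rust
/// Ratio of a baseline measurement to an optimised one.
fn ratio(baseline: f64, optimised: f64) -> f64 {
    baseline / optimised
}

fn main() {
    // Figures reported in the benchmark above.
    let token_ratio = ratio(72_361.0, 650.0);
    let time_ratio = ratio(38.89, 15.54);

    println!("tokens: {:.1}x fewer", token_ratio); // prints "tokens: 111.3x fewer"
    println!("time:   {:.2}x faster", time_ratio); // prints "time:   2.50x faster"
}
```

Note that "111x cheaper" is a ratio of token counts. It matches the ratio of billed cost only if input and output tokens are priced the same, which this post does not cover.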

The code that connects the agent to the MCP server (Rust):

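    // Import paths are assumptions based on recent rmcp and rig-core releases;
    // adjust them to the crate versions you use.
    use std::env;
    use rig::client::{CompletionClient, ProviderClient};
    use rig::completion::Prompt;
    use rig::providers::gemini;
    use rmcp::model::{ClientCapabilities, ClientInfo, Implementation};
    use rmcp::ServiceExt;

    // Everything below runs inside an async function that returns a Result.
    let url = "https://mcp.neonia.io/mcp?tools=neonia_web_json_fetch,neonia_data_jq_filter";
    // Fail fast if the key is missing instead of sending an empty bearer token.
    let neonia_key = env::var("NEONIA_API_KEY").expect("NEONIA_API_KEY must be set");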

    // Configure the rmcp transport
    let mut config = rmcp::transport::streamable_http_client::StreamableHttpClientTransportConfig::with_uri(url);
    config.auth_header = Some(format!("Bearer {}", neonia_key));

    let transport = rmcp::transport::StreamableHttpClientTransport::from_config(config);

    let capabilities = ClientCapabilities::default();
    let impl_info = Implementation::new("rig-agent", "0.1.0");
    let client_info = ClientInfo::new(capabilities, impl_info);

    let client = client_info.serve(transport).await?;

    // Fetch Tools
    let tools = client.list_tools(Default::default()).await?.tools;

    // Initialize Rig's Native Gemini Client
    let gemini_client = gemini::Client::from_env().expect("Failed to initialize Gemini client");

    let agent = gemini_client.agent("gemini-3.5-flash")
        .preamble(
            "You are an autonomous agent equipped with Neonia's data processing tools.\n\
            ## Usage Guidelines\n\
            1. When asked to process JSON from a URL, ALWAYS use `neonia_web_json_fetch` first. It will securely store the file and return a `resource_uri` and a TypeScript schema of the data.\n\
            2. Once you have the `resource_uri` and the schema, use `neonia_data_jq_filter` to extract exactly what you need. Pass the `resource_uri` to the `resource_uri` parameter, and formulate a mathematically precise `jq_query` based on the schema.\n\
            3. Write correct JQ queries. Example JQ for math average: `.users | map(select(.age > 30 and .eyeColor == \"Green\")) | map(.weight) | if length > 0 then add / length else 0 end`.\n\
            4. Self-Correction: If a tool returns an error, evaluate your parameters, fix them, and try again."
        )
        .default_max_turns(5)
        .rmcp_tools(tools, client.peer().to_owned())
        .build();

    let target_url = "https://dummyjson.com/users?limit=0";
    let prompt_text = format!("Fetch the JSON from {}, then find all users who are strictly older than 30 ('age' > 30) AND whose 'eyeColor' is exactly 'Green'. Calculate the EXACT mathematical average of their 'weight'. Return ONLY the final average number, without any explanations.", target_url);

    let response = agent.prompt(&prompt_text).await?;
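
For readers less familiar with JQ, the example filter in the preamble above (`map(select(...)) | map(.weight) | add / length`) can be read as the plain Rust below. This is a minimal sketch for illustration only: the `User` struct and the sample records are made up, and it is not the code the Wasm tool runs.

```rust
/// Minimal stand-in for one user record; only the fields the query touches.
struct User {
    age: u32,
    eye_color: String,
    weight: f64,
}

/// Mirrors the JQ filter: keep users with age > 30 and green eyes,
/// then average their weight, returning 0.0 when nothing matches.
fn average_weight(users: &[User]) -> f64 {
    let weights: Vec<f64> = users
        .iter()
        .filter(|u| u.age > 30 && u.eye_color == "Green")
        .map(|u| u.weight)
        .collect();
    if weights.is_empty() {
        0.0
    } else {
        weights.iter().sum::<f64>() / weights.len() as f64
    }
}

fn main() {
    // Made-up sample data, not the dummyjson.com payload.
    let users = vec![
        User { age: 35, eye_color: "Green".to_string(), weight: 80.0 },
        User { age: 42, eye_color: "Green".to_string(), weight: 70.0 },
        User { age: 28, eye_color: "Green".to_string(), weight: 90.0 },
        User { age: 50, eye_color: "Brown".to_string(), weight: 60.0 },
    ];
    println!("{}", average_weight(&users)); // prints "75"
}
```

The `if length > 0 ... else 0 end` guard in the JQ version corresponds to the `is_empty` branch here: both avoid dividing by zero when no user matches.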

The Takeaway for AI Engineering

Frontier models are becoming phenomenally fast and smart with every new release. But the future of the industry is not about endlessly inflating the context window just because we can. Context bloat is a latency tax.

The Neonia MCP architecture shows that breaking the linear dependency between data volume and the cost of an LLM call is what lets agents scale. By using a cohesive platform as a specialised backend for your autonomous agents, you can process megabytes of JSON for roughly the same small token budget, under 1,000 tokens per request (650 in this benchmark). This is true enterprise-grade optimisation.
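
To make the "linear dependency" concrete: in the bare run, 72,361 total tokens minus 72,286 tokens of JSON leaves about 75 tokens for the prompt and the answer, so the cost is essentially the payload size. The sketch below contrasts that with a delegated run. It assumes delegation costs a flat 650 tokens whatever the payload size, which is a simplification because the benchmark measured only one payload; the constants and function names are illustrative.

```rust
/// Tokens spent outside the JSON payload in the bare run (72,361 - 72,286).
const BARE_OVERHEAD_TOKENS: u64 = 75;
/// Total tokens measured for the delegated run; assumed flat for any payload size.
const DELEGATED_TOKENS: u64 = 650;

/// Bare model: the whole payload goes through the context window.
fn bare_cost(payload_tokens: u64) -> u64 {
    payload_tokens + BARE_OVERHEAD_TOKENS
}

/// Delegated run: the payload never enters the context, so the cost ignores its size.
fn delegated_cost(_payload_tokens: u64) -> u64 {
    DELEGATED_TOKENS
}

fn main() {
    for payload in [1_000u64, 72_286, 1_000_000] {
        println!(
            "payload {:>9} tokens: bare {:>9}, delegated {:>4}",
            payload,
            bare_cost(payload),
            delegated_cost(payload)
        );
    }
}
```

In practice the delegated cost is more likely to depend on the size of the returned schema and the number of tool turns than on the number of records, so the flat line is an approximation.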

I’m curious to hear from other engineers building agentic workflows: What are your thoughts on smart delegation of compute versus relying entirely on native LLM context windows? Let’s discuss in the comments.
