# How to make your AI Agent 111x cheaper and 2.5x faster at data aggregation

> Source: <https://dev.to/olex_green/how-to-make-your-ai-agent-111x-cheaper-and-25x-faster-at-data-aggregation-ii7>
> Published: 2026-05-26 06:08:08+00:00

Google recently released Gemini 3.5 Flash, an incredibly fast new model. As someone building infrastructure for autonomous agents, I decided to put it through a rigorous stress test on a real-world data aggregation task to see how it handles massive context loads.

## The Benchmark Task

The challenge was simple to state but token-heavy. I fed the model a large JSON array containing 208 user objects and gave it the following prompt:

```
"Extract the users, find those who are over 30 years old and have green eyes, and calculate the exact mathematical average of their weight."
```

First, I ran the bare model, forcing it to read the entire JSON payload and do the maths internally.

I have to give credit where it is due: Gemini 3.5 Flash parsed 72,286 tokens of raw JSON without a single arithmetic mistake, which is an impressive achievement. It returned the exact expected answer, 78.44684210526316, with no hallucinations.
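For reference, the aggregation the prompt describes is only a few lines of deterministic code. The following is a minimal sketch in plain Rust; the `User` struct and the sample records are hypothetical and are not the benchmark data, and the field names simply mirror the ones used in the jq example later in the post.

```rust
/// Minimal user record (hypothetical; only the fields the task needs).
struct User {
    age: u32,
    eye_color: &'static str,
    weight: f64,
}

/// Average weight of users strictly older than `min_age` with the given eye colour.
/// Returns `None` when nobody matches, so the caller never divides by zero.
fn average_weight(users: &[User], min_age: u32, eye_color: &str) -> Option<f64> {
    let weights: Vec<f64> = users
        .iter()
        .filter(|u| u.age > min_age && u.eye_color == eye_color)
        .map(|u| u.weight)
        .collect();
    if weights.is_empty() {
        None
    } else {
        Some(weights.iter().sum::<f64>() / weights.len() as f64)
    }
}

/// Made-up sample data for illustration only.
fn sample_users() -> Vec<User> {
    vec![
        User { age: 35, eye_color: "Green", weight: 70.0 },
        User { age: 42, eye_color: "Green", weight: 80.0 },
        // Excluded: age must be strictly greater than 30.
        User { age: 30, eye_color: "Green", weight: 100.0 },
        // Excluded: wrong eye colour.
        User { age: 50, eye_color: "Blue", weight: 90.0 },
    ]
}

fn main() {
    match average_weight(&sample_users(), 30, "Green") {
        Some(avg) => println!("average weight: {}", avg),
        None => println!("no matching users"),
    }
}
```

The point of the comparison is that the bare model has to reproduce this filter-and-average logic token by token over the whole payload, whereas a tool can run it directly.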

But the cost of this achievement was astronomical:

```
Execution time: 38.89s
Total consumption: 72,361 tokens for a single request!
```

Next, I ran the exact same task through the Neonia MCP (Model Context Protocol) platform.

Instead of clogging the agent's context window with raw data, the agent delegated the heavy lifting to Neonia's tools: one fetches and stores the JSON and returns a resource URI plus a schema, and the other runs a Wasm-based jq filter against the stored data. The model never had to read the data; it only orchestrated the query.
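The delegation pattern can be sketched in a few lines. This is an illustrative toy in plain Rust and not Neonia's implementation: `ResourceStore`, `FetchReceipt`, and the `resource://` URI scheme are hypothetical names, and a Rust closure stands in for the jq query.

```rust
use std::collections::HashMap;

/// What the model receives after a fetch: a handle and a schema, not the payload.
struct FetchReceipt {
    resource_uri: String,
    schema: String,
}

/// Hypothetical tool-side store; raw payloads stay here, outside the context window.
#[derive(Default)]
struct ResourceStore {
    payloads: HashMap<String, Vec<f64>>,
}

impl ResourceStore {
    /// Stores the payload and returns only a small receipt.
    fn fetch(&mut self, name: &str, payload: Vec<f64>) -> FetchReceipt {
        let resource_uri = format!("resource://{}", name);
        self.payloads.insert(resource_uri.clone(), payload);
        FetchReceipt {
            resource_uri,
            schema: "number[]".to_string(),
        }
    }

    /// Runs `query` next to the data and returns only its small result.
    /// The closure stands in for a jq expression.
    fn filter<F>(&self, resource_uri: &str, query: F) -> Option<f64>
    where
        F: Fn(&[f64]) -> f64,
    {
        self.payloads
            .get(resource_uri)
            .map(|data| query(data.as_slice()))
    }
}

/// Stand-in for jq's `add / length`; returns 0.0 for an empty slice.
fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        0.0
    } else {
        values.iter().sum::<f64>() / values.len() as f64
    }
}

fn main() {
    let mut store = ResourceStore::default();
    let receipt = store.fetch("weights", vec![70.0, 80.0, 90.0]);
    // Only these short strings and the final number would reach the model.
    println!("uri = {}, schema = {}", receipt.resource_uri, receipt.schema);
    println!("result = {:?}", store.filter(&receipt.resource_uri, mean));
}
```

However large the stored payload gets, the receipt and the result stay the same size, which is what keeps the token count flat.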

The results for the exact same request:

```
Execution time: 15.54s (2.5x faster ⚡)
Total consumption: 650 tokens (111x cheaper)
```
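Both multipliers follow directly from the two measurements. A quick check in Rust; the one assumption, mine rather than a stated pricing model, is that cost scales with total tokens, which is how "fewer tokens" becomes "cheaper".

```rust
/// How many times larger the baseline is than the optimised value.
fn ratio(baseline: f64, optimised: f64) -> f64 {
    baseline / optimised
}

fn main() {
    // Figures taken from the two benchmark runs in this post.
    let token_ratio = ratio(72_361.0, 650.0);
    let time_ratio = ratio(38.89, 15.54);

    println!("tokens: {:.1}x fewer", token_ratio); // prints "tokens: 111.3x fewer"
    println!("time:   {:.2}x faster", time_ratio); // prints "time:   2.50x faster"
}
```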

The client code that connects the agent to the MCP server (Rust, using the `rmcp` and `rig` crates):

```rust
    let url = "https://mcp.neonia.io/mcp?tools=neonia_web_json_fetch,neonia_data_jq_filter";
    let neonia_key = env::var("NEONIA_API_KEY").unwrap_or_default();

    // Configure the rmcp transport
    let mut config = rmcp::transport::streamable_http_client::StreamableHttpClientTransportConfig::with_uri(url);
    config.auth_header = Some(format!("Bearer {}", neonia_key));

    let transport = rmcp::transport::StreamableHttpClientTransport::from_config(config);

    let capabilities = ClientCapabilities::default();
    let impl_info = Implementation::new("rig-agent", "0.1.0");
    let client_info = ClientInfo::new(capabilities, impl_info);

    let client = client_info.serve(transport).await?;

    // Fetch Tools
    let tools = client.list_tools(Default::default()).await?.tools;

    // Initialize Rig's Native Gemini Client
    let gemini_client = gemini::Client::from_env().expect("Failed to initialize Gemini client");

    let agent = gemini_client.agent("gemini-3.5-flash")
        .preamble(
            "You are an autonomous agent equipped with Neonia's data processing tools.\n\
            ## Usage Guidelines\n\
            1. When asked to process JSON from a URL, ALWAYS use `neonia_web_json_fetch` first. It will securely store the file and return a `resource_uri` and a TypeScript schema of the data.\n\
            2. Once you have the `resource_uri` and the schema, use `neonia_data_jq_filter` to extract exactly what you need. Pass the `resource_uri` to the `resource_uri` parameter, and formulate a mathematically precise `jq_query` based on the schema.\n\
            3. Write correct JQ queries. Example JQ for math average: `.users | map(select(.age > 30 and .eyeColor == \"Green\")) | map(.weight) | if length > 0 then add / length else 0 end`.\n\
            4. Self-Correction: If a tool returns an error, evaluate your parameters, fix them, and try again."
        )
        .default_max_turns(5)
        .rmcp_tools(tools, client.peer().to_owned())
        .build();

    let target_url = "https://dummyjson.com/users?limit=0";
    let prompt_text = format!("Fetch the JSON from {}, then find all users who are strictly older than 30 ('age' > 30) AND whose 'eyeColor' is exactly 'Green'. Calculate the EXACT mathematical average of their 'weight'. Return ONLY the final average number, without any explanations.", target_url);

    let response = agent.prompt(&prompt_text).await?;
```

## The Takeaway for AI Engineering

Frontier models are becoming phenomenally fast and smart with every new release. But the future of the industry is not about endlessly inflating the context window just because we can. Context bloat is a latency tax.

The Neonia MCP architecture shows that breaking the linear dependency between data volume and the cost of an LLM call is the key to scaling. By using a cohesive platform as a specialised backend for your autonomous agents, you can process megabytes of JSON for roughly the same sub-1,000-token budget. This is true enterprise-grade optimisation.
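To make the "linear dependency" concrete, here is a rough back-of-the-envelope model in Rust built only from the numbers in this post. It rests on two simplifying assumptions of mine: the inline cost is a fixed overhead (total tokens minus payload tokens) plus a constant number of tokens per user, and the delegated cost stays at the measured 650 tokens regardless of payload size. Real token counts will vary with the data and the number of tool turns.

```rust
/// Payload tokens per user, from 72,286 tokens for 208 users (about 347.5).
const TOKENS_PER_USER: f64 = 72_286.0 / 208.0;
/// Assumed fixed overhead of the inline run: total tokens minus payload tokens.
const INLINE_OVERHEAD: f64 = 72_361.0 - 72_286.0;
/// Measured total for the delegated run, assumed flat for any payload size.
const DELEGATED_TOKENS: f64 = 650.0;

/// Estimated tokens when the whole payload is placed in the context window.
fn inline_tokens(users: u32) -> f64 {
    INLINE_OVERHEAD + TOKENS_PER_USER * users as f64
}

/// Estimated tokens when the aggregation is delegated to a tool.
fn delegated_tokens(_users: u32) -> f64 {
    DELEGATED_TOKENS
}

fn main() {
    for users in [1, 2, 208, 10_000] {
        println!(
            "{:>6} users: inline ~ {:>9.0} tokens, delegated ~ {:.0} tokens",
            users,
            inline_tokens(users),
            delegated_tokens(users)
        );
    }
}
```

Under these assumptions the inline cost grows with every record while the delegated cost does not, so the gap widens as the payload grows.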

I’m curious to hear from other engineers building agentic workflows: What are your thoughts on smart delegation of compute versus relying entirely on native LLM context windows? Let’s discuss in the comments.
