How to make your AI Agent 111x cheaper and 2.5x faster at data aggregation

A developer benchmarked Google's Gemini 3.5 Flash model on a data aggregation task involving 208 user objects and 72,286 tokens of JSON, finding the bare model delivered mathematically precise results but consumed 72,361 tokens and took 38.89 seconds. Running the same task through the Neonia MCP platform, which used a Wasm-based JQ filter tool to delegate data processing instead of loading raw data into the model's context window, reduced token consumption to 650 tokens (111x cheaper) and execution time to 15.54 seconds (2.5x faster).
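Both headline multipliers are plain ratios of the measured totals. The 111x figure compares token counts, so it equals a cost ratio only if every token is billed at the same price. That is an assumption, because providers commonly price input and output tokens differently.

$$
\frac{72{,}361\ \text{tokens}}{650\ \text{tokens}} \approx 111.3,
\qquad
\frac{38.89\ \text{s}}{15.54\ \text{s}} \approx 2.50
$$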

Google recently released an incredibly fast new model: Gemini 3.5 Flash. As someone building infrastructure for autonomous agents, I decided to put it through a rigorous crash test on a real-world data aggregation task to see how it handles massive context loads.

The Benchmark Task

The challenge was simple but computationally heavy. I fed the model a massive JSON array containing 208 user objects and gave it the following prompt:

"Extract the users, find those who are over 30 years old and have green eyes, and calculate the exact mathematical average of their weight."

First, I ran the bare model, forcing it to read the entire JSON payload and do the maths internally. I have to give credit where it is due: Gemini 3.5 Flash parsed 72,286 tokens of raw JSON without making a single mistake with the fractions, which is an amazing achievement. It delivered the exact, mathematically precise answer, 78.44684210526316, with no hallucinations. But the cost of this achievement was astronomical:

- Execution time: 38.89s
- Total consumption: 72,361 tokens for a single request

Next, I ran the exact same task through the Neonia MCP (Model Context Protocol) platform. Instead of clogging the agent's context window with raw data, the Neonia infrastructure intercepted the intent and delegated the heavy lifting to a Wasm-based JQ filter tool. The model didn't have to read the data; it just orchestrated the query.
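The work the JQ tool takes off the model's hands amounts to a few lines of deterministic code. As an illustration only (not Neonia's implementation), here is a plain-Rust equivalent of a jq query such as `.users | map(select(.age > 30 and .eyeColor == "Green")) | map(.weight) | if length > 0 then add / length else 0 end`. The `User` struct and the `average_weight_green_over_30` function are hypothetical names, JSON deserialization (for example with serde) is left out to keep the sketch dependency-free, and the sample records are made up.

```rust
/// Hypothetical stand-in for one element of the `users` array.
/// Only the three fields the query touches are modelled.
#[derive(Debug, Clone)]
struct User {
    age: u32,
    eye_color: String, // `eyeColor` in the JSON payload
    weight: f64,
}

/// Mirrors the jq pipeline:
/// `.users | map(select(.age > 30 and .eyeColor == "Green")) | map(.weight)
///  | if length > 0 then add / length else 0 end`
fn average_weight_green_over_30(users: &[User]) -> f64 {
    // Keep users strictly older than 30 whose eye colour is exactly "Green".
    let weights: Vec<f64> = users
        .iter()
        .filter(|u| u.age > 30 && u.eye_color == "Green")
        .map(|u| u.weight)
        .collect();

    // Same guard as jq's `if length > 0 then add / length else 0 end`.
    if weights.is_empty() {
        0.0
    } else {
        weights.iter().sum::<f64>() / weights.len() as f64
    }
}

fn main() {
    // Four made-up records (not taken from the real dataset).
    let users = vec![
        User { age: 31, eye_color: "Green".to_string(), weight: 70.0 },
        User { age: 45, eye_color: "Green".to_string(), weight: 90.0 },
        User { age: 30, eye_color: "Green".to_string(), weight: 200.0 }, // excluded: not > 30
        User { age: 50, eye_color: "Brown".to_string(), weight: 200.0 }, // excluded: eye colour
    ];
    println!("{}", average_weight_green_over_30(&users)); // prints 80
}
```

Because a computation like this runs outside the model, only the query, the data schema and the final number have to pass through the context window, which is where the token savings come from.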
The results for the exact same request:

- Execution time: 15.54s (2.5x faster ⚡)
- Total consumption: 650 tokens (111x cheaper)

The MCP server usage code (Rust):

```rust
// Imports assumed for recent rmcp (client + streamable HTTP transport features)
// and rig-core (with its `rmcp` feature); exact paths may differ between versions.
use std::env;

use rig::client::{CompletionClient, ProviderClient};
use rig::completion::Prompt;
use rig::providers::gemini;
use rmcp::model::{ClientCapabilities, ClientInfo, Implementation};
use rmcp::ServiceExt;

#[tokio::main] // assumes a tokio runtime
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Expose only the two Neonia tools this task needs.
    let url = "https://mcp.neonia.io/mcp?tools=neonia_web_json_fetch,neonia_data_jq_filter";
    let neonia_key = env::var("NEONIA_API_KEY").unwrap_or_default();

    // Configure the rmcp transport
    let mut config =
        rmcp::transport::streamable_http_client::StreamableHttpClientTransportConfig::with_uri(url);
    config.auth_header = Some(format!("Bearer {}", neonia_key));
    let transport = rmcp::transport::StreamableHttpClientTransport::from_config(config);

    let capabilities = ClientCapabilities::default();
    let impl_info = Implementation::new("rig-agent", "0.1.0");
    let client_info = ClientInfo::new(capabilities, impl_info);
    let client = client_info.serve(transport).await?;

    // Fetch Tools
    let tools = client.list_tools(Default::default()).await?.tools;

    // Initialize Rig's Native Gemini Client
    let gemini_client = gemini::Client::from_env().expect("Failed to initialize Gemini client");

    let agent = gemini_client
        .agent("gemini-3.5-flash")
        .preamble(
            "You are an autonomous agent equipped with Neonia's data processing tools.\n\
             Usage Guidelines\n\
             1. When asked to process JSON from a URL, ALWAYS use neonia_web_json_fetch first. It will securely store the file and return a resource_uri and a TypeScript schema of the data.\n\
             2. Once you have the resource_uri and the schema, use neonia_data_jq_filter to extract exactly what you need. Pass the resource_uri to the resource_uri parameter, and formulate a mathematically precise jq query based on the schema.\n\
             3. Write correct JQ queries. Example JQ for math average: `.users | map(select(.age > 30 and .eyeColor == \"Green\")) | map(.weight) | if length > 0 then add / length else 0 end`.\n\
             4. Self-Correction: If a tool returns an error, evaluate your parameters, fix them, and try again.",
        )
        .default_max_turns(5)
        .rmcp_tools(tools, client.peer().to_owned())
        .build();

    let target_url = "https://dummyjson.com/users?limit=0";
    let prompt_text = format!(
        "Fetch the JSON from {}, then find all users who are strictly older than 30 ('age' > 30) AND whose 'eyeColor' is exactly 'Green'. Calculate the EXACT mathematical average of their 'weight'. Return ONLY the final average number, without any explanations.",
        target_url
    );
    let response = agent.prompt(&prompt_text).await?;
    println!("{response}");

    Ok(())
}
```

The Takeaway for AI Engineering

Frontier models are becoming phenomenally fast and smart with every new release. But the future of the industry is not about endlessly inflating the context window just because we can. Context bloat is a latency tax.

The Neonia MCP architecture shows that breaking the linear dependency between data volume and the cost of an LLM call is the way to scale. By using a cohesive platform as a specialised backend for your autonomous agents, you can process megabytes of JSON for roughly the same budget of under 1,000 tokens. This is true enterprise-grade optimisation.

I'm curious to hear from other engineers building agentic workflows: what are your thoughts on smart delegation of compute versus relying entirely on native LLM context windows? Let's discuss in the comments.
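A rough linear model puts numbers on the "linear dependency" from the takeaway. In the benchmark, the raw payload was 72,286 tokens for 208 users, about 347.5 tokens per user record, and the bare run spent 72,361 - 72,286 = 75 tokens on everything else. Assuming, as the takeaway claims but this benchmark only measured at N = 208, that the delegated run's cost stays roughly flat as the data grows:

$$
T_{\text{bare}}(N) \approx 75 + 347.5\,N,
\qquad
T_{\text{delegated}}(N) \approx 650
$$

Under those assumptions, at N = 2,080 users (ten times the benchmark) the bare model would need roughly 723,000 tokens, while the delegated run would stay near 650.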