{"slug": "how-to-make-your-ai-agent-111x-cheaper-and-2-5x-faster-at-data-aggregation", "title": "How to make your AI Agent 111x cheaper and 2.5x faster at data aggregation", "summary": "A developer benchmarked Google's Gemini 3.5 Flash model on a data aggregation task involving 208 user objects and 72,286 tokens of JSON, finding the bare model delivered mathematically precise results but consumed 72,361 tokens and took 38.89 seconds. Running the same task through the Neonia MCP platform, which used a Wasm-based JQ filter tool to delegate data processing instead of loading raw data into the model's context window, reduced token consumption to 650 tokens (111x fewer) and execution time to 15.54 seconds (2.5x faster).", "body_md": "Google recently released an incredibly fast new model: Gemini 3.5 Flash. As someone building infrastructure for autonomous agents, I decided to put it through a rigorous crash test on a real-world data aggregation task to see how it handles large context loads.\n\nThe Benchmark Task\n\nThe task was simple but computationally heavy. I fed the model a large JSON array containing 208 user objects and gave it the following prompt:\n\n```\n\"Extract the users, find those who are over 30 years old and have green eyes, and calculate the exact mathematical average of their weight.\"\n```\n\nFirst, I ran the bare model, forcing it to read the entire JSON payload and do the maths internally.\n\nI have to give credit where it is due: Gemini 3.5 Flash parsed 72,286 tokens of raw JSON without a single mistake in the fractional arithmetic, which is an amazing achievement. It returned the exact, mathematically precise answer: 78.44684210526316. 
No hallucinations.\n\nBut the cost of this achievement was astronomical:\n\n```\nExecution time: 38.89s\nTotal consumption: 72,361 tokens for a single request!\n```\n\nNext, I ran the exact same task through the Neonia MCP (Model Context Protocol) platform.\n\nInstead of loading the raw data into the model's context window, the agent delegated the heavy lifting to Neonia's tools. `neonia_web_json_fetch` fetched and stored the JSON server-side, returning only a resource URI and a schema. A Wasm-based JQ filter tool (`neonia_data_jq_filter`) then computed the result. The model never read the data; it just orchestrated the query.\n\nThe results for the exact same request:\n\n```\nExecution time: 15.54s (2.5x faster ⚡)\nTotal consumption: 650 tokens (111x cheaper)\n```\n\nThe client code that connects the agent to the Neonia MCP server (Rust, using the `rmcp` and `rig` crates):\n\n```rust\n    // Excerpt from an async fn that returns a Result (required by the `?` operator)\n    let url = \"https://mcp.neonia.io/mcp?tools=neonia_web_json_fetch,neonia_data_jq_filter\";\n    // Fail fast on a missing key instead of sending an empty bearer token\n    let neonia_key = std::env::var(\"NEONIA_API_KEY\").expect(\"NEONIA_API_KEY must be set\");\n\n    // Configure the rmcp Streamable HTTP transport with the endpoint and API key\n    let mut config = rmcp::transport::streamable_http_client::StreamableHttpClientTransportConfig::with_uri(url);\n    config.auth_header = Some(format!(\"Bearer {}\", neonia_key));\n\n    let transport = rmcp::transport::StreamableHttpClientTransport::from_config(config);\n\n    // Identify this client to the MCP server during the initialization handshake\n    let capabilities = ClientCapabilities::default();\n    let impl_info = Implementation::new(\"rig-agent\", \"0.1.0\");\n    let client_info = ClientInfo::new(capabilities, impl_info);\n\n    let client = client_info.serve(transport).await?;\n\n    // Fetch the tool definitions exposed by the server\n    let tools = client.list_tools(Default::default()).await?.tools;\n\n    // Initialize Rig's native Gemini client (reads the API key from the environment)\n    let gemini_client = gemini::Client::from_env().expect(\"Failed to initialize Gemini client\");\n\n    let agent = gemini_client.agent(\"gemini-3.5-flash\")\n        .preamble(\n            \"You are an autonomous agent equipped with Neonia's data processing tools.\\n\\\n            ## Usage Guidelines\\n\\\n            1. When asked to process JSON from a URL, ALWAYS use `neonia_web_json_fetch` first. 
It will securely store the file and return a `resource_uri` and a TypeScript schema of the data.\\n\\\n            2. Once you have the `resource_uri` and the schema, use `neonia_data_jq_filter` to extract exactly what you need. Pass the `resource_uri` to the `resource_uri` parameter, and formulate a mathematically precise `jq_query` based on the schema.\\n\\\n            3. Write correct JQ queries. Example JQ for math average: `.users | map(select(.age > 30 and .eyeColor == \\\"Green\\\")) | map(.weight) | if length > 0 then add / length else 0 end`.\\n\\\n            4. Self-Correction: If a tool returns an error, evaluate your parameters, fix them, and try again.\"\n        )\n        // Enough turns for fetch, then filter, then answer, plus a retry or two\n        .default_max_turns(5)\n        // Register the MCP tools so the agent can call them through the live client\n        .rmcp_tools(tools, client.peer().to_owned())\n        .build();\n\n    // limit=0 asks DummyJSON for all users rather than the default first page\n    let target_url = \"https://dummyjson.com/users?limit=0\";\n    let prompt_text = format!(\"Fetch the JSON from {}, then find all users who are strictly older than 30 ('age' > 30) AND whose 'eyeColor' is exactly 'Green'. Calculate the EXACT mathematical average of their 'weight'. Return ONLY the final average number, without any explanations.\", target_url);\n\n    // The raw JSON stays server-side; the model only sees tool results (URI, schema, filtered output)\n    let response = agent.prompt(&prompt_text).await?;\n    println!(\"{}\", response);\n```\n\nThe Takeaway for AI Engineering\n\nFrontier models are becoming phenomenally fast and smart with every new release. But the future of the industry is not about endlessly inflating the context window just because we can. Context bloat is a tax on both latency and cost.\n\nThe Neonia MCP benchmark shows that breaking the linear dependency between data volume and the cost of an LLM call is the key to scaling. By using a cohesive platform as a specialised backend for your autonomous agents, you can process megabytes of JSON for roughly the same token budget (650 tokens in this run). This works because only the schema, the query, and the result pass through the model's context. This is true enterprise-grade optimisation.\n\nI’m curious to hear from other engineers building agentic workflows: what are your thoughts on smart delegation of compute versus relying entirely on native LLM context windows? 
Let’s discuss in the comments.", "url": "https://wpnews.pro/news/how-to-make-your-ai-agent-111x-cheaper-and-2-5x-faster-at-data-aggregation", "canonical_source": "https://dev.to/olex_green/how-to-make-your-ai-agent-111x-cheaper-and-25x-faster-at-data-aggregation-ii7", "published_at": "2026-05-26 06:08:08+00:00", "updated_at": "2026-05-26 06:34:12.628550+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-infrastructure", "ai-tools", "ai-products"], "entities": ["Google", "Gemini 3.5 Flash", "Neonia", "MCP", "Model Context Protocol", "Wasm", "JQ"], "alternates": {"html": "https://wpnews.pro/news/how-to-make-your-ai-agent-111x-cheaper-and-2-5x-faster-at-data-aggregation", "markdown": "https://wpnews.pro/news/how-to-make-your-ai-agent-111x-cheaper-and-2-5x-faster-at-data-aggregation.md", "text": "https://wpnews.pro/news/how-to-make-your-ai-agent-111x-cheaper-and-2-5x-faster-at-data-aggregation.txt", "jsonld": "https://wpnews.pro/news/how-to-make-your-ai-agent-111x-cheaper-and-2-5x-faster-at-data-aggregation.jsonld"}}