WEKA software speeds long context AI inferencing on Oracle’s public cloud


Source: blocksandfiles.com · Published: 2026-06-10T14:27Z

WEKA's NeuralMesh and Augmented Memory Grid software delivered 10x higher token throughput, 10x more concurrent users, and 7x more tokens per GPU on Oracle Cloud Infrastructure compared to DRAM-only configurations, according to joint benchmark tests. The software extends GPU server memory for AI inferencing to external storage, enabling a nine-node OCI H100 cluster to support over 5,000 concurrent users with 100,000-token context windows, up from approximately 600 users without the software. The results demonstrate that eliminating memory bottlenecks can significantly reduce AI inference costs and improve GPU utilization for long-context and agentic workloads.

WEKA’s NeuralMesh and Augmented Memory Grid software delivers 10x higher token throughput, serves 10x more concurrent users, and produces 7x more tokens per GPU on Oracle Cloud Infrastructure (OCI) than a DRAM-only OCI configuration.

WEKA’s Augmented Memory Grid lets AI models extend GPU server memory for inferencing to NeuralMesh’s external storage, using it as a KV cache with microsecond latencies and multi-TB/s bandwidth, and adding up to petabytes of memory address space. It supports Nvidia’s SX KV caching architecture. NeuralMesh is WEKA’s high-performance AI file system software. The joint test results were validated on a nine-node OCI bare-metal H100 cluster with 100,000-token context windows.

Pablo Selem, senior director, software development for OCI, said: “Enterprise AI workloads are pushing context windows and GPU utilization to new limits. These benchmarks show how WEKA’s NeuralMesh platform with Augmented Memory Grid on OCI helps remove memory bottlenecks so customers can support larger, more demanding inference workloads without simply adding more GPUs.”

WEKA says that, as inference demand grows, inefficiencies in AI infrastructure become increasingly costly. Every key-value (KV) cache eviction is effectively a tax “on GPU cycles, latency, user experience, and the cost of every token served. For long-context and agentic workloads, where inputs routinely run to 100,000 tokens or more, that tax is not a rounding error. It is a direct hit on the unit economics of every organization running production AI.”
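The eviction "tax" can be illustrated with a toy model. The sketch below is not WEKA's implementation or API; the class, its parameters, and the round-robin workload are all hypothetical. It assumes a small fast tier (standing in for DRAM) that evicts least-recently-used sessions, and compares dropping evicted KV state, which forces a recompute on the next access, with spilling it to a larger backing tier (standing in for external flash).

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache keyed by session id (hypothetical, for illustration).

    The fast tier holds at most `fast_slots` sessions in LRU order. If a
    backing tier is enabled, evicted sessions are kept there instead of
    being dropped.
    """

    def __init__(self, fast_slots, use_backing_tier):
        self.fast = OrderedDict()
        self.fast_slots = fast_slots
        self.backing = set() if use_backing_tier else None
        self.recomputes = 0     # KV state had to be rebuilt from the prompt
        self.backing_hits = 0   # KV state was reloaded from the backing tier

    def access(self, session_id):
        if session_id in self.fast:
            self.fast.move_to_end(session_id)  # mark as most recently used
            return "fast-hit"
        if self.backing is not None and session_id in self.backing:
            self.backing_hits += 1
            outcome = "backing-hit"
        else:
            self.recomputes += 1
            outcome = "recompute"
        self._admit(session_id)
        return outcome

    def _admit(self, session_id):
        self.fast[session_id] = True
        if len(self.fast) > self.fast_slots:
            evicted, _ = self.fast.popitem(last=False)  # evict the LRU entry
            if self.backing is not None:
                self.backing.add(evicted)


def run(use_backing_tier, sessions=8, fast_slots=4, rounds=3):
    """Round-robin over more sessions than the fast tier can hold."""
    cache = TieredKVCache(fast_slots, use_backing_tier)
    for _ in range(rounds):
        for session_id in range(sessions):
            cache.access(session_id)
    return cache.recomputes, cache.backing_hits


if __name__ == "__main__":
    print("fast tier only:    (recomputes, backing hits) =", run(False))
    print("with backing tier: (recomputes, backing hits) =", run(True))
```

In this toy workload the fast-tier-only cache recomputes on all 24 accesses, while the two-tier version recomputes only the 8 first-time accesses and serves the other 16 from the backing tier. Real systems differ in many ways (block-level caching, prefetching, transfer latency), so the counts show the mechanism only, not the benchmark.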

The OCI H100 cluster test set-up featured nine nodes, 72 GPUs, 100,000-token context windows, and thousands of concurrent users.

NeuralMesh with Augmented Memory Grid scaled past 5,000 concurrent users, versus about 600 for DRAM-only configurations. It eliminates the failure cliff that hits when the cache saturates by expanding the active cache working set from 8.64 TiB of DRAM to 287 TiB of usable NVMe flash storage. More users per GPU also means the same GPU investment stretches further. Overall, 10x more concurrent users were served without adding any GPU compute or memory capacity.
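The scaling figures above can be checked with simple arithmetic. The sketch below uses only the rounded numbers quoted in the article; the constant and function names are illustrative. Because "past 5,000" and "about 600" are approximate, the user ratio it prints is a floor derived from rounded endpoints and will not reproduce the headline multiplier exactly.

```python
# Figures as quoted in the article (rounded).
DRAM_TIB = 8.64          # active cache working set, DRAM only
FLASH_TIB = 287.0        # usable NVMe flash with Augmented Memory Grid
USERS_DRAM_ONLY = 600    # "about 600"
USERS_WITH_AMG = 5000    # "past 5,000", so a lower bound
NODES = 9
GPUS = 72


def scaling_summary():
    """Derive ratios and per-node / per-GPU figures from the quoted numbers."""
    return {
        "cache_capacity_ratio": FLASH_TIB / DRAM_TIB,
        "user_ratio": USERS_WITH_AMG / USERS_DRAM_ONLY,
        "dram_tib_per_node": DRAM_TIB / NODES,
        "users_per_gpu_dram_only": USERS_DRAM_ONLY / GPUS,
        "users_per_gpu_with_amg": USERS_WITH_AMG / GPUS,
    }


if __name__ == "__main__":
    for name, value in scaling_summary().items():
        print(f"{name}: {value:.2f}")
```

On these inputs the cache working set grows roughly 33x, and concurrent users per GPU rise from about 8 to about 69.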

NeuralMesh with Augmented Memory Grid reached approximately 2 million tokens/sec, compared to under 200,000 for the DRAM-only baseline: a 10x increase in token throughput.

NeuralMesh with Augmented Memory Grid served five billion tokens, compared to 700 million for the DRAM-only baseline, in a single one-hour, 2,400-user test.
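For readers who want the per-GPU view, the token counts from the one-hour test can be broken down as follows. This is a back-of-the-envelope sketch using the article's rounded figures and the 72-GPU cluster size; the names are illustrative, and the averages assume tokens were spread evenly over the full hour.

```python
GPUS = 72
TEST_SECONDS = 3600                  # single one-hour test
TOKENS_WITH_AMG = 5_000_000_000      # "five billion tokens"
TOKENS_DRAM_ONLY = 700_000_000       # "700 million"


def tokens_per_gpu(total_tokens):
    """Tokens served per GPU over the whole test."""
    return total_tokens / GPUS


def average_tokens_per_second(total_tokens):
    """Cluster-wide average rate, assuming an even spread over the hour."""
    return total_tokens / TEST_SECONDS


if __name__ == "__main__":
    ratio = TOKENS_WITH_AMG / TOKENS_DRAM_ONLY
    print(f"token ratio: {ratio:.2f}x")
    for label, total in (("with AMG", TOKENS_WITH_AMG),
                         ("DRAM only", TOKENS_DRAM_ONLY)):
        print(f"{label}: {tokens_per_gpu(total):,.0f} tokens/GPU, "
              f"{average_tokens_per_second(total):,.0f} tokens/sec average")
```

The ratio works out to about 7.1x, which matches the 7x tokens-per-GPU claim because the GPU count is the same in both runs.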

For organizations running agentic workflows, DRAM saturation drains GPU capacity through constant recomputation, creating a direct hit on cost per token and ROI. With 7x more tokens served, the $/token cost is much lower. WEKA points out that, for product teams running real-time AI features, including search, summarization, code assist, and multi-turn agents, the throughput number determines the ceiling for how many users can be served, how fast features respond, and how much revenue the infrastructure can support. The 10x higher token throughput meant there was more output from every GPU in the cluster.
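The cost-per-token argument follows from dividing a fixed hourly cluster cost by the tokens served in that hour. The sketch below makes that explicit. The hourly cost is a hypothetical placeholder, not an OCI price, and the calculation assumes the same cluster cost in both runs, ignoring the cost of the storage tier and software, which the article does not state.

```python
# Hypothetical placeholder, NOT an OCI price; the ratio below does not depend on it.
HYPOTHETICAL_CLUSTER_COST_PER_HOUR = 1000.0

TOKENS_PER_HOUR_WITH_AMG = 5_000_000_000   # from the one-hour test
TOKENS_PER_HOUR_DRAM_ONLY = 700_000_000


def cost_per_million_tokens(cluster_cost_per_hour, tokens_per_hour):
    """Hourly cluster cost divided by millions of tokens served in that hour."""
    return cluster_cost_per_hour / (tokens_per_hour / 1_000_000)


def relative_cost():
    """Cost per token with AMG as a fraction of the DRAM-only cost per token."""
    with_amg = cost_per_million_tokens(
        HYPOTHETICAL_CLUSTER_COST_PER_HOUR, TOKENS_PER_HOUR_WITH_AMG)
    dram_only = cost_per_million_tokens(
        HYPOTHETICAL_CLUSTER_COST_PER_HOUR, TOKENS_PER_HOUR_DRAM_ONLY)
    return with_amg / dram_only


if __name__ == "__main__":
    print(f"relative cost per token: {relative_cost():.2f}")
```

Under these assumptions the cost per token falls to 0.14 of the DRAM-only figure, a reduction of about 86 percent. Any added storage or licensing cost would narrow that gap.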

Using the WEKA software meant many more users could be supported and many more tokens processed, at a significantly lower cost per token. WEKA CEO Liran Zvibel said: “Inference is bottlenecked by how much effective memory is available to GPUs. These results prove that AI token economics aren’t solved by hardware alone; they’re solved by eliminating the memory wall that has been the real ceiling on what existing hardware can do. NeuralMesh with Augmented Memory Grid running on OCI brings orders of magnitude more tokens to customers in an extremely cost-efficient way.”

OCI published the background, full benchmark methodology, system configuration, and results on its AI & Data Science blog.

NeuralMesh with Augmented Memory Grid is generally available to WEKA customers and on the Oracle Marketplace, with OCI as WEKA’s exclusive cloud launch partner. Organizations running long-context inference on OCI can deploy a validated, production-ready architecture today.


