Sharded Inference of a 229B-Parameter MoE over the Internet at Interactive Speed

wpnews.pro

[ARTICLE · art-47839] src=twitter.com ↗ pub=2026-07-04T11:04Z topic=large-language-models verified=true sentiment=· neutral

A new technical report details the sharded inference of a 229-billion-parameter mixture-of-experts model across five consumer GPUs in five countries over the public internet, achieving 12.6 tokens per second interactively and 194 tokens per second in batch mode, with cryptographic receipts on every request.

Published Jul 4, 2026

1/ We published our first technical report today. We ran a 229B model split across five consumer GPUs in five countries over the public internet and measured 12.6 tok/s interactive, 194 tok/s batched. With cryptographic receipts on every request. doi.org/10.5281/zenodo…
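To put the two reported throughput figures in perspective, the arithmetic below converts them into per-token latency and a batched-to-interactive ratio. It is a minimal sketch that uses only the 12.6 tok/s and 194 tok/s numbers quoted above; the 500-token reply length and the function names are illustrative assumptions, not values from the report.

```python
# Throughput figures quoted in the announcement above.
INTERACTIVE_TOK_PER_S = 12.6
BATCHED_TOK_PER_S = 194.0


def ms_per_token(tokens_per_second: float) -> float:
    """Average wall-clock time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for_reply(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a reply of num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second


def throughput_ratio(batched: float, interactive: float) -> float:
    """How many times higher the aggregate batched rate is."""
    return batched / interactive


if __name__ == "__main__":
    REPLY_TOKENS = 500  # hypothetical reply length, chosen for illustration

    latency = ms_per_token(INTERACTIVE_TOK_PER_S)
    reply = seconds_for_reply(REPLY_TOKENS, INTERACTIVE_TOK_PER_S)
    ratio = throughput_ratio(BATCHED_TOK_PER_S, INTERACTIVE_TOK_PER_S)

    print(f"interactive latency: {latency:.1f} ms/token")
    print(f"{REPLY_TOKENS}-token reply: {reply:.1f} s")
    print(f"batched / interactive: {ratio:.1f}x")
```

At the reported interactive rate this works out to roughly 79 ms per token, and the batched figure is about 15 times higher. The announcement does not say how many concurrent requests the batched number covers, so the ratio should be read as aggregate throughput rather than per-request speed.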

source & further reading

Read original on twitter.com → twitter.com/c0mputeAI/status/2073150789640421537
