21:33
2026-06-11
twitter.com
artificial-intelligence
Local AI: 775 tok/s, DiffusionGemma (BF16) on Nvidia RTX 6000 Pro
A developer achieved 775 tokens per second running the full BF16 DiffusionGemma model on an Nvidia RTX 6000 Pro using a Red Hat fork of vLLM, demonstrating extremely fast local AI inference at short c…