Insanely Fast Local AI: 775 tokens per second! 🤯 I was able to run the new DiffusionGemma (full BF16 model) by @googlegemma on vLLM (Red Hat's fork) on an Nvidia RTX 6000 Pro. It's blazing fast at short contexts, but slows down quickly as the context grows. At 100k tokens, TTFT is 22s! Leave a comment for the setup and command to run the model.
BoxAgnts Tool System (6) — Multi-Provider Adaptation and the Agent Query Loop