GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz

A developer implemented a full Transformer with KV cache as a custom digital circuit, designed gate by gate and prototyped on an FPGA. Running Karpathy's microGPT at only 80 MHz, with no GPU or CPU involved, the design generates over 56,000 tokens per second.
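To put the two headline numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the figures from the post (80 MHz, 56,000 tokens per second) and assumes tokens are produced strictly one after another; if the real design overlaps or pipelines tokens, the per-token cycle count would differ. The function and constant names are made up for illustration.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
CLOCK_HZ = 80_000_000      # 80 MHz clock
TOKENS_PER_SEC = 56_000    # "56,000+" tokens per second, used as a lower bound


def cycles_per_token(clock_hz: int, tokens_per_sec: int) -> float:
    """Average clock cycles available per token, assuming one token at a time."""
    return clock_hz / tokens_per_sec


budget = cycles_per_token(CLOCK_HZ, TOKENS_PER_SEC)
latency_us = 1e6 / TOKENS_PER_SEC  # average time per token in microseconds

print(f"{budget:.0f} cycles per token, {latency_us:.1f} us per token")
```

Under that sequential assumption, each token has a budget of roughly 1,400 clock cycles, or a little under 18 microseconds.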

56,000+ tokens/sec at just 80 MHz. 🤯 I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on an FPGA. No GPU. No CPU. Just pure digital silicon running @karpathy's (https://x.com/karpathy) microGPT, spelling out names on a GPT 👇
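For readers unfamiliar with the term, a KV cache stores the key and value vectors of earlier tokens so that each new token only needs one new key, one new value, and one attention pass over the stored history. The Python sketch below shows the idea for a single attention head. It is a software illustration only, not the poster's hardware design: the class name `CachedHead`, the plain-list matrices, and the identity weights in the usage lines are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class CachedHead:
    """Single attention head that keeps past keys and values (hypothetical sketch)."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys = []    # one key vector per token seen so far
        self.values = []  # one value vector per token seen so far

    def step(self, x):
        """Process one new token embedding and return its attention output."""
        q = matvec(self.Wq, x)
        # Only the new token's key and value are computed; older ones are reused.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in self.keys]
        weights = softmax(scores)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim)]


# Usage with 2x2 identity weights (purely illustrative values).
identity = [[1.0, 0.0], [0.0, 1.0]]
head = CachedHead(identity, identity, identity)
out1 = head.step([1.0, 2.0])  # first token attends only to itself
out2 = head.step([0.0, 0.0])  # second token attends over both cached entries
print(out1, out2, len(head.keys))
```

The point of the cache is that `step` does a fixed amount of projection work per token, plus one pass over the stored history, instead of recomputing keys and values for the whole sequence every time.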