Reiner Pope – Chip design from the bottom up
Reiner Pope, CEO of AI chip startup MatX and former Google engineer, delivered a blackboard lecture explaining chip design from basic logic gates to the architectures of GPUs, TPUs, FPGAs, and the hum…
CODA, a GPU kernel abstraction that reparameterizes memory-bound Transformer operations like normalization and activations to execute as GEMM-plus-epilogue programs, keeping data on-chip to reduce glo…
Running large language model inference servers like vLLM and TGI in production requires specialized observability because they behave differently from standard web services, with key metrics like late…
A practical guide for engineers and security teams evaluating whether to run large AI models locally, in a private cloud, or via secure enterprise platforms. It argues that AI performance depends not only…
Cerebras Systems has secured a 750MW compute deal with OpenAI, positioning the company for its upcoming IPO as demand for fast token generation surges. The wafer-scale chip maker's speed advantages, p…
Shepherd Model Gateway (SMG) has disaggregated all CPU-bound workloads from GPU inference in large language model serving, moving tokenization, detokenization, and parsing into a dedicated Rust gatewa…
The article provides a step-by-step guide on self-hosting a Git frontend service using Gitea on a Debian server with Nginx. It covers setting up a PostgreSQL database for Gitea, downloading and instal…