A comprehensive, bilingual guide to Transformers: From foundations to KV-cache compression & attention dynamics

Carles Marin released an open-source bilingual guide on Transformer architectures, covering attention dynamics, KV-cache compression, and advanced concepts like grokking. The resource includes reproducible code and interactive elements for practical learning.

Hi everyone, I’d like to share an open-source resource I’ve been working on: a comprehensive, bilingual (English & Spanish) guide to Transformer architectures. My goal was to bridge the mathematical foundations of attention mechanisms and their practical implementation. The guide goes beyond the basics and dives deep into low-level mechanics, including:

- Attention Dynamics: from-scratch implementations through to understanding attention collapse.
- Context & Memory: KV-cache compression and long-context challenges.
- Advanced Concepts: grokking, optimization, and structural analysis.

The theoretical explanations are backed by reproducible code and interactive elements, such as the TAF Agent framework I’ve been developing for browser-based LLM testing.

You can read it here:

- English: https://karlesmarin.github.io/transformers-guide/en/index.html
- Spanish (Cómo Atienden los Transformers, "How Transformers Attend"): https://karlesmarin.github.io/transformers-guide/es/index.html

I’d love to hear feedback from the community, especially on the visualization of attention states and on optimization techniques. Contributions and suggestions are more than welcome.

Cheers,
Carles