Hi everyone,
I’d like to share an open-source resource I’ve been working on: a comprehensive, bilingual (English & Spanish) guide on Transformer architectures.
My goal was to create a bridge between the mathematical foundations of attention mechanisms and their practical implementation. The guide goes beyond the basics and dives deep into low-level mechanics, including:
Attention Dynamics: From-scratch implementations to understanding attention collapse.
Context & Memory: Exploring KV-cache compression and long-context challenges.
Advanced Concepts: Grokking, optimization, and structural analysis.
The theoretical explanations are backed by reproducible code and interactive elements (like the TAF Agent framework I’ve been developing for browser-based LLM testing).
You can read it here:
English: https://karlesmarin.github.io/transformers-guide/en/index.html
Spanish: Cómo Atienden los Transformers
I’d love to hear feedback from the community, especially regarding the visualization of attention states and optimization techniques. Contributions or suggestions are more than welcome!
Cheers, Carles