Show HN: Landmark AI and ML research explained, redrawn, animated

wpnews.pro

Interactive, animated, visual explainers of landmark AI & ML papers — the systems and ideas behind the models you use, redrawn and made legible. Free and open.

All 100 explainers:

Attention Is All You Need
FlashAttention
PagedAttention (vLLM)
Megatron-LM
DeepSeek-R1
GPT-3: Language Models are Few-Shot Learners
ZeRO: Zero Redundancy Optimizer
Mixtral of Experts
Training Compute-Optimal Large Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
BERT: Pre-training of Deep Bidirectional Transformers
DeepSeek-V3
Qwen3
OLMo 2
MiniMax-01
Gemma 4
Scaling Laws for Neural Language Models
Adam: A Method for Stochastic Optimization
Deep Residual Learning for Image Recognition
Denoising Diffusion Probabilistic Models
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
LoRA: Low-Rank Adaptation of Large Language Models
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Pathways: Asynchronous Distributed Dataflow for ML
Ring Attention with Blockwise Transformers for Near-Infinite Context
Efficiently Scaling Transformer Inference
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Fast Inference from Transformers via Speculative Decoding
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Training language models to follow instructions with human feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Constitutional AI: Harmlessness from AI Feedback
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
RoFormer: Enhanced Transformer with Rotary Position Embedding
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Learning Transferable Visual Models From Natural Language Supervision
High-Resolution Image Synthesis with Latent Diffusion Models
Scalable Diffusion Models with Transformers
Robust Speech Recognition via Large-Scale Weak Supervision
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Group Sequence Policy Optimization
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
YaRN: Efficient Context Window Extension of Large Language Models
Efficient Streaming Language Models with Attention Sinks
Generative Adversarial Networks
Segment Anything
Visual Instruction Tuning
s1: Simple test-time scaling
Tülu 3: Pushing Frontiers in Open Language Model Post-Training
Let's Verify Step by Step
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
KAN: Kolmogorov–Arnold Networks
Differential Transformer
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
RWKV: Reinventing RNNs for the Transformer Era
Titans: Learning to Memorize at Test Time
Byte Latent Transformer: Patches Scale Better Than Tokens
The Llama 3 Herd of Models
Mistral 7B
Phi-4 Technical Report
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Flow Matching for Generative Modeling
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Rewarding Doubt: Calibrated Confidence Expression of LLMs
Why Language Models Hallucinate
τ-bench: Tool-Agent-User Interaction in Real-World Domains
ToolRL: Reward is All Tool Learning Needs
Group-in-Group Policy Optimization for LLM Agent Training
MiniMax-M1: Scaling Test-Time Compute with Lightning Attention
ProRL: Prolonged RL Expands Reasoning Boundaries
The Entropy Mechanism of RL for Reasoning Language Models
Spurious Rewards: Rethinking Training Signals in RLVR
GenPRM: Generative Process Reward Models
From Hard Refusals to Safe-Completions
Proximal Policy Optimization Algorithms
Efficiently Modeling Long Sequences with Structured State Spaces
Auto-Encoding Variational Bayes
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Toolformer: Language Models Can Teach Themselves to Use Tools
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Muon is Scalable for LLM Training
Consistency Models

source & further reading

research.rudrite.com (original article)
Voyager: An Open-Ended Embodied Agent with Large Language Models (interactive visual explainer, Rudrite Research)
Agent Workflow Memory (interactive visual explainer, Rudrite Research)
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (interactive visual explainer, Rudrite Research)

