Show HN: Landmark AI and ML research explained, redrawn, animated

Rudrite Research has launched a free, open platform of interactive, animated visual explainers of landmark AI and ML papers, including "Attention Is All You Need", GPT-3, and FlashAttention, with the aim of making frontier research legible. The platform offers 100 explainers, plus guided reading tracks for students, researchers, and practitioners.

Rudrite Research: the frontier, made legible

Interactive, animated, visual explainers of landmark AI & ML papers: the systems and ideas behind the models you use, redrawn and made legible. Free and open.

Browse all 100 explainers (/library) · Guided reading tracks (/tracks)

Attention Is All You Need (/attention)
FlashAttention (/flash-attention)
PagedAttention vLLM (/paged-attention)
Megatron-LM (/megatron-lm)
DeepSeek-R1 (/deepseek-r1)
GPT-3: Language Models are Few-Shot Learners (/gpt-3)
ZeRO: Zero Redundancy Optimizer (/zero)
Mixtral of Experts (/mixtral)
Training Compute-Optimal Large Language Models (/chinchilla)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (/mamba)
BERT: Pre-training of Deep Bidirectional Transformers (/bert)
DeepSeek-V3 (/deepseek-v3)
Qwen3 (/qwen3)
OLMo 2 (/olmo-2)
MiniMax-01 (/minimax-01)
Gemma 4 (/gemma-4)
Scaling Laws for Neural Language Models (/scaling-laws)
Adam: A Method for Stochastic Optimization (/adam)
Deep Residual Learning for Image Recognition (/resnet)
Denoising Diffusion Probabilistic Models (/ddpm)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (/switch-transformers)
LoRA: Low-Rank Adaptation of Large Language Models (/lora)
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (/gpipe)
GSPMD: General and Scalable Parallelization for ML Computation Graphs (/gspmd)
Pathways: Asynchronous Distributed Dataflow for ML (/pathways)
Ring Attention with Blockwise Transformers for Near-Infinite Context (/ring-attention)
Efficiently Scaling Transformer Inference (/scaling-inference)
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving (/mooncake)
Fast Inference from Transformers via Speculative Decoding (/speculative-decoding)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (/chain-of-thought)
Training language models to follow instructions with human feedback (/instructgpt)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (/dpo)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (/deepseekmath)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (/test-time-compute)
Constitutional AI: Harmlessness from AI Feedback (/constitutional-ai)
DAPO: An Open-Source LLM Reinforcement Learning System at Scale (/dapo)
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (/tree-of-thoughts)
ReAct: Synergizing Reasoning and Acting in Language Models (/react)
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (/flash-attention-3)
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (/mamba-2)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (/deepseek-v2)
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (/eagle)
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (/awq)
RoFormer: Enhanced Transformer with Rotary Position Embedding (/rope)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (/vision-transformer)
Learning Transferable Visual Models From Natural Language Supervision (/clip)
High-Resolution Image Synthesis with Latent Diffusion Models (/latent-diffusion)
Scalable Diffusion Models with Transformers (/dit)
Robust Speech Recognition via Large-Scale Weak Supervision (/whisper)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (/native-sparse-attention)
Group Sequence Policy Optimization (/gspo)
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (/distserve)
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion (/cacheblend)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (/gshard)
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (/gqa)
YaRN: Efficient Context Window Extension of Large Language Models (/yarn)
Efficient Streaming Language Models with Attention Sinks (/streaming-llm)
Generative Adversarial Networks (/gan)
Segment Anything (/segment-anything)
Visual Instruction Tuning (/llava)
s1: Simple test-time scaling (/s1)
Tülu 3: Pushing Frontiers in Open Language Model Post-Training (/tulu-3)
Let's Verify Step by Step (/lets-verify)
Self-Consistency Improves Chain of Thought Reasoning in Language Models (/self-consistency)
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (/rag)
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (/swe-bench)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (/bitnet)
KAN: Kolmogorov–Arnold Networks (/kan)
Differential Transformer (/differential-transformer)
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (/mixture-of-depths)
RWKV: Reinventing RNNs for the Transformer Era (/rwkv)
Titans: Learning to Memorize at Test Time (/titans)
Byte Latent Transformer: Patches Scale Better Than Tokens (/byte-latent-transformer)
The Llama 3 Herd of Models (/llama-3)
Mistral 7B (/mistral-7b)
Phi-4 Technical Report (/phi-4)
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (/flash-attention-2)
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (/medusa)
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (/stable-diffusion-3)
Flow Matching for Generative Modeling (/flow-matching)
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty (/rlcr)
Rewarding Doubt: Calibrated Confidence Expression of LLMs (/rewarding-doubt)
Why Language Models Hallucinate (/why-llms-hallucinate)
τ-bench: Tool-Agent-User Interaction in Real-World Domains (/tau-bench)
ToolRL: Reward is All Tool Learning Needs (/toolrl)
Group-in-Group Policy Optimization for LLM Agent Training (/gigpo)
MiniMax-M1: Scaling Test-Time Compute with Lightning Attention (/cispo)
ProRL: Prolonged RL Expands Reasoning Boundaries (/prorl)
The Entropy Mechanism of RL for Reasoning Language Models (/entropy-mechanism)
Spurious Rewards: Rethinking Training Signals in RLVR (/spurious-rewards)
GenPRM: Generative Process Reward Models (/genprm)
From Hard Refusals to Safe-Completions (/safe-completions)
Proximal Policy Optimization Algorithms (/ppo)
Efficiently Modeling Long Sequences with Structured State Spaces (/s4)
Auto-Encoding Variational Bayes (/vae)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (/t5)
Toolformer: Language Models Can Teach Themselves to Use Tools (/toolformer)
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (/gptq)
Muon is Scalable for LLM Training (/muon)
Consistency Models (/consistency-models)