{"slug": "a-dummy-s-guide-to-amd-gpu-issues-on-linux-understanding-rdna3-tlb-fences-and", "title": "A Dummy's Guide to AMD GPU Issues on Linux - Understanding RDNA3, TLB Fences, and Kernel Parameters", "summary": "This article is a beginner-friendly guide to diagnosing and fixing AMD GPU issues on Linux, particularly for RDNA3 (RX 7000 series) cards. It explains common symptoms like system freezes and \"fence timeout\" errors, identifies a known kernel bug in versions 6.14-6.17 as a primary cause, and provides troubleshooting steps including specific kernel parameters and diagnostic commands.", "body_md": "A beginner-friendly guide to understanding and fixing AMD GPU crashes, freezes, and instability on Linux.\n- Common Symptoms\n- Understanding the Terminology\n- Diagnosing Your GPU Issues\n- Common AMD GPU Problems\n- Kernel Parameters Explained\n- Step-by-Step Troubleshooting\nYou might be experiencing AMD GPU issues if you see:\n- System freezes/crashes randomly\n- Black screens\n- \"GPU hung\" or \"fence timeout\" messages in logs\n- Display flickering or artifacts\n- Messages about \"overdrive\" or \"power management\"\n- Applications crash when using GPU acceleration\nRDNA/RDNA2/RDNA3: AMD's GPU architecture generations\n- RDNA: RX 5000 series (e.g., RX 5700 XT)\n- RDNA2: RX 6000 series (e.g., RX 6800 XT)\n- RDNA3: RX 7000 series (e.g., RX 7900 XTX, RX 7700 XT)\nNavi 10/21/23/31/32/33: Code names for specific GPU chips\n- Navi 32 = RX 7700 XT / 7800 XT\n- Navi 33 = RX 7600\n- Navi 31 = RX 7900 XTX / XT\nGFX Version: Internal GPU identifier (e.g., gfx1101 for RDNA3)\nDMA (Direct Memory Access): How the GPU accesses system memory without involving the CPU. Think of it as a direct highway between GPU and RAM.\nTLB (Translation Lookaside Buffer): A cache that translates memory addresses. Like a phone book for memory locations.\nFence Timeout: When the GPU promises to finish a task by a deadline but fails to do so. The system waits... and waits... 
and eventually gives up, causing a crash.\nTLB Fence Timeout: The specific problem where the GPU can't complete memory translation tasks in time. This is a known bug in RDNA3 GPUs on certain Linux kernels.\nIOMMU (Input-Output Memory Management Unit): Hardware that manages memory access for devices. Sometimes causes conflicts with AMD GPUs.\nSMU (System Management Unit): Firmware that controls GPU power, clocks, and thermal management.\nPower DPM (Dynamic Power Management): System that adjusts GPU clock speeds and voltage based on workload.\nOverdrive: AMD's term for overclocking features. When people say \"overdrive enabled,\" it usually just means the GPU can boost its clocks.\nAMDGPU: The open-source Linux kernel driver for modern AMD GPUs\nROCm: AMD's compute platform for GPU computing (like CUDA for Nvidia)\nMesa: The open-source graphics stack that implements OpenGL, Vulkan, etc.\nFirmware: Low-level software that runs on the GPU itself\n# Check kernel messages for GPU errors\nsudo dmesg | grep -i \"amdgpu\\|gpu\\|fence\\|timeout\" | tail -50\n# Check system logs\nsudo journalctl -b -0 --no-pager | grep -i \"amdgpu\\|gpu hung\\|fence\" | tail -50\nTLB Fence Issues (RDNA3 specific):\namdgpu_tlb_fence_work\ndma_fence_wait_timeout\nTrying to push to a killed entity\n→ This is a kernel bug, not a hardware problem\nPower Management Issues:\namdgpu: GPU recovery enabled\nruntime pm\ngfx_off\n→ Power features causing instability\nFirmware Mismatches:\nSMU driver if version not matched\n→ Driver and firmware versions don't match\nDisplay Issues:\nDC (Display Core)\nDMUB (Display Microcontroller)\n→ Display subsystem problems\n# Get GPU info\nlspci | grep -i vga\n# ROCm info (if installed)\nrocm-smi --showproductname\n# Check GFX version\ngrep \"GFX Version\" /var/log/Xorg.0.log\nProblem: GPU freezes, system hangs, \"fence timeout\" in logs\nAffected: Mainly RX 7000 series on kernels 6.14-6.17\nCause: Kernel bug in memory management\nSolution: Kernel parameters (see 
below)\nProblem: System freezes during idle or wake from sleep\nAffected: Most AMD GPUs\nCause: Aggressive power saving features\nSolution: Disable runtime PM and GFX off\nProblem: Random black screens, flickering\nAffected: Multi-monitor setups, high refresh rate\nCause: Display Core (DC) bugs\nSolution: DC-specific kernel parameters\nProblem: GPU performance drops, thermal warnings\nAffected: All GPUs with inadequate cooling\nCause: Poor airflow, dust, faulty firmware\nSolution: Physical cleaning, firmware update, custom fan curves\nKernel parameters are settings you pass to the Linux kernel at boot. On GRUB-based systems, add them to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. Parameters the driver does not recognize are ignored, so confirm each amdgpu option exists on your kernel with modinfo amdgpu | grep parm.\namdgpu.tmz=0\n- What: Disables Trusted Memory Zone (TMZ), the GPU's encrypted-memory feature\n- Why: TMZ has bugs on RDNA3, causes freezes\n- When to use: RDNA3 GPUs with random crashes\namdgpu.sg_display=0\n- What: Disables scatter-gather display (scanning out display buffers from system memory)\n- Why: Reduces DMA fence timeouts\n- When to use: Display issues, TLB fence timeouts\namdgpu.dcdebugmask=0x10\n- What: Sets a Display Core debug mask; bit 0x10 disables Panel Self Refresh (PSR)\n- Why: PSR can cause flickering and display hangs on some panels\n- When to use: Display-related freezes\niommu=soft\n- What: Uses software bounce buffering (SWIOTLB) for DMA instead of the hardware IOMMU\n- Why: Hardware IOMMU can conflict with AMD GPUs\n- When to use: DMA/fence timeout issues\namdgpu.gpu_recovery=1\n- What: Enables automatic GPU recovery after hangs\n- Why: GPU can reset itself instead of crashing system\n- When to use: Always recommended\namdgpu.gfx_off=0\n- What: Disables GFX power gating\n- Why: GFX off state causes crashes on some GPUs\n- When to use: Idle crashes, wake-from-sleep issues\namdgpu.runpm=0\n- What: Disables runtime power management\n- Why: Runtime PM causes suspend/resume crashes\n- When to use: Sleep/wake issues\namdgpu.ppfeaturemask=0xffffffff\n- What: Enables all power play features\n- Why: Sometimes disabling features causes more problems\n- When to use: When conservative settings don't work\namdgpu.dc=0\n- What: 
Disables Display Core (uses legacy display code)\n- Why: DC has bugs; the legacy path can be more stable on GPUs that still support it\n- When to use: Last resort for display issues (loses features; newer GPUs such as RDNA cards may have no legacy display path, so test with care)\nEdit /etc/default/grub and use one of these lines:\n# For RDNA3 TLB fence issues\nGRUB_CMDLINE_LINUX_DEFAULT=\"quiet splash amdgpu.tmz=0 amdgpu.sg_display=0 amdgpu.dcdebugmask=0x10 iommu=soft\"\n# For general stability\nGRUB_CMDLINE_LINUX_DEFAULT=\"quiet splash amdgpu.gpu_recovery=1 amdgpu.gfx_off=0 amdgpu.runpm=0\"\n# After editing, update GRUB:\nsudo update-grub\n-\nIdentify your GPU:\nlspci | grep VGA\n-\nCheck kernel version:\nuname -r\n-\nCheck for errors in logs:\nsudo dmesg | grep -i amdgpu | tail -50\nsudo journalctl -b -0 | grep -i \"fence\\|timeout\" | tail -20\n-\nCheck driver version:\nmodinfo amdgpu | grep version\n-\nCheck firmware version:\nsudo dmesg | grep \"smu fw version\"\n-\nUpdate everything:\nsudo apt update && sudo apt upgrade\nsudo apt install linux-firmware\n-\nTry a different kernel:\n- Reboot and select an older kernel from GRUB menu\n- For RDNA3: kernel 6.11.x often more stable than 6.14+\n-\nAdd basic stability parameters:\nsudo nano /etc/default/grub\n# Add to GRUB_CMDLINE_LINUX_DEFAULT: amdgpu.gpu_recovery=1 amdgpu.gfx_off=0\nsudo update-grub\nsudo reboot\n-\nFor TLB Fence Timeouts (RDNA3):\n# Add these parameters:\namdgpu.tmz=0 amdgpu.sg_display=0 amdgpu.dcdebugmask=0x10 iommu=soft\n-\nFor Power Management Issues:\n# Add these parameters:\namdgpu.runpm=0 amdgpu.gfx_off=0\n-\nFor Display Issues:\n# Try these one at a time:\namdgpu.dcdebugmask=0x10\namdgpu.dc=0 # Last resort - loses features\n-\nCreate a modprobe config (alternative to kernel parameters):\nsudo nano /etc/modprobe.d/amdgpu.conf\nAdd:\noptions amdgpu gpu_recovery=1\noptions amdgpu gfx_off=0\noptions amdgpu tmz=0\nThen:\nsudo update-initramfs -u\nsudo reboot\nIf nothing else works:\n-\nTry the proprietary driver (AMDGPU-PRO):\n- Not recommended for gaming\n- Better for compute workloads\n- Download from AMD website\n-\nDowngrade to an older kernel:\n# 
Install older kernel\nsudo apt install linux-image-6.11.0-8-generic\n# Boot into it from GRUB menu\n-\nFile a bug report:\n- Check existing bugs: https://gitlab.freedesktop.org/drm/amd/-/issues\n- Include: dmesg output, GPU model, kernel version, reproduction steps\n# See active kernel parameters\ncat /proc/cmdline\n# Check specific amdgpu parameter\ncat /sys/module/amdgpu/parameters/gpu_recovery\n# Watch GPU clocks and temperature (ROCm)\nwatch -n 1 rocm-smi\n# Check power management state\ncat /sys/class/drm/card0/device/power_dpm_force_performance_level\n# Check current GPU clocks\ncat /sys/class/drm/card0/device/pp_dpm_sclk\ncat /sys/class/drm/card0/device/pp_dpm_mclk\n# Basic OpenGL rendering test\nglxgears -fullscreen\n# Basic Vulkan rendering test\nvkcube\n# Temperature, power, and clock readout while a test runs (if ROCm installed)\nrocm-smi --showtemp --showpower --showclocks\nMyth: \"Overdrive causes crashes\"\nReality: Overdrive is just AMD's term for boost clocks. The message is usually harmless.\nMyth: \"AMD GPUs don't work on Linux\"\nReality: They work great! RDNA3 just has some kernel bugs being fixed.\nMyth: \"You need proprietary drivers\"\nReality: The open-source AMDGPU driver is excellent and recommended.\nMyth: \"Lowering clocks fixes stability\"\nReality: Usually doesn't help. 
Most issues are driver/kernel bugs, not hardware limits.\nMyth: \"More power management = better\"\nReality: Aggressive power saving often causes more crashes than it's worth.\n- Check logs first - 90% of diagnosis is reading dmesg/journalctl\n- Search existing issues - Your problem is probably known\n- Provide details:\n- GPU model (exact SKU)\n- Kernel version\n- Driver version\n- Full dmesg output showing the error\n- Steps to reproduce\n- AMD GPU Linux Kernel Driver: https://gitlab.freedesktop.org/drm/amd\n- Mesa Graphics: https://gitlab.freedesktop.org/mesa/mesa\n- ROCm: https://github.com/RadeonOpenCompute/ROCm\n- Arch Wiki (excellent resource): https://wiki.archlinux.org/title/AMDGPU\n- Ubuntu AMD GPU Guide: https://help.ubuntu.com/community/RadeonDriver\nNote: This guide is based on real-world troubleshooting of RDNA3 GPU issues on Linux. Always back up your system before making changes, and remember that kernel/driver bugs get fixed over time - sometimes just waiting for updates is the best solution.\nDisclaimer: Information provided is for educational purposes. The author is not responsible for any system instability or data loss. 
Always maintain backups and test changes carefully.\nGenerated by Claude Code - Verify all technical information before applying to production systems.", "url": "https://wpnews.pro/news/a-dummy-s-guide-to-amd-gpu-issues-on-linux-understanding-rdna3-tlb-fences-and", "canonical_source": "https://gist.github.com/danielrosehill/6a531b079906f160911a87dea50e1507", "published_at": "2025-11-23 16:08:49+00:00", "updated_at": "2026-05-22 17:38:25.021848+00:00", "lang": "en", "topics": ["hardware", "open-source", "semiconductor"], "entities": ["AMD", "RDNA3", "RDNA2", "RDNA", "RX 7900 XTX", "RX 7700 XT", "Navi 31", "Navi 32"], "alternates": {"html": "https://wpnews.pro/news/a-dummy-s-guide-to-amd-gpu-issues-on-linux-understanding-rdna3-tlb-fences-and", "markdown": "https://wpnews.pro/news/a-dummy-s-guide-to-amd-gpu-issues-on-linux-understanding-rdna3-tlb-fences-and.md", "text": "https://wpnews.pro/news/a-dummy-s-guide-to-amd-gpu-issues-on-linux-understanding-rdna3-tlb-fences-and.txt", "jsonld": "https://wpnews.pro/news/a-dummy-s-guide-to-amd-gpu-issues-on-linux-understanding-rdna3-tlb-fences-and.jsonld"}}