{"slug": "flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes", "title": "FLUX.1-schnell on 8GB VRAM (AMD, no CUDA): the GGUF format mismatch that wastes hours", "summary": "A developer discovered that FLUX.1-schnell GGUF models from city96's repository fail to load in stable-diffusion.cpp on AMD RX 580 8GB via Vulkan, while leejet's builds work. The error message 'new_sd_ctx_t failed' misleads users into thinking it's a VRAM issue rather than a quantization-source mismatch. Switching to leejet's GGUF files and using specific offloading flags (CLIP on VRAM, T5XXL and VAE on CPU, VAE tiling enabled) enables inference on low-VRAM AMD cards.", "body_md": "Sharing this because it cost me real time and I haven’t seen it documented clearly anywhere: if you’re trying to run FLUX.1-schnell through stable-diffusion.cpp (the C++ inference engine, not ComfyUI) on a low-VRAM AMD card via Vulkan, the GGUF source matters, and the error message doesn’t tell you why.\n\nThe setup: AMD RX 580 8GB (Polaris/GCN4, a 2017 card: no ROCm support, and obviously no CUDA), running stable-diffusion.cpp compiled with `-DGGML_VULKAN=ON`, no DirectML, no cloud.\n\nThe gotcha:\n\ncity96’s FLUX GGUF builds on HF (the ones most tutorials link to, since they’re the most popular) only work inside ComfyUI with the ComfyUI-GGUF custom node. They will NOT load in sd-server or the stable-diffusion.cpp CLI.\n\nleejet’s FLUX GGUF builds were made specifically for stable-diffusion.cpp, and they actually load.\n\nUsing a city96 file in sd-server just gives:\n\n`[ERROR] main.cpp:92 - new_sd_ctx_t failed`\n\nNo further explanation, and no hint that it’s a packaging/quantization-method mismatch rather than a VRAM or flag problem. I spent a while assuming it was a memory issue before realizing the file itself was the wrong build.\n\nOnce I switched to leejet’s FLUX.1-schnell-gguf repo, it loaded fine. 
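
If you want to see what you actually downloaded before blaming VRAM, here is a minimal Python sketch (an illustration, not part of stable-diffusion.cpp) that reads only the GGUF header and its metadata key/value pairs, following the published GGUF layout, so two files from different sources can be dumped side by side. It assumes GGUF version 2 or later; the script name `gguf_meta.py` and all function names are hypothetical, and nothing here establishes which keys actually differ between city96 and leejet builds.

```python
import struct
import sys
from typing import Any, BinaryIO, Dict

# GGUF scalar value type ids -> little-endian struct formats (per the GGUF spec).
SCALAR_FORMATS = {
    0: '<B', 1: '<b', 2: '<H', 3: '<h', 4: '<I', 5: '<i',
    6: '<f', 7: '<?', 10: '<Q', 11: '<q', 12: '<d',
}
STRING_TYPE = 8
ARRAY_TYPE = 9


def read_fmt(f: BinaryIO, fmt: str) -> Any:
    # Read exactly one struct-formatted value, failing loudly on truncation.
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError('truncated GGUF file')
    return struct.unpack(fmt, data)[0]


def read_string(f: BinaryIO) -> str:
    # GGUF strings: uint64 byte length followed by UTF-8 bytes (no terminator).
    length = read_fmt(f, '<Q')
    return f.read(length).decode('utf-8', errors='replace')


def read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in SCALAR_FORMATS:
        return read_fmt(f, SCALAR_FORMATS[vtype])
    if vtype == STRING_TYPE:
        return read_string(f)
    if vtype == ARRAY_TYPE:
        # Arrays: uint32 element type, uint64 count, then the elements.
        elem_type = read_fmt(f, '<I')
        count = read_fmt(f, '<Q')
        return [read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f'unknown GGUF value type {vtype}')


def read_gguf_metadata(path: str) -> Dict[str, Any]:
    # Parse the header and metadata only; tensor data is never read.
    with open(path, 'rb') as f:
        if f.read(4) != b'GGUF':
            raise ValueError(f'{path} is not a GGUF file')
        version = read_fmt(f, '<I')
        if version < 2:
            raise ValueError(f'GGUF v{version} uses 32-bit counts; not handled here')
        meta: Dict[str, Any] = {
            'gguf.version': version,
            'gguf.tensor_count': read_fmt(f, '<Q'),
        }
        kv_count = read_fmt(f, '<Q')
        for _ in range(kv_count):
            key = read_string(f)
            meta[key] = read_value(f, read_fmt(f, '<I'))
    return meta


def print_metadata(path: str, width: int = 80) -> None:
    # Print one key per line, truncating long values (e.g. big arrays).
    print(f'== {path}')
    for key, value in read_gguf_metadata(path).items():
        text = repr(value)
        if len(text) > width:
            text = text[: width - 3] + '...'
        print(f'  {key} = {text}')


if __name__ == '__main__':
    # Example: python gguf_meta.py first.gguf second.gguf
    for model_path in sys.argv[1:]:
        print_metadata(model_path)
```

Running it with a city96 file and a leejet file as arguments prints both metadata sets for a manual diff; since it stops before the tensor data, it returns quickly even on multi-GB files.
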
For an 8GB card, the practical split that works:\n\n- Diffusion model on VRAM (~6.5GB for q4_k)\n- CLIP_L on VRAM (~235MB)\n- T5XXL and VAE offloaded to system RAM (`--clip-on-cpu --vae-on-cpu`)\n\n`--vae-tiling` is NOT optional: without it, VAE decode OOMs even with everything else offloaded correctly.\n\nThe full command and timing breakdown (T5XXL conditioning ~11s, sampling ~14min at 4 steps and 1024x1024, VAE decode ~40s) are here if useful: [aivisionslab-studios/rx580-local-ai-guide](https://github.com/aivisionslab-studios/rx580-local-ai-guide)\n\nPosting mainly because “GGUF doesn’t load” is a generic enough error that it’s easy to misdiagnose as a VRAM or driver problem instead of a quantization-source mismatch. Curious whether anyone has hit the same wall with GGUF-quantized diffusion models from other sources.", "url": "https://wpnews.pro/news/flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes", "canonical_source": "https://discuss.huggingface.co/t/flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes-hours/177010#post_1", "published_at": "2026-06-21 02:20:14+00:00", "updated_at": "2026-06-21 02:41:45.785480+00:00", "lang": "en", "topics": ["machine-learning", "ai-tools", "developer-tools"], "entities": ["AMD", "FLUX.1-schnell", "stable-diffusion.cpp", "GGUF", "city96", "leejet", "ComfyUI", "Vulkan"], "alternates": {"html": "https://wpnews.pro/news/flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes", "markdown": "https://wpnews.pro/news/flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes.md", "text": "https://wpnews.pro/news/flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes.txt", "jsonld": 
"https://wpnews.pro/news/flux-1-schnell-on-8gb-vram-amd-no-cuda-the-gguf-format-mismatch-that-wastes.jsonld"}}