FLUX.1-schnell on 8GB VRAM (AMD, no CUDA): the GGUF format mismatch that wastes hours


[ARTICLE · art-35270] src=discuss.huggingface.co ↗ pub=2026-06-21T02:20Z topic=machine-learning verified=true sentiment=↓ negative

A developer discovered that FLUX.1-schnell GGUF models from city96's repository fail to load in stable-diffusion.cpp on AMD RX 580 8GB via Vulkan, while leejet's builds work. The error message 'new_sd_ctx_t failed' misleads users into thinking it's a VRAM issue rather than a quantization-source mismatch. Switching to leejet's GGUF files and using specific offloading flags (CLIP on VRAM, T5XXL and VAE on CPU, VAE tiling enabled) enables inference on low-VRAM AMD cards.

Published Jun 21, 2026 · 2 min read

Sharing this because it cost me real time and I haven’t seen it documented clearly anywhere: if you’re trying to run FLUX.1-schnell through stable-diffusion.cpp (the C++ inference engine, not ComfyUI) on a low-VRAM AMD card via Vulkan, GGUF source matters and the error message doesn’t tell you why.

The setup: AMD RX 580 8GB (Polaris/GCN4, 2017 card — no ROCm support, no CUDA obviously), running stable-diffusion.cpp compiled with -DGGML_VULKAN=ON, no DirectML, no cloud.

The gotcha:

city96’s FLUX GGUF builds on HF (the ones most tutorials link, since they’re the most popular) only work inside ComfyUI with the ComfyUI-GGUF custom node. They will NOT load in sd-server or the stable-diffusion.cpp CLI.

leejet’s FLUX GGUF builds — these are the ones built for stable-diffusion.cpp specifically and actually load.

Using a city96 file in sd-server just gives:

[ERROR] main.cpp:92 - new_sd_ctx_t failed

No further explanation, no hint that it’s a packaging/quantization-method mismatch rather than a VRAM or flag problem. I spent a while assuming it was a memory issue before realizing the file itself was the wrong build.

Once I switched to leejet’s FLUX.1-schnell-gguf repo, it loaded fine. For an 8GB card, the practical split that works:

Diffusion model on VRAM (~6.5GB for q4_k)

CLIP_L on VRAM (~235MB)

T5XXL and VAE offloaded to system RAM (--clip-on-cpu --vae-on-cpu)

--vae-tiling is NOT optional: without it, VAE decode OOMs even with everything else offloaded correctly.
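The split above can be sketched as a small Python helper that checks the VRAM headroom from the quoted sizes and assembles the argument list. This is a sketch under assumptions, not the author's exact command: the binary name `sd`, every file name, the prompt, and the `--cfg-scale 1.0` / `--sampling-method euler` settings are placeholders that do not come from the post. Only the three offloading flags, the 4 steps, and the 1024x1024 size are taken from it. The command is built and printed, never executed.

```python
import shlex

# Approximate sizes quoted in the post, in GB.
VRAM_TOTAL_GB = 8.0
ON_GPU_GB = {"diffusion model (q4_k)": 6.5, "CLIP_L": 0.235}


def vram_headroom_gb(total_gb, resident):
    """VRAM left after resident weights, before any compute buffers."""
    return total_gb - sum(resident.values())


def build_sd_command(binary, diffusion_model, clip_l, t5xxl, vae, prompt, output):
    """Assemble an argv list for the stable-diffusion.cpp CLI (not run here)."""
    return [
        binary,
        "--diffusion-model", diffusion_model,
        "--clip_l", clip_l,
        "--t5xxl", t5xxl,
        "--vae", vae,
        "-p", prompt,
        "--cfg-scale", "1.0",            # assumption, not from the post
        "--sampling-method", "euler",    # assumption, not from the post
        "--steps", "4",                  # from the post
        "-W", "1024", "-H", "1024",      # from the post
        "--clip-on-cpu",                 # offloading flags named in the post
        "--vae-on-cpu",
        "--vae-tiling",
        "-o", output,
    ]


if __name__ == "__main__":
    headroom = vram_headroom_gb(VRAM_TOTAL_GB, ON_GPU_GB)
    print(f"headroom before compute buffers: {headroom:.3f} GB")
    cmd = build_sd_command(
        "sd",                              # hypothetical binary name
        "flux1-schnell-q4_k.gguf",         # hypothetical file names
        "clip_l.safetensors",
        "t5xxl_fp16.safetensors",
        "ae.safetensors",
        "a lighthouse at dusk",
        "out.png",
    )
    print(shlex.join(cmd))
```

With the quoted sizes, roughly 1.27 GB remains on an 8GB card after the resident weights. Sampling buffers have to fit in that too, which is consistent with the post's point that VAE decode needs tiling.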

Full command and timing breakdown (T5XXL conditioning ~11s, sampling ~14min at 4 steps/1024x1024, VAE decode ~40s) are in the aivisionslab-studios/rx580-local-ai-guide repository on GitHub, a guide to running local AI on an AMD RX 580 8GB via Vulkan (llama.cpp, Ollama, OpenWebUI, Stable Diffusion) with no CUDA and no cloud.
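To put those timings in proportion, here is a short Python calculation using only the approximate figures quoted above. It shows that sampling dominates the run, so the CPU offloading of T5XXL and the VAE costs comparatively little wall-clock time.

```python
# Approximate timings quoted in the post, in seconds.
t5_conditioning_s = 11
sampling_s = 14 * 60      # ~14 minutes
vae_decode_s = 40
steps = 4

total_s = t5_conditioning_s + sampling_s + vae_decode_s
per_step_s = sampling_s / steps
sampling_share = sampling_s / total_s

print(f"total: {total_s} s (~{total_s / 60:.1f} min)")
print(f"per sampling step: {per_step_s:.0f} s")
print(f"sampling share of total: {sampling_share:.1%}")
```

That works out to about 210 seconds per step and roughly 94% of the run spent in sampling.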

Posting mainly because “GGUF doesn’t load” is a generic enough error that it’s easy to misdiagnose as a VRAM or driver problem instead of a quantization-source mismatch. Curious if anyone’s hit the same wall with other GGUF-quantized diffusion models from different sources.

Read original on discuss.huggingface.co → discuss.huggingface.co/t/flux-1-schnell-on-8gb-v…
