{"slug": "hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution", "title": "HiDream-O1-Image 3–8x Faster: Benchmarking Steps, CFG, and Resolution", "summary": "Benchmarking the HiDream-O1-Image model to find optimal speed-quality trade-offs for iterative UI-based image generation. The author found that using 1536x1536 resolution with 28–36 steps and a guidance scale of 5.0 provides a good balance, reducing generation time from 33 seconds to roughly 10 seconds. The recommended workflow is to explore ideas quickly at low resolution (1024x1024) and low steps (24), then re-render promising results at higher quality settings.", "body_md": "## TL;DR\n\nI'm running HiDream-O1-Image Full as a persistent local server integrated into a Studio UI. The official recipe (`2048x2048 / 50 steps / guidance 5.0`) produces beautiful results, but each image takes around 33 seconds. That's too slow for iterative exploration.\n\nSo I held the prompt and seed constant and swept `steps`, `guidance`, and resolution. The sweet spots were clear.\n\n| Config | Time | Speedup vs. official |\n|---|---|---|\n| `2048 / 50 steps / g5` | 33.37s | 1.00x |\n| `2048 / 28 steps / g5` | 18.41s | 1.81x |\n| `1536 / 20 steps / g5` | 7.14s | 4.67x |\n| `1024 / 20 steps / g5` | 3.83s | 8.71x |\n\nThe takeaway: **explore direction at low resolution and low steps, then do the final render at full quality.** In particular, `1536x1536 / 28–36 steps` hits a very good speed-quality balance.\n\n## Motivation\n\nOnce image generation is embedded in a UI, iteration speed matters more than peak quality.\n\nThe real workflow isn't \"generate one perfect image.\" It looks like this:\n\n- Check composition, mood, outfit, background direction\n- Tweak the prompt slightly\n- Try different seeds\n- Re-render only the promising candidates at full quality\n\nWaiting 30+ seconds per generation makes that loop painful. 
Being able to see rough candidates in 5–10 seconds is a completely different experience.\n\nThe goal here isn't \"the best single image\"; it's **understanding how far you can cut exploration cost without breaking quality in a meaningful way**.\n\n## Environment\n\n- **GPU**: NVIDIA RTX PRO 6000 Blackwell Max-Q (96 GB VRAM)\n- **Model**: HiDream-O1-Image Full (8B, bf16)\n- **Inference server**: Custom Python HTTP server with model kept resident\n- **Measured**: One `/generate/t2i` request after model load\n- **Seed**: `42`\n- **Prompt**:\n\n```\nA cinematic portrait photo of a woman in a rainy neon street,\ndetailed skin, 85mm lens, realistic lighting, high detail\n```\n\nAll comparison images use the same prompt and seed. Only `steps`, `guidance_scale`, resolution, and resolution snapping are varied.\n\n| Parameter | Value |\n|---|---|\n| prompt | `A cinematic portrait photo of a woman in a rainy neon street, detailed skin, 85mm lens, realistic lighting, high detail` |\n| seed | `42` |\n| mode | `t2i` |\n| dtype | `bf16` |\n| negative prompt | none |\n| sampler / scheduler | HiDream pipeline default |\n\nI used a portrait because hair, skin, background light, and fine detail are easy to compare. That said, a young woman's face has relatively little texture and wrinkle detail to begin with, so it's actually a forgiving subject for low-step generation. I'll come back to that.\n\nImages in this article are contact sheets with results side by side. 
Pixel-peeping is easier at full resolution, but for UI-driven exploration the first question is \"does this look worth keeping?\", so I've prioritized at-a-glance comparison here.\n\n## Start by Reducing Steps\n\nI fixed `guidance=5.0` and `2048x2048` and varied only steps.\n\n| Resolution | Steps | Guidance | Elapsed | Speedup vs 50 steps |\n|---|---|---|---|---|\n| 2048x2048 | 20 | 5.0 | 13.070s | 2.55x |\n| 2048x2048 | 28 | 5.0 | 18.412s | 1.81x |\n| 2048x2048 | 36 | 5.0 | 23.854s | 1.40x |\n| 2048x2048 | 50 | 5.0 | 33.370s | 1.00x |\n\nLatency scales almost linearly with step count, at roughly 0.65–0.67s per step. In this HiDream path, when `guidance > 1.0`, both conditional and unconditional forwards run, so reducing steps translates directly to lower latency.\n\nVisually: 20 steps shows some roughness. 28 steps looks fine at first glance, though fine detail thins out under comparison. 36 steps holds up well for most use cases.\n\n## guidance=1.0 Is Significantly Faster\n\nNext I varied guidance as well, comparing practical preset candidates.\n\n| Preset | Resolution | Steps | Guidance | CFG | Elapsed |\n|---|---|---|---|---|---|\n| Draft | 2048x2048 | 24 | 1.0 | off | 8.164s |\n| Balanced | 2048x2048 | 36 | 3.0 | on | 23.664s |\n| Official | 2048x2048 | 50 | 5.0 | on | 32.609s |\n\n`guidance=1.0` effectively disables CFG, so it's faster than step count alone would suggest: 24 steps lands in the 8-second range, about 0.34s per step versus about 0.65s with CFG on.\n\nThe trade-off is that lower guidance changes prompt adherence and overall aesthetics. Fine for idea validation, but for prompts involving text, specific clothing details, or precise multi-element placement, staying at `guidance=3–5` is safer.\n\n## The Resolution Trap: Requesting 1024 Doesn't Make It Faster\n\nMy first instinct was to just pass `width=1024, height=1024` and get a faster result. 
But the official pipeline doesn't use the requested resolution directly; it snaps to the nearest fixed aspect-ratio bucket.\n\nMeasured results:\n\n| Requested | Actual |\n|---|---|\n| 512x512 | 2048x2048 |\n| 1024x1024 | 2048x2048 |\n| 2048x2048 | 2048x2048 |\n| 1280x720 | 2560x1440 |\n| 720x1280 | 1440x2560 |\n| 1024x768 | 2304x1728 |\n\nSending `1024x1024` from the UI has no effect on speed: square aspect ratios all resolve to `2048x2048`. The snapping logic lives in `models/utils.py` under `PREDEFINED_RESOLUTIONS`, and it seems intentionally designed to favor output stability.\n\n## Bypassing Buckets for True Low-Resolution Generation\n\nFor experimentation I added a `snap_resolution=false` flag that bypasses the pipeline's resolution snapping. For safety, arbitrary resolutions are constrained to:\n\n- width and height aligned to 32px\n- 256px minimum\n- max 4.3MP total\n\nComparing `1024 / 1536 / 2048` at `20 steps / guidance=5.0`:\n\n| Resolution | Elapsed | Speedup vs 2048 |\n|---|---|---|\n| 1024x1024 | 3.831s | 3.47x |\n| 1536x1536 | 7.139s | 1.86x |\n| 2048x2048 | 13.278s | 1.00x |\n\nThis is where the real gains are. Given that the official 2048 recipe sits at 30+ seconds, `1536 + 28 steps` should land around 10 seconds (7.14s × 28/20 ≈ 10.0s, assuming linear step scaling), a completely different feel.\n\n1024 is fast but noticeably lower in information density. Good for directional checks, but probably too rough for regular output use.\n\n## Presets in the Studio UI\n\nBased on these results, here's what I settled on in the Studio UI:\n\n| Use case | Resolution | Steps | Guidance | When to use |\n|---|---|---|---|---|\n| Quick preview | 1024x1024 | 20–24 | 1.0–3.0 | Composition / mood check |\n| Standard | 1536x1536 | 28–36 | 3.0–5.0 | Day-to-day |\n| High quality | 2048x2048 | 36–50 | 5.0 | Re-render of selected candidates |\n| Official bucket | bucket | 50 | 5.0 | Match upstream recipe exactly |\n\nSteps and resolution are independently selectable in the UI. 
The workflow is: explore with `1024 / 24 steps`, then re-render promising results at `1536` or `2048` with the same prompt and seed.\n\n## Cases Where Quality Degradation Shows Up\n\nWith this portrait, the difference between 28 steps and 50 steps was \"visible under comparison\", not obvious at a glance. But part of that is the subject matter.\n\nLow steps and low resolution tend to hurt most with:\n\n- Older faces, wrinkles, skin texture\n- Hands, fingers, jewelry\n- Fabric with fine patterns\n- Text in signs or books\n- Multiple people\n- Busy indoor scenes with lots of background objects\n\nConversely, young faces, simple backgrounds, and soft lighting are forgiving: low-cost settings hold up well.\n\nThat's why a single fixed preset isn't the right design. **Giving users control over exploration cost depending on what they're generating** is the better approach.\n\n## Reproduction Commands\n\nThe benchmark script lives at `image_server/bench_quality_speed.py`. It calls the HTTP API after the model is already resident, so model load time is excluded from all measurements.\n\n```\n./image_server/start_image_server.sh\n```\n\nSteps comparison:\n\n```\npython3 image_server/bench_quality_speed.py \\\n  --prompt \"A cinematic portrait photo of a woman in a rainy neon street, detailed skin, 85mm lens, realistic lighting, high detail\" \\\n  --seed 42 \\\n  --variant s20_g5,20,5 \\\n  --variant s28_g5,28,5 \\\n  --variant s36_g5,36,5 \\\n  --variant s50_g5,50,5\n```\n\nResolution comparison:\n\n```\npython3 image_server/bench_quality_speed.py \\\n  --prompt \"A cinematic portrait photo of a woman in a rainy neon street, detailed skin, 85mm lens, realistic lighting, high detail\" \\\n  --seed 42 \\\n  --variant s20_g5,20,5 \\\n  --size 1024x1024 \\\n  --size 1536x1536 \\\n  --size 2048x2048 \\\n  --no-snap-resolution\n```\n\n## Summary\n\nHiDream-O1-Image Full is excellent at its official settings but too slow for iterative use. 
When you break down steps, CFG, and resolution separately, the speedups are clean and predictable.\n\n- Generation time scales almost linearly with steps\n- `guidance=1.0` drops CFG and gives a large speed boost\n- The official pipeline snaps resolutions to fixed buckets\n- True low-resolution generation at 1024/1536 is dramatically faster\n- `1536 / 28–36 steps` is the practical sweet spot\n\nFor image generation UIs, **low-cost exploration → high-quality final render** is a much better flow than starting at maximum quality every time. This experiment gave me a solid basis for building exactly that.", "url": "https://wpnews.pro/news/hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution", "canonical_source": "https://dev.to/shinji_shimizu_bb51276a5e/hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution-4ejd", "published_at": "2026-05-22 11:23:03+00:00", "updated_at": "2026-05-22 11:36:06.419536+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "research", "products", "developer-tools"], "entities": ["HiDream-O1-Image"], "alternates": {"html": "https://wpnews.pro/news/hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution", "markdown": "https://wpnews.pro/news/hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution.md", "text": "https://wpnews.pro/news/hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution.txt", "jsonld": "https://wpnews.pro/news/hidream-o1-image-3-8x-faster-benchmarking-steps-cfg-and-resolution.jsonld"}}