# Head to head: Bagel vs Luma Uni-1 Text to Image

> Source: <https://runtimewire.com/article/head-to-head-bagel-vs-luma-uni-1-text-to-image>
> Published: 2026-06-27 20:07:18+00:00

Bagel’s problem in this head-to-head is simple: it looks competent until you compare it against the prompt. Luma Uni-1 Text to Image doesn’t just make prettier images here; it follows instructions with far more discipline, which is why the aggregate score lands at **26.3 to 15.7** in Luma’s favor.

The clearest gap shows up in **spatial relationships**. In the rainy street scene, Luma places the scooter on the left, the kegs on the right, the cat on the display case, the violinist behind the cart, the crate between scooter and cart, and the pharmacy cross above-right: basically the whole choreography the prompt asked for. Bagel’s image is tidy and appealing, but it drops crucial beats, especially the three-keg stack and the crate placement, and the result reads less like a believable urban side street.

Luma also wins cleanly on **perspective and scale**. Its image gives the six-story warehouse real dominance in the right foreground, keeps the two-story bakery as the smaller anchor on the left corner, and sells the scene with depth cues that converge convincingly toward a central vanishing point. Bagel, by contrast, flattens the setup: the buildings on both sides feel too similarly scaled, and the cyclist ends up in the foreground instead of beyond the van, which breaks the prompt’s intended staging.

Then there’s **legible text and typography**, where Bagel really falls apart. Luma delivers the retro subway poster with the correct main text, date/time, and the **“PIER 7”** label in readable form. Bagel’s version is the familiar image-model shrug: misspellings like **“HAROR”** (for “HARBOR”), the wrong date/time, and incorrect bottom text, all wrapped in a clean composition that doesn’t survive inspection.

**Final call: Luma Uni-1 Text to Image wins decisively.** Bagel can generate attractive images, but in this matchup Luma is the model that actually listens.

### How they were tested

We ran 3 fresh image tasks, generated on the fly for this matchup so neither model could prepare in advance, and had gpt-5.4 score each one. Bagel scored 15.7 to Luma Uni-1 Text to Image's 26.3. In the judge's notes below, Model A is Bagel and Model B is Luma Uni-1 Text to Image.
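For readers who want to work with these results, here is a minimal sketch of how the verdicts and totals reported in this article could be tallied. The `summarize` helper is hypothetical and is not the pipeline used for this matchup; the article does not state the scoring scale or how per-task scores roll up into 26.3 and 15.7, so the sketch only uses the published winners and totals.

```python
from collections import Counter

# Per-task winners, copied from the three verdicts in this article.
winners = {
    "Spatial relationships": "Luma Uni-1 Text to Image",
    "Perspective & scale": "Luma Uni-1 Text to Image",
    "Legible text & typography": "Luma Uni-1 Text to Image",
}

# Published aggregate scores. The scale behind them is not stated.
totals = {"Bagel": 15.7, "Luma Uni-1 Text to Image": 26.3}


def summarize(winners, totals):
    """Hypothetical helper: count task wins and compute the score margin."""
    wins = Counter(winners.values())
    leader = max(totals, key=totals.get)
    runner_up = min(totals, key=totals.get)
    # Round to one decimal to match the precision of the published scores.
    margin = round(totals[leader] - totals[runner_up], 1)
    return {"wins": dict(wins), "leader": leader, "margin": margin}


summary = summarize(winners, totals)
print(summary)
```

Counting wins separately from the score margin keeps the two signals apart: a model could in principle sweep the tasks narrowly or lose one badly, and the totals alone would not show which.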

#### 1. Spatial relationships

At blue hour on a narrow urban side street, create a hyper-detailed cinematic 16:9 scene centered on a tiny late-night dumpling cart called 'Lantern Steam'; a mustard-yellow scooter is parked to the left of the cart, a stack of three silver kegs stands to the right of the cart, a calico cat sits directly on top of the cart’s glass display case, a street violinist in a teal raincoat stands behind the cart, and a tipped red plastic crate is placed exactly between the scooter and the cart; in the foreground, a black umbrella lies on the wet pavement in front of the cart while a glowing green pharmacy cross sign hangs above and slightly right; reflections in the rain-slick street, realistic neon lighting, precise object placement, no extra vehicles blocking the layout.

**Winner: Luma Uni-1 Text to Image** — Model B better matches the requested spatial layout and cinematic rainy street scene: the scooter is left, kegs are right, cat is on the display case, violinist is behind, crate is between scooter and cart, and the pharmacy cross hangs above-right. Model A is clean and appealing but misses key details, especially the three-keg stack and the crate placement, and feels less like a realistic urban side street.

#### 2. Perspective & scale

A realistic urban street-corner illustration in crisp late-afternoon light, 16:9, viewed from eye level at the mouth of a long alley in Porto Vetra: the scene must follow a clean one-point perspective with all building edges, curb lines, tram tracks, and window rows converging consistently toward a single vanishing point near the center distance; in the foreground on the right, a towering six-story brick warehouse dominates the frame, while on the left foreground a two-story corner bakery appears much smaller but correctly proportioned; halfway down the alley, a compact orange delivery van is parked, and farther beyond it a cyclist and three pedestrians diminish naturally in size with distance; include hanging cables, fire escapes, and repeating shop awnings that reinforce depth without warping, with believable scale relationships and no fisheye distortion.

**Winner: Luma Uni-1 Text to Image** — Model B better matches the requested perspective and scale: the six-story warehouse dominates the right foreground, the smaller two-story bakery anchors the left corner, and depth cues converge more convincingly toward a central vanishing point. Model A is clean and attractive but places similarly scaled buildings on both sides and mispositions the cyclist in the foreground rather than farther beyond the van, weakening prompt adherence.

#### 3. Legible text & typography

Design a clean 1980s retro print style poster pasted on a subway entrance wall in an urban night scene, lit by a single fluorescent fixture and framed straight-on for maximum readability, 16:9; the poster advertises a midnight street-jazz event and must feature large, perfectly legible typography that reads exactly: 'NEON HARBOR', on the next line 'MIDNIGHT SET', and below in smaller clear text 'FRI 12 OCT — 11:45 PM'; add one small ticket label at the bottom that reads 'PIER 7'; use bold magenta, cyan, and cream ink, geometric shapes, subtle paper texture, and surrounding city details like tiled walls and a metal handrail, but keep the text sharp, correctly spelled, and easy to read.

**Winner: Luma Uni-1 Text to Image** — Model B closely matches the prompt with the correct main text, date/time, and 'PIER 7' label in a readable retro poster on a subway wall. Model A has multiple major text errors and misspellings ('HAROR', wrong date/time, incorrect bottom text), despite a clean composition.

See every prompt and the full side-by-side outputs in the [interactive Head-to-Head](/head-to-head/head-to-head-bagel-vs-luma-uni-1-text-to-image).
