Long ago, in a small town nestled between northern mountains, there was a place forever shrouded in heavy mist, called Bailan.
In Bailan lived a young painter named Wu Sheng — "Born of Mist."
He had a strange gift. Other painters first looked clearly at a thing, then drew it stroke by stroke. He worked the opposite way — he stared into a swirling mass of fog and slowly saw a painting emerge.
People thought it absurd. "How could there be a painting in mist?" Wu Sheng would only smile, and never explain.
A Library on the Sea
One day the old town governor summoned him. "I want a painting of something that has never existed: a library floating on the sea, at dusk, with two moons in the sky."
The room burst into laughter. "No such place exists. How could anyone paint it?"
But Wu Sheng simply nodded. "I can."
He took a sheet of white paper — and instead of putting brush to it, he covered the whole sheet in chaotic dark grey paint, like a window after a snowstorm. Nothing was visible. The onlookers were even more puzzled. "You're ruining it."
Wu Sheng replied: "A real painting must first learn to hide."
For the days that followed, he did only one thing: erase a tiny bit of chaos at a time. Not all at once. Not in great strokes. Just a little. One day he uncovered a faint patch of light. The next, a stretch of coastline. The day after, the suggestion of bookshelves. Later, two moons floated up out of the haze. He seemed to be in negotiation with the fog itself. Not creating, but constantly asking: "What was supposed to be here?" When he erased wrong, he reconsidered. When something stayed unclear, he kept watching.
For forty-nine days. In the end, the paper truly held a library floating on the sea. The water was still. The book pages turned. Dusk hung in the sky like a golden breath. Two moons drifted in the distance.
Where Does the Mist Come From?
The town was stunned. Someone asked, "How did you do it? You started with nothing."
Wu Sheng shook his head. "No — I started with everything. It was all just hidden inside the mist."
The governor pressed: "Then how did you know what to erase?"
Wu Sheng answered: "Because I first heard the names. Floating library. Two moons. Dusk. Sea. The words were like distant bells. I followed the sound through the fog and found the path."
Years later he took on an apprentice. The boy studied long but never grasped it. He kept thinking, I want to paint the result directly.
So Wu Sheng took him to the mountaintop. The morning fog rolled thick across the slopes.
"Do you see the tower?" Wu Sheng asked.
"No," said the apprentice.
"Then does it not exist?"
The apprentice fell silent.
Wu Sheng said: "Painting is the same. You don't create a world from nothing. You move, step by step, toward the most plausible world inside the chaos. Real painting isn't laying down brushstrokes. It's removing noise."
Years later, the people of Bailan still spoke of him. They said: he wasn't painting at all. He was teaching the world how to slowly grow order out of chaos.
The Real Idea: Diffusion Models
This whole story maps onto the core principle behind modern AI image generation: the diffusion model.
Stable Diffusion, Midjourney, and the GPT Image 2 model that powers this site all rely heavily on this idea.
In one sentence:
The model doesn't paint from scratch. It starts with pure random noise and removes noise step by step, until an image emerges.
Just like Wu Sheng: cover the page in chaos first (pure noise), then erase a little at a time (gradual denoising), until the painting is revealed.
Training: Teaching the Model How to Denoise
In training, the model learns this way.
Step 1: take a real image — say, a cat.
Step 2: add noise to it, a little at a time, over many noise steps:
- Noise step 1: cat is still clear
- Noise step 100: starting to blur
- Noise step 500: barely visible
- Noise step 1000: pure random TV static
Step 3: train the model to answer: "If it looks this messy now, what did the original image probably look like?"
That is, learn the reverse process: from chaos → clarity.
That's the core of it.
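The noising recipe above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not how any particular product is trained: the linear schedule, its start and end values, the 8x8 stand-in "image", and the names `make_schedule`, `add_noise`, and `noise_prediction_loss` are all illustrative choices. It also assumes a common formulation in which the model is trained to predict the noise that was added, which carries the same information as guessing the original image.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule: how much noise each step adds."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    # alpha_bars[t-1] is the fraction of the original signal left at noise step t.
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to noise step t (1-indexed) in one shot."""
    eps = rng.standard_normal(x0.shape)
    abar = alpha_bars[t - 1]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return x_t, eps

def noise_prediction_loss(predict_noise, x0, t, alpha_bars, rng):
    """Mean squared error between the model's noise guess and the true noise."""
    x_t, eps = add_noise(x0, t, alpha_bars, rng)
    return float(np.mean((predict_noise(x_t, t) - eps) ** 2))

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a real image
betas, alpha_bars = make_schedule()

for t in (1, 100, 500, 1000):
    # Print how much of the original image survives at each noise step.
    print(t, float(np.sqrt(alpha_bars[t - 1])))
```

Here `predict_noise` is a placeholder for the neural network. Training would repeat `noise_prediction_loss` over many images and random noise steps, nudging the network to lower the loss each time.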
Generation: Actually Painting an Image
When generating an image for real, the model has no picture to start from. It only has a blob of random noise, and a prompt:
An orange cat wearing an astronaut helmet, sipping coffee on the moon.
So it begins:
- Step 1: small denoising
- Step 30: a cat silhouette appears
- Step 80: the helmet shows up
- Step 150: the lunar background takes shape
- Step 300: details settle in
And the image is born.
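The loop itself is short. Below is a rough sketch of one standard sampler (DDPM-style ancestral sampling), offered as an illustration rather than the method any specific product uses. The schedule values and the names `sample` and `make_oracle` are assumptions. Since there is no trained network here, `make_oracle` cheats: it pretends the whole "dataset" is a single known image, in which case the ideal noise guess can be written down exactly.

```python
import numpy as np

def sample(predict_noise, shape, num_steps=1000, seed=0):
    """Start from pure noise and remove a little of it at every step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # the sheet covered in chaos
    for i in reversed(range(num_steps)):
        eps_hat = predict_noise(x, i)
        # Subtract this step's share of the predicted noise, then rescale.
        x = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_hat) / np.sqrt(alphas[i])
        if i > 0:
            # Every step except the last re-injects a small amount of fresh noise.
            x = x + np.sqrt(betas[i]) * rng.standard_normal(shape)
    return x

def make_oracle(target, num_steps=1000):
    """Hypothetical stand-in for a trained network: exact only for a one-image dataset."""
    alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, num_steps))

    def predict_noise(x_t, i):
        return (x_t - np.sqrt(alpha_bars[i]) * target) / np.sqrt(1.0 - alpha_bars[i])

    return predict_noise

target = np.linspace(-1.0, 1.0, 16).reshape(4, 4)  # the "painting" hidden in the fog
image = sample(make_oracle(target), target.shape)
print(float(np.abs(image - target).max()))  # largest pixel error after denoising
```

With a real model, `predict_noise` would be a neural network that has seen millions of images, so the loop would settle on a plausible new image instead of reproducing a single known one.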
Why Can Text Steer the Image?
Because there's another key module: the Text Encoder.
It turns "orange cat + astronaut + moon + coffee" into a vector of numbers (a conditioning signal), and during each denoising step it keeps reminding the model:
- "Don't forget — orange cat, not black cat."
- "On the moon, not in a kitchen."
This is called Conditional Generation.
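One widely used way to make that "reminder" stronger is classifier-free guidance, a technique the text above does not name and which is brought in here purely as an illustration. The idea is to ask the model for its noise guess twice, once with the prompt's vector and once with an empty prompt, and then push the result further in the direction the prompt suggests. In this sketch `guided_noise`, `toy_model`, the embeddings, and the scale value are all hypothetical.

```python
import numpy as np

def guided_noise(predict_noise, x_t, step, text_embedding, null_embedding, guidance_scale=7.5):
    """Blend the prompted and unprompted noise guesses (classifier-free guidance)."""
    eps_cond = predict_noise(x_t, step, text_embedding)    # "orange cat, on the moon"
    eps_uncond = predict_noise(x_t, step, null_embedding)  # no prompt at all
    # scale 0 ignores the prompt, scale 1 follows the model as is, larger values exaggerate it.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def toy_model(x_t, step, embedding):
    """Hypothetical stand-in for a conditioned network; not a real denoiser."""
    return x_t - embedding

x_t = np.zeros(4)
prompt_vec = np.array([1.0, 0.0, 2.0, -1.0])  # made-up conditioning vector
empty_vec = np.zeros(4)

eps = guided_noise(toy_model, x_t, step=10, text_embedding=prompt_vec,
                   null_embedding=empty_vec, guidance_scale=2.0)
print(eps)
```

In a full sampler, this blended guess would take the place of the plain noise prediction at every denoising step.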
Why Did Diffusion Beat GANs?
Earlier AI image generators relied on GANs (Generative Adversarial Networks). But GAN training was notoriously unstable, and the models were prone to mode collapse, which limited the diversity of what they produced.
Diffusion models are more stable to train, more controllable, produce higher-quality images, and scale better to large models, which is why they have quietly become the default of the modern era.
The One-Sentence Truth
AI image generation isn't "creation." It's:
Searching for the result that looks most like an image inside a probability space.
It's like asking, again and again, inside infinite chaos: "What's the most plausible next step here?"
That is the deepest idea in modern generative AI.