# PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

> Source: <https://research.nvidia.com/labs/sil/projects/pid/>
> Published: 2026-05-25 15:23:18+00:00


*TL;DR: PiD directly decodes latent representations into high-resolution images, replacing the decode-then-super-resolve cascade while achieving lower latency and higher visual quality.*

## Abstract

Most practical high-resolution text-to-image systems rely on latent diffusion models, in which generation happens in a compact latent space and a decoder maps latents back to pixels. Yet this latent-to-pixel decoder is reconstruction-oriented: it is optimized to invert the encoder rather than to synthesize additional detail, and it becomes increasingly costly at megapixel scale. These drawbacks call for a more expressive and efficient decoding paradigm. Motivated by recent progress in scalable pixel-space diffusion, we introduce **PiD**, a **Pi**xel diffusion **D**ecoder that reformulates latent decoding as conditional pixel diffusion, unifying decoding and upsampling in one generative module. By denoising directly in high-resolution pixel space, **PiD** synthesizes 4× and even 8× upscaled images with low latency. For latent conditioning, a lightweight sigma-aware adapter injects noise-corrupted latents into the pixel diffusion backbone, which lets **PiD** decode partially denoised latents and terminate the latent diffusion process early. To further improve efficiency, we distill the model with DMD2, reducing inference to just 4 steps. **PiD** applies both to conventional VAE latents and to the semantic latents (e.g., SigLIP, DINOv2) used in recent RAE-based models. **PiD** decodes the latents of 512×512 images into 2048×2048 pixels in under 1 second with 13 GB of peak memory on a consumer RTX 5090, and in as little as 210 ms on a GB200 GPU, about 6× faster than cascaded diffusion-based super-resolution pipelines and with better visual fidelity.
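The data flow described in the abstract can be illustrated with a toy sketch. This is not the PiD implementation: every function name here is hypothetical, 1-D lists of floats stand in for latents and images, the "adapter" merely pairs each latent value with its noise level, the "backbone" is a nearest-neighbour upsampler in place of a trained network, and the loop is a generic few-step re-noising sampler rather than the DMD2-distilled one. It only shows the shape of the idea: a partially denoised latent and its sigma go in, and a few denoising steps run directly at the output resolution.

```python
import random


def partially_denoised_latent(clean_latent, sigma, rng):
    """Simulate stopping latent diffusion early: clean latent plus noise of scale sigma."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in clean_latent]


def sigma_aware_adapter(noisy_latent, sigma):
    """Toy adapter (assumption): pair every latent value with the noise level it carries."""
    return [(v, sigma) for v in noisy_latent]


def toy_pixel_denoiser(x_t, t, cond, scale):
    """Placeholder for a pixel diffusion backbone (assumption).

    Predicts the clean signal by nearest-neighbour upsampling of the
    conditioning values and ignores x_t, t, and sigma; a trained network
    would use all of them and synthesize detail instead.
    """
    return [cond[i // scale][0] for i in range(len(x_t))]


def decode(noisy_latent, sigma, scale=4, num_steps=4, seed=0):
    """Few-step conditional sampling directly at the output resolution."""
    rng = random.Random(seed)
    cond = sigma_aware_adapter(noisy_latent, sigma)
    n = len(noisy_latent) * scale
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # start from pure noise
    # Noise levels from 1.0 down to 0.0, one interval per sampling step.
    times = [1.0 - k / num_steps for k in range(num_steps + 1)]
    for t, t_next in zip(times[:-1], times[1:]):
        x0 = toy_pixel_denoiser(x, t, cond, scale)
        # Re-noise the clean prediction to the next (lower) noise level.
        x = [(1.0 - t_next) * p + t_next * rng.gauss(0.0, 1.0) for p in x0]
    return x


rng = random.Random(1)
z = partially_denoised_latent([0.0, 1.0], sigma=0.1, rng=rng)
pixels = decode(z, sigma=0.1, scale=4, num_steps=4)
print(len(z), "latent values ->", len(pixels), "output values")
```

Because the final step re-noises to level 0.0, the output equals the last clean prediction. In this toy that is just the upsampled noisy latent; the point of a learned decoder is that it would remove the residual latent noise and add detail at that stage.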

## Results

### From Latent to Pixels

### 4K Decode

### Baseline Comparison

### Quantitative Results (Decoding + Upsampling, 512² → 2048²)

PiD is up to **5.9× faster** than SeedVR2 (211.2 ms vs. 1237.5 ms).
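The speedup figure follows directly from the two latencies quoted above. A minimal check, where the helper name `speedup` is ours and not from the project:

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


PID_MS = 211.2       # PiD latency quoted above
SEEDVR2_MS = 1237.5  # SeedVR2 latency quoted above

ratio = speedup(SEEDVR2_MS, PID_MS)
print(f"{ratio:.1f}x faster")  # prints "5.9x faster"
```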

Percentage of evaluations in which judges prefer **PiD** over each baseline.
