GAP3D: Generative Alignment of VLM Latents to Patch-Level Embeddings for 3D Generation

Researchers have developed GAP3D, a diffusion-based method that aligns vision-language model (VLM) latents directly to the patch-level feature space of a pre-trained image encoder, so that a frozen generative model can use a VLM as its prompt encoder without expensive end-to-end training. Evaluated on 3D asset generation, the approach trains mainly on general-domain image-text pairs and shows emergent zero-shot handling of multimodal prompts even though it is trained exclusively on text input. The authors present GAP3D as a first step toward modular integration of foundation models: it currently prioritizes high-level semantics over fine-grained detail and only partially bridges the representation gap between VLM and image-encoder feature spaces.

Published May 29, 2026

arXiv:2605.28995v1

Abstract: Recent approaches integrating vision-language models (VLMs) as prompt encoders for generative model conditioning typically rely on expensive end-to-end training or map features to compressed representations, discarding the dense spatial structure required for geometry-aware tasks like 3D asset generation. To address this, we propose GAP3D, a modular, diffusion-based approach that aligns VLM-generated latents directly to the complete, patch-level feature space of a pre-trained image encoder, enabling a frozen downstream generative model to utilize a VLM as prompt encoder while maintaining a spatially structured conditioning signal. Evaluated on 3D asset generation, our method bypasses the need for large-scale 3D data by training mainly on general-domain image-text pairs. It also exhibits emergent zero-shot capabilities for multimodal prompts, despite being trained exclusively on text input. Finally, while currently prioritizing high-level semantics over fine-grained detail, GAP3D demonstrates that the representation gap between VLM and image-encoder feature spaces can be partially bridged through diffusion-based alignment, taking the first steps towards a modular integration of foundation models through generative alignment to dense embedding spaces.
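To make the idea of diffusion-based alignment to a dense embedding space more concrete, here is a minimal sketch of a generic DDPM-style noise-prediction objective over a grid of patch embeddings, conditioned on a single VLM latent. It is not the authors' implementation: the abstract does not describe the network, noise schedule, or conditioning mechanism, so the linear `ToyDenoiser`, the linear beta schedule, and all shapes (256 patches, 64-dimensional embeddings, 32-dimensional VLM latent) are assumptions chosen only for illustration. The one point it is meant to show is that the diffusion target keeps its full `(num_patches, patch_dim)` shape instead of being compressed to one vector.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: mix clean patch embeddings with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps


class ToyDenoiser:
    """Hypothetical stand-in for the alignment network (a single linear layer)."""

    def __init__(self, patch_dim, cond_dim, rng):
        self.w_x = rng.standard_normal((patch_dim, patch_dim)) * 0.01
        self.w_c = rng.standard_normal((cond_dim, patch_dim)) * 0.01
        self.w_t = rng.standard_normal(patch_dim) * 0.01

    def __call__(self, x_t, cond, t_frac):
        # The conditioning vector and timestep term broadcast over every patch.
        return x_t @ self.w_x + cond @ self.w_c + t_frac * self.w_t


def denoising_loss(model, x0, cond, t, alpha_bar, rng):
    """Mean squared error between the predicted and the true noise."""
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = model(x_t, cond, t / len(alpha_bar))
    return float(np.mean((eps_hat - eps) ** 2))


rng = np.random.default_rng(0)
num_patches, patch_dim, cond_dim = 256, 64, 32  # assumed sizes

# Random placeholders for image-encoder patch features and a VLM latent.
patch_embeddings = rng.standard_normal((num_patches, patch_dim))
vlm_latent = rng.standard_normal(cond_dim)

alpha_bar = make_alpha_bar()
model = ToyDenoiser(patch_dim, cond_dim, rng)
loss = denoising_loss(model, patch_embeddings, vlm_latent, 500, alpha_bar, rng)
print(f"toy denoising loss at t=500: {loss:.4f}")
```

In a real system the placeholders would be replaced by features from a pre-trained image encoder and a VLM, and the denoiser would be a trained network; sampling from it would then yield a patch-level conditioning signal for the frozen downstream generator.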

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.28995
