Geometry-Consistent Endoscopic Representations for Image-Guided Navigation via Structured Foundation Model Adaptation

wpnews.pro

Researchers propose a unified framework for learning geometry-consistent, domain-robust image representations in monocular endoscopy. It combines a synthetic data pipeline that supplies accurate geometric supervision with Hierarchy-Aware Geometry-Semantic Adaptation, a structured alternative to standard LoRA. On public and proprietary datasets, the authors report improved representation quality and better pose and monocular depth estimation, along with favorable synthetic-to-real transfer on clinical bronchoscopy.

Published Jun 17, 2026

arXiv:2606.17340v1

Abstract: Accurate vision-based navigation in monocular endoscopy is difficult due to limited depth cues, weak tissue texture, non-rigid deformation, and substantial appearance variation across domains, all of which complicate pose estimation, depth prediction, and image-to-anatomy alignment. Although recent vision foundation models have shown promise, their learned representations often remain insufficiently geometry-consistent, hindering stable feature correspondence and limiting their reliability for downstream navigation tasks. We propose a unified framework for learning geometry-consistent and domain-robust image representations for monocular endoscopy. The framework combines a synthetic data pipeline that provides accurate geometric supervision with Hierarchy-Aware Geometry-Semantic Adaptation, a structured alternative to standard LoRA that inserts low-rank adapters selectively across the transformer hierarchy and couples them with layer-wise training objectives to encourage geometric correspondence in intermediate features and semantic consistency in deeper features. Experiments on public and proprietary datasets show improved geometric and semantic representation quality, leading to better performance on downstream navigation tasks including pose estimation and monocular depth estimation. The learned representations show favorable synthetic-to-real transfer on clinical bronchoscopy and provide a useful initialization for adaptation to sinus endoscopy and colonoscopy under limited supervision. The framework also shows favorable scaling with model size and training data. These results support hierarchy-aware, geometry-guided adaptation as a practical approach for endoscopic representation learning.
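To make the idea of selectively placed low-rank adapters concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which layers receive adapters, what rank is used, or how the objectives are defined. The 12-layer depth, the layer ranges, the rank, `alpha`, and the names `plan_hierarchy`, `make_adapter`, and `adapted_linear` are all hypothetical choices for illustration. The only part taken from standard LoRA is the update rule, where a frozen weight W is augmented by (alpha / r) * B A and B starts at zero.

```python
import random
from dataclasses import dataclass


@dataclass
class LoRAAdapter:
    """Low-rank update (alpha / r) * B @ A for a d_out x d_in weight."""
    A: list       # r x d_in, small random init
    B: list       # d_out x r, zero init so the adapter starts as a no-op
    alpha: float

    @property
    def rank(self):
        return len(self.A)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def make_adapter(d_in, d_out, rank, alpha, rng):
    """Build an adapter with random A and zero B (standard LoRA init)."""
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return LoRAAdapter(A, B, alpha)


def adapted_linear(W, x, adapter=None):
    """Return W x, plus (alpha / r) * B (A x) when an adapter is attached."""
    y = matvec(W, x)
    if adapter is None:
        return y
    scale = adapter.alpha / adapter.rank
    delta = matvec(adapter.B, matvec(adapter.A, x))
    return [yi + scale * di for yi, di in zip(y, delta)]


def plan_hierarchy(num_layers, geo_layers, sem_layers):
    """Assign each layer a role: 'geometry', 'semantic', or None (frozen)."""
    plan = {}
    for i in range(num_layers):
        if i in geo_layers:
            plan[i] = "geometry"    # would receive a correspondence loss
        elif i in sem_layers:
            plan[i] = "semantic"    # would receive a consistency loss
        else:
            plan[i] = None          # no adapter, layer stays frozen
    return plan


# Hypothetical split: early layers frozen, middle geometric, deep semantic.
rng = random.Random(0)
plan = plan_hierarchy(12, geo_layers=range(4, 8), sem_layers=range(8, 12))
adapters = {
    i: make_adapter(d_in=4, d_out=4, rank=2, alpha=4.0, rng=rng)
    for i, role in plan.items() if role is not None
}

# Toy frozen weight (identity) and input vector.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [1.0, 2.0, 3.0, 4.0]

# With B still zero, every adapted layer reproduces the frozen output.
for layer, role in plan.items():
    print(layer, role, adapted_linear(W, x, adapters.get(layer)))
```

In a real training loop, only `A` and `B` would be updated, and each adapted layer's features would feed the loss matching its role. The sketch stops at the placement step because the abstract gives no further detail on the objectives themselves.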

source & further reading

arxiv.org — original article
