04:00
2026-06-17
arxiv.org
computer-vision
Reasoning Text-to-Video Retrieval for Operating Room Clips via Action-Driven Digital Twins
Researchers propose OR3, a text-to-video retrieval method for operating room clips that converts videos into action-driven digital twins and uses an LLM to generate hypothetical queries for intra-moda…