{"slug": "cosmos-claw-hack-on-a-boat-in-sf-nvidia-cosmos-based-social-media-manager", "title": "Cosmos Claw: Hack on a Boat in SF (Nvidia Cosmos Based Social Media Manager)", "summary": "A team of developers built Cosmos Claw, an AI-powered social media manager that uses NVIDIA Cosmos 3 and GPT-4o to autonomously film, edit, and post promotional videos for venues, deployed on Nebius H200 GPUs during a San Francisco yacht hackathon.", "body_md": "**It shoots, directs, and edits social videos for your venue — on autopilot, in every format, without a film crew.**\n\n*Built for the Yacht Hackathon — by @ComposioHQ, @nebius, @tavily-ai & @openclaw.*\n\n**Watch the 60s demo on X.**\n\n**Thirty seconds.** That's all a viewer gives your venue before they swipe. To\nwin those seconds you need a *constant* stream of video — but a human\nvideographer is expensive, slow, and shoots one thing at a time.\n\n**Cosmos Claw is the always-on, AI-native alternative: a videographer and a\nmarketing manager that never sleep.** Point it at any venue — a short-let, a café,\na bar — and it runs the whole studio on autopilot:\n\n- **Studies** your space — GPT-4o (vision) labels every photo and learns what each room is.\n- **Brands** it — invents the positioning and *locks in* the missing facts (price, story, amenities) so they stay identical across every video.\n- **Ideates** like a manager — brainstorms a fresh campaign for each post (angle, hook, photo order, format, music, voice) that's different from everything it has shipped before.\n- **Films** real motion — NVIDIA Cosmos 3, a **world model built for robotics**, doesn't pan over stills; it generates a first-person POV that physically *walks into the room*.\n- **Voices & cuts** it — a unique GPT-written **voiceover** over a mood-matched music bed, cross-faded into the right aspect ratio.\n- **Delivers everywhere** — ready-to-post cards (caption, hashtags, handle, recommended audio) in every feed: Reels/TikTok 9:16, IG 1:1 & 4:5, 
YouTube 16:9.\n\nNo crew, no call sheet, no edit bay. Just your existing photos in — a full, on-brand social calendar out, on repeat.\n\nAs you read this, two workers are filming **in parallel** — pumping out a stream\nof ready-to-post Reels and TikToks for two San Francisco venues at once, each with\nits own AI voiceover — all on a Cosmos 3 model **we deployed ourselves** on Nebius\nH200s. It pauses when the network blips and resumes on its own. Truly always-on.\n\nLooking forward to expanding this on the Yacht — SF is soooo amazing 🌉⛵️\n\n**1 — Raw photos in.** Drop a venue's existing images into the project. That's the\nonly input. *(Here: the Alamo Square Hacker House — bedrooms, gym, coworking.)*\n\n**2 — The marketing manager's memory.** An OpenClaw-style GPT-4o manager researches\nthe venue, locks in a consistent brand (positioning, audience, tone, pitch), writes\nthe voiceover, and picks the assets & order — the durable memory every video is\ngrounded on.\n\n**3 — Ready-to-post cuts out.** The Agent Loop streams everything the videographer\ndoes in real time, and each published cut is a ready-to-post package: video +\ncaption + recommended audio (music & voice) + handle, ready to download or push to\nthe channel.\n\n| Layer | What we used |\n|---|---|\n| 🎥 Video model | NVIDIA Cosmos 3 Nano — a world model (built for robotics/embodied POV), self-deployed by us for first-person walk-throughs |\n| ⚡ Compute | Nebius — NVIDIA® H200 NVLink GPUs |\n| 🧠 Manager + director | GPT-4o (vision) — studies the photos, brands the venue, ideates each campaign & storyboard |\n| 🔎 Neighborhood research | Tavily — enriches each venue with real local context |\n| 🗺️ Maps & info cards | OpenStreetMap — location, transit & nearby spots |\n| 🔊 Audio | OpenAI TTS — a unique per-cut voiceover over a mood-matched music bed |\n| 🧩 App | FastAPI Studio UI + an always-on `marketing_loop` driver, FFmpeg for cutting/transitions |\n\nWe didn't just call a hosted API — we **stood up Cosmos 3 Nano 
ourselves** on\nNebius H200 NVLink GPUs (vLLM-Omni, OpenAI-compatible) and drove it end-to-end.\nTavily researches the surrounding neighborhood so every second of the video\ncarries the context a viewer needs to say *yes*.\n\nShout-out to the partners: @ship_builders · @nebiusai · @nvidia · @composio · @tavilyai · @openclaw\n\n```\nvenue photos + facts\n        │\n        ▼\n  GPT-4o manager  ──→  brand dossier (positioning + durable assumptions)\n        │                     │\n        │                     ├─ Tavily ─→ neighborhood research\n        │                     └─ ideate ─→ one fresh campaign (angle · photos ·\n        │                                   format · music · voice · caption · VO)\n        ▼\n  NVIDIA Cosmos 3 Nano  ──→  a short first-person POV clip per beat\n   (world model, self-hosted on Nebius H200)\n        │\n        ▼\n  transitions + audio  ──→  cross-fade · GPT voiceover · mood music · reframe\n        │\n        ▼\n  ready-to-post cut.mp4  ──→  Agent Loop feed (caption · hashtags · audio)\n        │\n        └──────────────  loop: next idea, next venue (in parallel)\n```\n\n| File | Role |\n|---|---|\n| `scripts/marketing_loop.py` | The always-on loop: study → ideate → film → voice → publish, per venue (parallel-safe) |\n| `scripts/cosmos_montage.py` | Terminal montage: GPT vision per photo → Cosmos clips → fast transitions |\n| `app/marketing_agent.py` | GPT-4o marketing manager: research → brand → brief |\n| `app/brand.py` | Per-venue brand dossier (memory, durable assumptions, social posts) |\n| `app/main.py` | FastAPI server + Studio UI + generation API |\n| `app/trailer.py` | GPT-4o \"director\" — storyboard, shot/motion + walk-through mode |\n| `app/generation/cosmos.py` | NVIDIA Cosmos 3 image→video adapter (motion → flow-shift) |\n| `app/generation/stub.py` | Free local FFmpeg fallback generator |\n| `app/transitions.py` | Fast cross-fade montage into any aspect ratio (xfade) |\n| `app/curation.py` | Best-of-N take scoring (motion energy + stability) |\n| `app/audio.py` | TTS voiceover + mood music bed + duck-and-mux |\n| `app/infocards.py` | Map / price / neighborhood cards (OpenStreetMap) |\n| `app/pipeline.py` | Orchestrates a single run (best-of-N, info beats, finish) |\n| `app/agent.py` | Terminal CLI to drive the manager + fire renders |\n| `deploy/tunnel_keeper.sh` | Self-healing SSH tunnel to the Nebius GPU |\n\nCosmos Claw isn't a one-shot tool — it's a **loop**. A persistent **brand dossier**\n(`outputs/listing_{id}_brand.json`) is the single source of truth per venue, and an\nautonomous manager works against it the way a real social-media manager would —\nforever:\n\n1. **Study** — GPT-4o vision builds an *asset index*: what every uploaded photo is (cached, so it's paid for once).\n2. **Ideate** — brainstorms ONE fresh campaign *distinct from past themes*: the angle, which photos to use (ordered like a story), the social format, music mood, TTS voice, a ready-to-post caption + hashtags, and a ~25s **voiceover script**.\n3. **Film** — turns the chosen photos into short, first-person Cosmos clips.\n4. **Cut** — cross-fades them into the campaign's aspect ratio and mixes the GPT voiceover over a mood-matched music bed.\n5. **Publish** — drops a ready-to-post card into the **Agent Loop** feed and logs every step to the dossier timeline.\n\n…then it does it again, with a brand-new idea. Run **one worker per venue** and\nthey generate **in parallel**, so multiple feeds fill at once:\n\n```\n# one always-on worker per project, running concurrently\npython scripts/marketing_loop.py --projects la-house-1   --tag la --max-videos 6\npython scripts/marketing_loop.py --projects hacker-house --tag hh --max-videos 6\n```\n\nIt's built to run unattended: a **live endpoint probe before every shot** means a\nWi-Fi/tunnel blip just *pauses* the shoot and resumes when the connection is back —\nno babysitting, no half-burned campaigns. 
A self-healing SSH tunnel keeper\n(`deploy/tunnel_keeper.sh`) keeps the link to the GPU alive underneath.\n\n**Consistency is the trick.** Whatever the manager makes up, it makes up *once*:\n`build_brand` writes the missing facts as **durable assumptions that are never\noverwritten**, so price, amenities and host story stay identical across every cut.\n\n```\nstudy (vision asset index) ─→ ideate (fresh campaign) ─→ film ─→ voice + cut ─→ publish ─┐\n        ▲                                                                                 │\n        └───────────────────────  grounded on the brand dossier  ◀──────────────────────┘\n```\n\nPrefer to drive it by hand? The same brain runs from the **Agent Loop** tab in the\nUI, or from the terminal:\n\n```\npython -m app.agent list                                  # projects + dossier status\npython -m app.agent run la-house-1 --format reel          # research → brand → brief\npython -m app.agent assume la-house-1 price '$245/night'  # lock a consistent fact\npython -m app.agent generate la-house-1 --format youtube  # render via the live API\n```\n\nFormats: `reel`, `tiktok`, `shorts`, `story`, `snap` (9:16), `youtube` (16:9),\n`square` (1:1), `portrait` (4:5). The render canvas switches automatically.\n\nRequires Python 3.9+ and FFmpeg.\n\n```\ncd LiveHere\n\nbrew install ffmpeg                 # one time\n\npython3 -m venv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n\ncp .env.example .env                # add your keys (OpenAI, Tavily, Cosmos)\n\npython -m app                       # → http://127.0.0.1:8000\n```\n\nOpen [http://127.0.0.1:8000](http://127.0.0.1:8000), pick a listing, tweak the auto-filled details, and\nhit **Generate**. 
With no GPU configured it runs on the free local FFmpeg stub;\npoint it at Cosmos for the real thing (below).\n\nThe generation backend is swapped purely via env vars — **no code change** to the\nUI or pipeline.\n\n```\n# .env\nLIVEHERE_BACKEND=cosmos\nCOSMOS_API_STYLE=vllm_omni\nCOSMOS_BASE_URL=http://<your-gpu-host>:8000/v1\nCOSMOS_API_KEY=...\n```\n\nWe self-hosted it on a **Nebius H200 NVLink** instance with vLLM-Omni:\n\n```\nvllm serve nvidia/Cosmos3-Nano --omni --host 0.0.0.0 --port 8000 --no-guardrails\n```\n\nFull deploy walkthrough (Nebius / Modal / RunPod) is in\n[deploy/DEPLOY.md](/manas15/cosmos-claw/blob/main/deploy/DEPLOY.md). Cosmos can't run on Apple Silicon — keep\nthe GPU instance up only while generating, and tear it down when idle.\n\nCosmos Claw · made with ☕ for the Yacht Hackathon · Composio × Nebius × Tavily", "url": "https://wpnews.pro/news/cosmos-claw-hack-on-a-boat-in-sf-nvidia-cosmos-based-social-media-manager", "canonical_source": "https://github.com/manas15/cosmos-claw", "published_at": "2026-06-14 23:57:59+00:00", "updated_at": "2026-06-15 00:12:25.922940+00:00", "lang": "en", "topics": ["generative-ai", "computer-vision", "ai-agents", "ai-tools", "ai-infrastructure"], "entities": ["NVIDIA", "Cosmos Claw", "GPT-4o", "Nebius", "Composio", "Tavily", "OpenClaw", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/cosmos-claw-hack-on-a-boat-in-sf-nvidia-cosmos-based-social-media-manager", "markdown": "https://wpnews.pro/news/cosmos-claw-hack-on-a-boat-in-sf-nvidia-cosmos-based-social-media-manager.md", "text": "https://wpnews.pro/news/cosmos-claw-hack-on-a-boat-in-sf-nvidia-cosmos-based-social-media-manager.txt", "jsonld": "https://wpnews.pro/news/cosmos-claw-hack-on-a-boat-in-sf-nvidia-cosmos-based-social-media-manager.jsonld"}}