
Cosmos Claw: Hack on a Boat in SF (Nvidia Cosmos Based Social Media Manager)

A team of developers built Cosmos Claw, an AI-powered social media manager that uses NVIDIA Cosmos 3 and GPT-4o to autonomously film, edit, and post promotional videos for venues, deployed on Nebius H200 GPUs during a San Francisco yacht hackathon.

7 min read · published Jun 14, 2026

It shoots, directs, and edits social videos for your venue: on autopilot, in every format, without a film crew.

Built for the Yacht Hackathon, by @ComposioHQ, @nebius, @tavily-ai & @openclaw.


Thirty seconds. That's all a viewer gives your venue before they swipe. To win those seconds you need a constant stream of video, but a human videographer is expensive, slow, and shoots one thing at a time.

Cosmos Claw is the always-on, AI-native alternative: a videographer and a marketing manager that never sleep. Point it at any venue (a short-let, a café, a bar) and it runs the whole studio on autopilot:

- Studies your space: GPT-4o (vision) labels every photo and learns what each room is.
- Brands it: invents the positioning and locks in the missing facts (price, story, amenities) so they stay identical across every video.
- Ideates like a manager: brainstorms a fresh campaign for each post (angle, hook, photo order, format, music, voice) that's different from everything it has shipped before.
- Films real motion: NVIDIA Cosmos 3, a world model built for robotics, doesn't pan over stills; it generates a first-person POV that physically walks into the room.
- Voices & cuts it: a unique GPT-written voiceover over a mood-matched music bed, cross-faded into the right aspect ratio.
- Delivers everywhere: ready-to-post cards (caption, hashtags, handle, recommended audio) for every feed (Reels/TikTok 9:16, IG 1:1 & 4:5, YouTube 16:9).

No crew, no call sheet, no edit bay. Just your existing photos in, and a full, on-brand social calendar out, on repeat.

As you read this, two workers are filming in parallel, pumping out a stream of ready-to-post Reels and TikToks for two San Francisco venues at once, each with its own AI voiceover, all on a Cosmos 3 model we deployed ourselves on Nebius H200s. It pauses when the network blips and resumes on its own. Truly always-on.

Looking forward to expanding this on the Yacht. SF is soooo amazing 🌉⛵️

1. Raw photos in. Drop a venue's existing images into the project. That's the only input. (Here: the Alamo Square Hacker House, with bedrooms, gym, and coworking.)

2. The marketing manager's memory. An OpenClaw-style GPT-4o manager researches the venue, locks in a consistent brand (positioning, audience, tone, pitch), writes the voiceover, and picks the assets & order. This is the durable memory every video is grounded on.

3. Ready-to-post cuts out. The Agent Loop streams everything the videographer does in real time, and each published cut is a ready-to-post package: video + caption + recommended audio (music & voice) + handle, ready to download or push to the channel.

| Layer | What we used |
| --- | --- |
| 🎥 Video model | NVIDIA Cosmos 3 Nano: a world model (built for robotics/embodied POV), self-deployed by us for first-person walk-throughs |
| ⚡ Compute | Nebius: NVIDIA® H200 NVLink GPUs |
| 🧠 Manager + director | GPT-4o (vision): studies the photos, brands the venue, ideates each campaign & storyboard |
| 🔎 Neighborhood research | Tavily: enriches each venue with real local context |
| 🗺️ Maps & info cards | OpenStreetMap: location, transit & nearby spots |
| 🔊 Audio | OpenAI TTS: a unique per-cut voiceover over a mood-matched music bed |
| 🧩 App | FastAPI Studio UI + an always-on marketing_loop driver, FFmpeg for cutting/transitions |

We didn't just call a hosted API. We stood up Cosmos 3 Nano ourselves on Nebius H200 NVLink GPUs (vLLM-Omni, OpenAI-compatible) and drove it end-to-end. Tavily researches the surrounding neighborhood so every second of the video carries the context a viewer needs to say yes.

Shout-out to the partners: @ship_builders · @nebiusai · @nvidia · @composio · @tavilyai · @openclaw

venue photos + facts
        │
        ▼
  GPT-4o manager  ──→  brand dossier (positioning + durable assumptions)
        │                     │
        │                     ├─ Tavily ─→ neighborhood research
        │                     └─ ideate ─→ one fresh campaign (angle · photos ·
        │                                   format · music · voice · caption · VO)
        ▼
  NVIDIA Cosmos 3 Nano  ──→  a short first-person POV clip per beat
   (world model, self-hosted on Nebius H200)
        │
        ▼
  transitions + audio  ──→  cross-fade · GPT voiceover · mood music · reframe
        │
        ▼
  ready-to-post cut.mp4  ──→  Agent Loop feed (caption · hashtags · audio)
        │
        └──────────────  loop: next idea, next venue (in parallel)

| File | Role |
| --- | --- |
| scripts/marketing_loop.py | The always-on loop: study → ideate → film → voice → publish, per venue (parallel-safe) |
| scripts/cosmos_montage.py | Terminal montage: GPT vision per photo → Cosmos clips → fast transitions |
| app/marketing_agent.py | GPT-4o marketing manager: research → brand → brief |
| app/brand.py | Per-venue brand dossier (memory, durable assumptions, social posts) |
| app/main.py | FastAPI server + Studio UI + generation API |
| app/trailer.py | GPT-4o "director": storyboard, shot/motion + walk-through mode |
| app/generation/cosmos.py | NVIDIA Cosmos 3 image→video adapter (motion → flow-shift) |
| app/generation/stub.py | Free local FFmpeg fallback generator |
| app/transitions.py | Fast cross-fade montage into any aspect ratio (xfade) |
| app/curation.py | Best-of-N take scoring (motion energy + stability) |
| app/audio.py | TTS voiceover + mood music bed + duck-and-mux |
| app/infocards.py | Map / price / neighborhood cards (OpenStreetMap) |
| app/pipeline.py | Orchestrates a single run (best-of-N, info beats, finish) |
| app/agent.py | Terminal CLI to drive the manager + fire renders |
| deploy/tunnel_keeper.sh | Self-healing SSH tunnel to the Nebius GPU |
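
The file list describes app/transitions.py as an xfade-based cross-fade montage. As a rough illustration of that idea (not the repository's actual code), here is a minimal sketch that builds a filter graph for FFmpeg's `xfade` filter, assuming the clip durations are known up front. The function name `build_xfade_filter` and the `[v1]`, `[v2]` labels are hypothetical.

```python
from typing import List


def build_xfade_filter(durations: List[float], fade: float = 0.5) -> str:
    """Build an FFmpeg -filter_complex string that chains xfade across N clips.

    Hypothetical helper for illustration; app/transitions.py may work differently.
    """
    if len(durations) < 2:
        raise ValueError("need at least two clips to cross-fade")
    if any(d <= fade for d in durations):
        raise ValueError("every clip must be longer than the fade")

    parts = []
    prev_label = "[0:v]"
    elapsed = durations[0]  # length of the montage assembled so far
    for i in range(1, len(durations)):
        offset = elapsed - fade  # the fade starts this far into the running output
        out_label = f"[v{i}]"
        parts.append(
            f"{prev_label}[{i}:v]xfade=transition=fade:"
            f"duration={fade:g}:offset={offset:g}{out_label}"
        )
        prev_label = out_label
        elapsed = elapsed + durations[i] - fade  # each fade overlaps two clips
    return ";".join(parts)
```

The resulting string would be passed to `ffmpeg -filter_complex`, mapping the last label as the video output. Note that `xfade` expects its inputs to share resolution and frame rate, so clips would typically be scaled to the target canvas first.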

Cosmos Claw isn't a one-shot tool, it's a loop. A persistent brand dossier (outputs/listing_{id}_brand.json) is the single source of truth per venue, and an autonomous manager works against it the way a real social-media manager would, forever:

- Study: GPT-4o vision builds an asset index of what every uploaded photo is (cached, so it's paid for once).
- Ideate: brainstorms ONE fresh campaign distinct from past themes, covering the angle, which photos to use (ordered like a story), the social format, music mood, TTS voice, a ready-to-post caption + hashtags, and a ~25 s voiceover script.
- Film: turns the chosen photos into short, first-person Cosmos clips.
- Cut: cross-fades them into the campaign's aspect ratio and mixes the GPT voiceover over a mood-matched music bed.
- Publish: drops a ready-to-post card into the Agent Loop feed and logs every step to the dossier timeline.
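
The five steps above can be sketched as a small driver. This is only an illustration of the control flow, assuming each step is a plain callable; the `Dossier` shape, `run_loop`, and the stubbed `demo_steps` are hypothetical and stand in for the real GPT-4o, Cosmos, and FFmpeg calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dossier:
    """Stand-in for the per-venue brand dossier (hypothetical shape)."""
    venue: str
    past_themes: List[str] = field(default_factory=list)
    timeline: List[str] = field(default_factory=list)


def run_loop(dossier: Dossier, steps: Dict[str, Callable], max_videos: int) -> List[dict]:
    """Study once, then ideate -> film -> cut -> publish for each video."""
    assets = steps["study"](dossier)  # asset index, computed a single time
    dossier.timeline.append("study")
    feed = []
    for _ in range(max_videos):
        campaign = steps["ideate"](dossier, assets)
        clips = steps["film"](campaign)
        cut = steps["cut"](clips, campaign)
        card = steps["publish"](cut, campaign)
        dossier.past_themes.append(campaign["theme"])  # remembered for next ideation
        dossier.timeline.extend(["ideate", "film", "cut", "publish"])
        feed.append(card)
    return feed


def demo_steps() -> Dict[str, Callable]:
    """Stub steps so the sketch runs without any model or GPU."""
    return {
        "study": lambda d: ["bedroom", "gym", "coworking"],
        "ideate": lambda d, assets: {
            "theme": f"theme-{len(d.past_themes) + 1}",
            "photos": assets[:2],
        },
        "film": lambda c: [f"{p}.mp4" for p in c["photos"]],
        "cut": lambda clips, c: "+".join(clips),
        "publish": lambda cut, c: {"video": cut, "caption": c["theme"]},
    }


if __name__ == "__main__":
    dossier = Dossier("hacker-house")
    for card in run_loop(dossier, demo_steps(), max_videos=2):
        print(card)
```

Keeping the past themes on the dossier is what would let a real ideation step steer away from campaigns it has already shipped.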

…then it does it again, with a brand-new idea. Run one worker per venue and they generate in parallel, so multiple feeds fill at once:

python scripts/marketing_loop.py --projects la-house-1   --tag la --max-videos 6
python scripts/marketing_loop.py --projects hacker-house --tag hh --max-videos 6

It's built to run unattended: a live endpoint probe before every shot means a Wi-Fi/tunnel blip just pauses the shoot, which resumes when the connection is back. No babysitting, no half-burned campaigns. A self-healing SSH tunnel keeper (deploy/tunnel_keeper.sh) keeps the link to the GPU alive underneath.
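
One way to picture the probe-then-pause behaviour is the sketch below. It is an assumption about the general pattern, not the project's implementation: `endpoint_is_up` and `wait_until_up` are hypothetical names, and the backoff values are arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def endpoint_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, so the link itself is alive
    except (urllib.error.URLError, OSError):
        return False  # refused, unreachable, or timed out


def wait_until_up(
    probe: Callable[[], bool],
    max_wait: float = 300.0,
    first_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Pause with exponential backoff until probe() succeeds or max_wait elapses."""
    waited, delay = 0.0, first_delay
    while not probe():
        if waited >= max_wait:
            return False  # give up; the caller decides what to do next
        sleep(delay)
        waited += delay
        delay = min(delay * 2, 30.0)  # cap the backoff at 30 seconds
    return True
```

A worker could call `wait_until_up(lambda: endpoint_is_up(base_url + "/models"))` before each shot, where `/models` is assumed to be available on the OpenAI-compatible server, and only start generating once it returns True.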

Consistency is the trick. Whatever the manager makes up, it makes up once: build_brand writes the missing facts as durable assumptions that are never overwritten, so price, amenities and host story stay identical across every cut.
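
The "written once, never overwritten" rule can be illustrated with a first-write-wins merge. This is a sketch under assumptions: `lock_assumptions` and the `"assumptions"` key are hypothetical, and the real dossier written by build_brand may be structured differently.

```python
import json
from pathlib import Path
from typing import Any, Dict


def lock_assumptions(dossier_path: Path, proposed: Dict[str, Any]) -> Dict[str, Any]:
    """Merge proposed facts into the dossier without overwriting existing ones."""
    dossier = json.loads(dossier_path.read_text()) if dossier_path.exists() else {}
    assumptions = dossier.setdefault("assumptions", {})
    for key, value in proposed.items():
        assumptions.setdefault(key, value)  # first write wins; later values are ignored
    dossier_path.write_text(json.dumps(dossier, indent=2))
    return assumptions
```

With this pattern, a second run that proposes a different price leaves the stored price untouched, while genuinely new keys are still added.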

study (vision asset index) ─→ ideate (fresh campaign) ─→ film ─→ voice + cut ─→ publish ─┐
        ▲                                                                                │
        └───────────────────────  grounded on the brand dossier  ◀───────────────────────┘

Prefer to drive it by hand? The same brain runs from the Agent Loop tab in the UI, or from the terminal:

python -m app.agent list                                  # projects + dossier status
python -m app.agent run la-house-1 --format reel          # research → brand → brief
python -m app.agent assume la-house-1 price '$245/night'  # lock a consistent fact (single quotes keep the shell from expanding $2)
python -m app.agent generate la-house-1 --format youtube  # render via the live API

Formats: reel, tiktok, shorts, story, snap (9:16), youtube (16:9), square (1:1), portrait (4:5). The render canvas switches automatically.
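
As an illustration of how a format name could map to a render canvas, here is a minimal sketch. The aspect ratios come from the format list above; the 1080-pixel short side, the `FORMAT_RATIOS` table, and `canvas_for` are assumptions made for the example.

```python
from typing import Dict, Tuple

# Ratios are taken from the format list; the names of these objects are hypothetical.
FORMAT_RATIOS: Dict[str, Tuple[int, int]] = {
    "reel": (9, 16), "tiktok": (9, 16), "shorts": (9, 16),
    "story": (9, 16), "snap": (9, 16),
    "youtube": (16, 9), "square": (1, 1), "portrait": (4, 5),
}


def canvas_for(fmt: str, short_side: int = 1080) -> Tuple[int, int]:
    """Return (width, height) with the shorter side equal to short_side (assumed 1080)."""
    try:
        w, h = FORMAT_RATIOS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    scale = short_side / min(w, h)
    # Round to even numbers, since yuv420p video needs even dimensions.
    width = int(round(w * scale / 2)) * 2
    height = int(round(h * scale / 2)) * 2
    return width, height
```

Under these assumptions, `canvas_for("reel")` gives a portrait canvas and `canvas_for("youtube")` a landscape one, so the same clips can be reframed per channel without touching the rest of the pipeline.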

Requires Python 3.9+ and FFmpeg.

cd LiveHere

brew install ffmpeg                 # one time

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env                # add your keys (OpenAI, Tavily, Cosmos)

python -m app                       # → http://127.0.0.1:8000

Open http://127.0.0.1:8000, pick a listing, tweak the auto-filled details, and hit Generate. With no GPU configured it runs on the free local FFmpeg stub; point it at Cosmos for the real thing (below).

The generation backend is swapped purely via env vars, with no code change to the UI or pipeline.

LIVEHERE_BACKEND=cosmos
COSMOS_API_STYLE=vllm_omni
COSMOS_BASE_URL=http://<your-gpu-host>:8000/v1
COSMOS_API_KEY=...
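
For readers curious how env-var switching of this kind is commonly wired, the sketch below reads the four variables shown above and falls back to the local stub. It is not the project's loader: `BackendConfig`, `load_backend`, the `"stub"` default name, and the default API style are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    """Hypothetical container for the resolved backend settings."""
    name: str
    api_style: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None


def load_backend(env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Pick the generation backend from env vars, defaulting to the local stub."""
    name = env.get("LIVEHERE_BACKEND", "stub").strip().lower()
    if name != "cosmos":
        return BackendConfig(name="stub")  # assumed name for the FFmpeg fallback
    base_url = env.get("COSMOS_BASE_URL", "").rstrip("/")
    if not base_url:
        raise RuntimeError("LIVEHERE_BACKEND=cosmos requires COSMOS_BASE_URL")
    return BackendConfig(
        name="cosmos",
        api_style=env.get("COSMOS_API_STYLE", "vllm_omni"),  # assumed default
        base_url=base_url,
        api_key=env.get("COSMOS_API_KEY"),
    )
```

Resolving the configuration in one place like this is what lets the UI and pipeline stay unchanged when the backend is swapped.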

We self-hosted it on a Nebius H200 NVLink instance with vLLM-Omni:

vllm serve nvidia/Cosmos3-Nano --omni --host 0.0.0.0 --port 8000 --no-guardrails

Full deploy walkthrough (Nebius / Modal / RunPod) is in deploy/DEPLOY.md. Cosmos can't run on Apple Silicon. Keep the GPU instance up only while generating, and tear it down when idle.

Cosmos Claw · made with ☕ for the Yacht Hackathon · Composio × Nebius × Tavily
