{"slug": "how-an-agent-built-a-3d-paris-gallery-by-chaining-two-hugging-face-spaces", "title": "How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces", "summary": "A coding agent built a 3D Paris gallery by chaining two Hugging Face Spaces (one for image generation and one for 3D reconstruction) without any human intervention in asset creation. The agent used the output of the first Space as input for the second, then assembled the results into a cinematic Three.js viewer. The demonstration shows how AI agents can now assemble multimedia software by gluing together documented, callable model blocks rather than building from scratch.", "body_md": "I asked a coding agent to build a beautiful website showcasing the monuments of\nParis as 3D Gaussian splats. I never opened an image generator. I never touched a\n3D reconstruction tool. The agent produced every asset (the images **and** the 3D\nsplats) by calling two Hugging Face Spaces directly, then wired them into a\ncinematic viewer.\n\nHere's the result, live as a static Space:\n\nThis post is about *how* that's possible now, and why I think it's a preview of\nhow a lot of multimedia software gets built from here on.\n\nMitchell Hashimoto recently described a shift he calls the\n[building block economy](https://mitchellh.com/writing/building-block-economy):\nthe most effective path to software is no longer a polished monolith, but small,\nwell-documented components that others (increasingly *agents*) can assemble.\nHis key observation: AI is okay at building everything from scratch, but it is\n**really good at gluing together** proven pieces.\n\nThat thesis has mostly been told with *code* libraries. But the same forces are\nhitting **multimedia AI**. The hard part of using a state-of-the-art image model,\na video model, a TTS model, or a 3D reconstruction model was never the model. 
It\nwas the integration: SDKs, weights, GPUs, input formats, polling. If each model\nwere instead a documented, callable block, an agent could glue them together the\nsame way it glues together npm packages.\n\nThat's exactly what Hugging Face Spaces have quietly become.\n\n`agents.md`\n\nThe Hub hosts thousands of state-of-the-art models (a huge share of them\n**open-weights**), and most are deployed as interactive **Spaces**. As of now,\nevery Gradio Space also exposes a plain-text\n[`agents.md`](https://huggingface.co/docs/hub/en/spaces-agents) that tells an agent\nhow to call it. For example,\n\n```\ncurl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md\n```\n\nreturns everything needed in one shot: the schema URL, the call and poll templates, how to upload files, and the auth hint:\n\n```\nAPI schema:   GET  .../gradio_api/info\nCall endpoint: POST .../gradio_api/call/v2/{endpoint} {\"param_name\": value, ...}\nPoll result:  GET  .../gradio_api/call/{endpoint}/{event_id}\nFile inputs:  POST .../gradio_api/upload -F \"files=@file.ext\"\nAuth:         Bearer $HF_TOKEN\n```\n\nNo client library. No hardcoded integration. An agent reads that, and it can drive\nthe Space end to end. Set an [`HF_TOKEN`](https://huggingface.co/settings/tokens)\nand you're going. You can find these instructions on any Gradio Space via its\n\nThe real unlock is **chaining**: the output of one Space becomes the input to the\nnext. Prompt → image → 3D. That's the whole pipeline behind this gallery.\n\nThe agent chained two Spaces:\n\n1. `ideogram-ai/ideogram4` generated an image of each monument.\n2. `VAST-AI/TripoSplat` reconstructed a 3D Gaussian splat (`.ply`) from each single image. Image in,\n3D out.\n\nThe six source images the agent generated, all isolated on black, ready for single-image 3D reconstruction:\n\nFrom there the agent did the \"glue\" work too. 
It noticed TripoSplat outputs are\nY-down and flipped them upright, auto-framed each monument, compressed the `.ply`\nfiles to `.ksplat` (~3× smaller, so they load fast), built a Three.js viewer with a\nscroll-to-switch and drag-to-rotate UI, and deployed the whole thing as a static\nSpace. The only human inputs were taste-level: \"make it zoomed out,\" \"replace the\nobelisk with something better for splatting,\" \"the transition lingers too long.\"\n\nSeveral of those steps were **the agent reacting to reality**. A wide glass pyramid\nsplats poorly. A thin obelisk is dull. A single-view reconstruction infers the\nback. That is exactly the \"outsourced R&D, fast iteration\" loop the building-block\neconomy predicts, except the R&D was a conversation.\n\nThe real test of a building block is how cheaply you can reuse it. Once this pipeline existed, spinning up entirely new galleries cost about one sentence each. \"Create a similar Space with splats for Japan,\" then the same for Egypt, and the agent did the rest: six monument images, six splats, compression, a viewer, and a deployed Space, per country.\n\nSame two Spaces, same `agents.md`, only the prompts changed. That is the\nbuilding-block economy in one line: the marginal cost of a new multimedia app\nfalls toward the cost of describing it.\n\n`agents.md` makes a Space\ntrivially reachable, so an agent will pick it over a model it has to set up by\nhand. 
That is the same dynamic Hashimoto flags for open-source libraries.\n\nPoint your own agent at a Space's `agents.md` and let it cook:\n\n```\n# image generation\ncurl https://huggingface.co/spaces/ideogram-ai/ideogram4/agents.md\n# single-image to 3D gaussian splat\ncurl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md\n```\n\nPaste either link into your coding agent (Claude Code, etc.), set your\n`HF_TOKEN`, and ask it to build something. The full, reproducible pipeline for this\ngallery, the scripts that hit those two `agents.md` endpoints, lives in the\n[Space repo](https://huggingface.co/spaces/mishig/monuments-de-paris/tree/main).\n\nThe building blocks are sitting right there on the Hub. The agents already know how to glue.", "url": "https://wpnews.pro/news/how-an-agent-built-a-3d-paris-gallery-by-chaining-two-hugging-face-spaces", "canonical_source": "https://huggingface.co/blog/mishig/spaces-agents-md", "published_at": "2026-06-09 10:46:19+00:00", "updated_at": "2026-06-11 21:43:35.314643+00:00", "lang": "en", "topics": ["ai-agents", "generative-ai", "ai-tools", "ai-products", "computer-vision"], "entities": ["Hugging Face", "Mitchell Hashimoto", "TripoSplat", "Paris"], "alternates": {"html": "https://wpnews.pro/news/how-an-agent-built-a-3d-paris-gallery-by-chaining-two-hugging-face-spaces", "markdown": "https://wpnews.pro/news/how-an-agent-built-a-3d-paris-gallery-by-chaining-two-hugging-face-spaces.md", "text": "https://wpnews.pro/news/how-an-agent-built-a-3d-paris-gallery-by-chaining-two-hugging-face-spaces.txt", "jsonld": "https://wpnews.pro/news/how-an-agent-built-a-3d-paris-gallery-by-chaining-two-hugging-face-spaces.jsonld"}}