# How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces

> Source: <https://huggingface.co/blog/mishig/spaces-agents-md>
> Published: 2026-06-09 10:46:19+00:00

👁 192

#### TripoSplat

Generate 3D Gaussian models from a single image

I asked a coding agent to build a beautiful website showcasing the monuments of
Paris as 3D Gaussian splats. I never opened an image generator. I never touched a
3D reconstruction tool. The agent produced every asset (the images **and** the 3D
splats) by calling two Hugging Face Spaces directly, then wired them into a
cinematic viewer.

Here's the result, live as a static Space:

This post is about *how* that's possible now, and why I think it's a preview of
how a lot of multimedia software gets built from here on.

Mitchell Hashimoto recently described a shift he calls the
[building block economy](https://mitchellh.com/writing/building-block-economy):
the most effective path to software is no longer a polished monolith, but small,
well-documented components that others (increasingly *agents*) can assemble.
His key observation: AI is okay at building everything from scratch, but it is
**really good at gluing together** proven pieces.

That thesis has mostly been told with *code* libraries. But the same forces are
hitting **multimedia AI**. The hard part of using a state-of-the-art image model,
a video model, a TTS model, or a 3D reconstruction model was never the model. It
was the integration: SDKs, weights, GPUs, input formats, polling. If each model
were instead a documented, callable block, an agent could glue them together the
same way it globs together npm packages.

That's exactly what Hugging Face Spaces have quietly become.

`agents.md`

The Hub hosts thousands of state-of-the-art models (a huge share of them
**open-weights**), and most are deployed as interactive **Spaces**. As of now,
every Gradio Space also exposes a plain-text
[ agents.md](https://huggingface.co/docs/hub/en/spaces-agents) that tells an agent

```
curl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md
```

returns everything needed in one shot: the schema URL, the call and poll templates, how to upload files, and the auth hint:

```
API schema:   GET  .../gradio_api/info
Call endpoint: POST .../gradio_api/call/v2/{endpoint} {"param_name": value, ...}
Poll result:  GET  .../gradio_api/call/{endpoint}/{event_id}
File inputs:  POST .../gradio_api/upload -F "files=@file.ext"
Auth:         Bearer $HF_TOKEN
```

No client library. No hardcoded integration. An agent reads that, and it can drive
the Space end to end. Set an [ HF_TOKEN](https://huggingface.co/settings/tokens)
and you're going. You can find these instructions on any Gradio Space via its

The real unlock is **chaining**: the output of one Space becomes the input to the
next. Prompt → image → 3D. That's the whole pipeline behind this gallery.

The agent chained two Spaces:

`VAST-AI/TripoSplat`

`.ply`

) from each single image. Image in,
3D out.Generated image

Reconstructed splat

The six source images the agent generated, all isolated on black, ready for single-image 3D reconstruction:

From there the agent did the "glue" work too. It noticed TripoSplat outputs are
Y-down and flipped them upright, auto-framed each monument, compressed the `.ply`

files to `.ksplat`

(~3× smaller, so they load fast), built a Three.js viewer with a
scroll-to-switch and drag-to-rotate UI, and deployed the whole thing as a static
Space. The only human inputs were taste-level: "make it zoomed out," "replace the
obelisk with something better for splatting," "the transition lingers too long."

Several of those steps were **the agent reacting to reality**. A wide glass pyramid
splats poorly. A thin obelisk is dull. A single-view reconstruction infers the
back. That is exactly the "outsourced R&D, fast iteration" loop the building-block
economy predicts, except the R&D was a conversation.

The real test of a building block is how cheaply you can reuse it. Once this pipeline existed, spinning up entirely new galleries cost about one sentence each. "Create a similar Space with splats for Japan," then the same for Egypt, and the agent did the rest: six monument images, six splats, compression, a viewer, and a deployed Space, per country.

<video autoplay loop muted playsinline width="100%" src="

"><video autoplay loop muted playsinline width="100%" src="

">Same two Spaces, same `agents.md`

, only the prompts changed. That is the
building-block economy in one line: the marginal cost of a new multimedia app
falls toward the cost of describing it.

`agents.md`

makes a Space
trivially reachable, so an agent will pick it over a model it has to set up by
hand. That is the same dynamic Hashimoto flags for open-source libraries.Point your own agent at a Space's `agents.md`

and let it cook:

```
# image generation
curl https://huggingface.co/spaces/ideogram-ai/ideogram4/agents.md
# single-image to 3D gaussian splat
curl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md
```

Paste either link into your coding agent (Claude Code, etc.), set your
`HF_TOKEN`

, and ask it to build something. The full, reproducible pipeline for this
gallery, the scripts that hit those two `agents.md`

endpoints, lives in the
[Space repo](https://huggingface.co/spaces/mishig/monuments-de-paris/tree/main).

The building blocks are sitting right there on the Hub. The agents already know how to glue.

I asked a coding agent to build a beautiful website showcasing the monuments of
Paris as 3D Gaussian splats. I never opened an image generator. I never touched a
3D reconstruction tool. The agent produced every asset (the images **and** the 3D
splats) by calling two Hugging Face Spaces directly, then wired them into a
cinematic viewer.

Here's the result, live as a static Space:

This post is about *how* that's possible now, and why I think it's a preview of
how a lot of multimedia software gets built from here on.

Mitchell Hashimoto recently described a shift he calls the
[building block economy](https://mitchellh.com/writing/building-block-economy):
the most effective path to software is no longer a polished monolith, but small,
well-documented components that others (increasingly *agents*) can assemble.
His key observation: AI is okay at building everything from scratch, but it is
**really good at gluing together** proven pieces.

That thesis has mostly been told with *code* libraries. But the same forces are
hitting **multimedia AI**. The hard part of using a state-of-the-art image model,
a video model, a TTS model, or a 3D reconstruction model was never the model. It
was the integration: SDKs, weights, GPUs, input formats, polling. If each model
were instead a documented, callable block, an agent could glue them together the
same way it globs together npm packages.

That's exactly what Hugging Face Spaces have quietly become.

`agents.md`

The Hub hosts thousands of state-of-the-art models (a huge share of them
**open-weights**), and most are deployed as interactive **Spaces**. As of now,
every Gradio Space also exposes a plain-text
[ agents.md](https://huggingface.co/docs/hub/en/spaces-agents) that tells an agent

```
curl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md
```

returns everything needed in one shot: the schema URL, the call and poll templates, how to upload files, and the auth hint:

```
API schema:   GET  .../gradio_api/info
Call endpoint: POST .../gradio_api/call/v2/{endpoint} {"param_name": value, ...}
Poll result:  GET  .../gradio_api/call/{endpoint}/{event_id}
File inputs:  POST .../gradio_api/upload -F "files=@file.ext"
Auth:         Bearer $HF_TOKEN
```

No client library. No hardcoded integration. An agent reads that, and it can drive
the Space end to end. Set an [ HF_TOKEN](https://huggingface.co/settings/tokens)
and you're going.

The real unlock is **chaining**: the output of one Space becomes the input to the
next. Prompt → image → 3D. That's the whole pipeline behind this gallery.

The agent chained two Spaces:

`ideogram-ai/ideogram4`

`VAST-AI/TripoSplat`

`.ply`

) from each single image. Image in,
3D out.Generated image

Reconstructed splat

The six source images the agent generated, all isolated on black, ready for single-image 3D reconstruction:

From there the agent did the "glue" work too. It noticed TripoSplat outputs are
Y-down and flipped them upright, auto-framed each monument, compressed the `.ply`

files to `.ksplat`

(~3× smaller, so they load fast), built a Three.js viewer with a
scroll-to-switch and drag-to-rotate UI, and deployed the whole thing as a static
Space. The only human inputs were taste-level: "make it zoomed out," "replace the
obelisk with something better for splatting," "the transition lingers too long."

Several of those steps were **the agent reacting to reality**. A wide glass pyramid
splats poorly. A thin obelisk is dull. A single-view reconstruction infers the
back. That is exactly the "outsourced R&D, fast iteration" loop the building-block
economy predicts, except the R&D was a conversation.

The real test of a building block is how cheaply you can reuse it. Once this pipeline existed, spinning up entirely new galleries cost about one sentence each. "Create a similar Space with splats for Japan," then the same for Egypt, and the agent did the rest: six monument images, six splats, compression, a viewer, and a deployed Space, per country.

Same two Spaces, same `agents.md`

, only the prompts changed. That is the
building-block economy in one line: the marginal cost of a new multimedia app
falls toward the cost of describing it.

`agents.md`

makes a Space
trivially reachable, so an agent will pick it over a model it has to set up by
hand. That is the same dynamic Hashimoto flags for open-source libraries.Point your own agent at a Space's `agents.md`

and let it cook:

```
# image generation
curl https://huggingface.co/spaces/ideogram-ai/ideogram4/agents.md
# single-image to 3D gaussian splat
curl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md
```

Paste either link into your coding agent (Claude Code, etc.), set your
`HF_TOKEN`

, and ask it to build something. The full, reproducible pipeline for this
gallery, the scripts that hit those two `agents.md`

endpoints, lives in the
[Space repo](https://huggingface.co/spaces/mishig/monuments-de-paris/tree/main).

The building blocks are sitting right there on the Hub. The agents already know how to glue.

Generate 3D Gaussian models from a single image

Ideogram 4 state of the art open weights

Explore interactive 3D models of Paris monuments

Explore 3D Egyptian monuments with interactive rotation

Explore 3D captures of Japan’s famous monuments