# Show HN: Brontosaurus, a voice-driven generative AI canvas

> Source: <https://thomasdhughes.com/brontosaurus/>
> Published: 2026-06-03 00:12:46+00:00


Brontosaurus is a web-based generative canvas where you speak aloud what you want to see, and Bronto builds a widget of it in under a second. The underlying agents run on OpenAI’s `gpt-oss-120b`, served by Cerebras at a blistering 3,000 tokens/second, which makes the whole thing feel like magic :)

*Brontosaurus is very much in development - if you’ve got ideas or things you’d want to see, send them my way. I’d love to hear them! (hello@thomasdhughes.com)*

Contents:

- The Inspiration
- The Technical
- The Future
- Acknowledgments

### The Inspiration

Two phenomenal blog posts inspired this project.

The first was Thinking Machines’ release of an [Interaction Model](https://thinkingmachines.ai/blog/interaction-models/).

On the technical side, I liked the architecture of a multi-modal (voice+vision+text) model hooked into a more powerful quasi-background agent which quietly executes requests without interrupting the flow of conversation. There are certainly other labs doing this today, but Thinking Machines takes a unique approach - Sean Goedecke does a great job of breaking down that novelty in [his own blog post](https://www.seangoedecke.com/interaction-models/).

On the philosophical side, Thinking Machines argues that current discussions around AI agents mistakenly center agentic autonomy (the ability for an agent to receive a task, then work for hours uninterrupted) rather than human-AI collaboration (the ability for an agent to work on a task in tandem with a person). Designing models around autonomy, they argue, leads to “humans increasingly get[ting] pushed out not because the work doesn’t need them, but because the interface has no room for them.” You put in your prompt and get out of the way. This new family of Interaction Models, on the flip side, works more like a teammate - you can talk, type, point at things, interrupt with new ideas. You can *collaborate*.

Yes to this!

Both the technical and philosophical pieces of this approach spoke to me, and together they made me want to build something with the same ethos. In Bronto, that manifests as prioritizing **creation at the speed of thought** above all else. It is an argument that more than capability, more than intelligence, more than long-running task autonomy, the ability to speak something into existence in under a second makes you feel like anything is possible.

The [second blog post](https://www.inkandswitch.com/patchwork/notebook/chitter-chatter/) was from Ink & Switch, an independent research lab with a focus on malleable software, a concept I explored in [Modifying Websites with LLM-Generated Javascript Bookmarks](https://thomasdhughes.com/bookmark/). The post is called “chitter chatter”, and it outlines a vision for a generative canvas. It reads like somebody’s diary and makes the proposed software sound friendly and warm, which I always admire when people can pull it off - code is too often cold and numbers. I will not apologize for that phrasing because I am prideful, but I do not stand by it; let us not mention it going forward.

I read this piece (along with Thinking Machines’), loved it, and wanted to build something like it. Brontosaurus was born.

### The Technical

Under the hood, there’s some multi-agent orchestration going on.

There are two agent types at play: Conductor and Builder. Both run on [OpenAI’s gpt-oss-120b](https://openai.com/index/introducing-gpt-oss/), served by [Cerebras](https://inference-docs.cerebras.ai/models/openai-oss) at 3,000 tokens/second. To put this in perspective, ChatGPT in the browser responds at ~50 tokens/second.
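To make that gap concrete, here is the pure generation time for a hypothetical widget of about 1,500 output tokens (an illustrative size, not a figure from this post), ignoring network latency and time-to-first-token. $N$ is the number of output tokens and $r$ the generation rate:

```latex
t = \frac{N}{r}
\qquad
t_{\text{Cerebras}} = \frac{1500\ \text{tokens}}{3000\ \text{tokens/s}} = 0.5\ \text{s}
\qquad
t_{\text{ChatGPT}} = \frac{1500\ \text{tokens}}{50\ \text{tokens/s}} = 30\ \text{s}
```

Under that assumption, a widget that lands in half a second on Cerebras would take on the order of half a minute at browser-ChatGPT speeds.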

When you tap the space bar, the web app starts listening, and when you tap again, it does speech-to-text with Chrome’s built-in Web Speech API.
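The tap-to-start, tap-to-stop flow could look roughly like the sketch below. It is not Bronto's actual code. `PushToTalk` and `RecognizerLike` are hypothetical names, and `RecognizerLike` mirrors only the small slice of the Web Speech API's `SpeechRecognition` interface that the sketch touches (`continuous`, `interimResults`, `onresult`, `onend`, `start()`, `stop()`). Declaring it this way keeps the logic testable outside a browser.

```typescript
// Minimal subset of the Web Speech API's SpeechRecognition interface used by this sketch.
// In Chrome the real object would come from `new webkitSpeechRecognition()`.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: { results: ArrayLike<ArrayLike<{ transcript: string }>> }) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical push-to-talk controller: the first tap starts listening, the second tap stops
// and hands the finished transcript to a callback (e.g. whatever calls the Conductor).
class PushToTalk {
  private listening = false;
  private transcript = "";
  private readonly recognizer: RecognizerLike;
  private readonly onUtterance: (text: string) => void;

  constructor(recognizer: RecognizerLike, onUtterance: (text: string) => void) {
    this.recognizer = recognizer;
    this.onUtterance = onUtterance;
    recognizer.continuous = true; // keep listening until the second tap
    recognizer.interimResults = false; // only deliver finalized phrases
    recognizer.onresult = (event) => {
      // With continuous recognition, `results` holds every phrase heard so far this session.
      this.transcript = Array.from(event.results, (r) => r[0].transcript.trim()).join(" ");
    };
    recognizer.onend = () => {
      // Fires after stop(), or if the browser ends the session on its own.
      this.listening = false;
      const text = this.transcript.trim();
      this.transcript = "";
      if (text) this.onUtterance(text);
    };
  }

  // Intended to be called on each space-bar tap.
  toggle(): void {
    if (this.listening) {
      this.listening = false;
      this.recognizer.stop();
    } else {
      this.listening = true;
      this.recognizer.start();
    }
  }
}
```

In the browser you would pass Chrome's prefixed `webkitSpeechRecognition` instance and call `toggle()` from a `keydown` listener that checks `event.code === "Space"`.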

The text of what you said gets passed to the Conductor agent, along with a JSON array describing the widgets currently on the canvas, each of which has:

- an `id` (unique identifier for tool calls),
- a `title` (what you see at the top of each widget),
- a `description` (internal-facing explanation of what the widget contains), and
- a `rect` (the widget’s size and position on the canvas).
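Here is one way that per-widget summary and the Conductor's input might be typed. It is a sketch under assumptions. The four field names come from the list above, but the `x`/`y`/`width`/`height` layout of `rect` and the message format produced by the hypothetical `buildConductorMessage` helper are guesses, not Bronto's actual schema or prompt.

```typescript
// Hypothetical shape of the canvas snapshot sent to the Conductor. The four fields follow
// the post; the x/y/width/height layout of `rect` is an assumption.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface WidgetSummary {
  id: string; // unique identifier used in tool calls
  title: string; // shown at the top of the widget
  description: string; // internal-facing explanation of the contents
  rect: Rect; // size and position on the canvas
}

// Pairs the transcript with a JSON snapshot of the canvas so the model can refer to widgets by id.
function buildConductorMessage(transcript: string, widgets: WidgetSummary[]): string {
  return `User said: ${transcript}\n\nWidgets on canvas:\n${JSON.stringify(widgets, null, 2)}`;
}
```

In this sketch, sending only these summary fields rather than each widget's HTML keeps the Conductor's prompt small, however large the widgets themselves get.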

The Conductor agent then makes tool calls. It can:

- `arrange` - move or resize a widget by `id`, without changing the contents (this is done by updating the `rect` value for that widget),
- `delete` - remove a widget by `id`,
- `clear` - remove all widgets at once,
- `create` and `edit`.
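A typed view of those five tools might look like the following. It assumes OpenAI-style function calling, where each tool call carries a `function.name` and a JSON-encoded `function.arguments` string. Everything else is hypothetical: the argument shapes, the `parseToolCalls` helper, and the choice to let the Conductor pick the new widget's `id` in `create` so a later call in the same turn can refer to it.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Hypothetical argument shapes for the five Conductor tools. Having the Conductor choose the
// id in `create` is an assumption that lets a later call in the same response reference it.
type ToolCall =
  | { name: "arrange"; args: { id: string; rect: Rect } }
  | { name: "delete"; args: { id: string } }
  | { name: "clear"; args: Record<string, never> }
  | { name: "create"; args: { id: string; title: string; description: string; rect: Rect } }
  | { name: "edit"; args: { id: string; instructions: string } };

// Subset of a tool call from an OpenAI-compatible chat completion: arguments arrive as a JSON string.
interface RawToolCall {
  function: { name: string; arguments: string };
}

const TOOL_NAMES = new Set<string>(["arrange", "delete", "clear", "create", "edit"]);

// Turns one model response (possibly several tool calls) into typed calls, rejecting unknown tools.
function parseToolCalls(raw: RawToolCall[]): ToolCall[] {
  return raw.map(({ function: fn }) => {
    if (!TOOL_NAMES.has(fn.name)) throw new Error(`unknown tool: ${fn.name}`);
    const args = fn.arguments.trim() === "" ? {} : JSON.parse(fn.arguments);
    return { name: fn.name, args } as ToolCall;
  });
}
```

This only checks tool names. A sturdier version would also validate each argument object against a schema, since a model can emit malformed arguments.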

The first three tool calls are handled deterministically. The final two, `create` and `edit`, send instructions to a Builder agent.
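That split between deterministic tools and Builder-bound tools could be sketched as a single dispatch loop. The `applyToolCalls` function, the `BuilderJob` type, and the `Map`-based canvas are hypothetical illustrations. The tool argument shapes repeat the assumptions from the previous sketch, including a Conductor-chosen `id` on `create`.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Widget { id: string; title: string; description: string; rect: Rect; html: string; }

type ToolCall =
  | { name: "arrange"; args: { id: string; rect: Rect } }
  | { name: "delete"; args: { id: string } }
  | { name: "clear"; args: Record<string, never> }
  | { name: "create"; args: { id: string; title: string; description: string; rect: Rect } }
  | { name: "edit"; args: { id: string; instructions: string } };

// Work that has to go to a Builder agent instead of being applied on the spot.
type BuilderJob =
  | { kind: "create"; id: string; description: string }
  | { kind: "edit"; id: string; html: string; instructions: string };

// Applies arrange/delete/clear directly to the canvas state (no model call) and
// collects create/edit as Builder jobs, in the order the Conductor issued them.
function applyToolCalls(canvas: Map<string, Widget>, calls: ToolCall[]): BuilderJob[] {
  const jobs: BuilderJob[] = [];
  for (const call of calls) {
    switch (call.name) {
      case "arrange": {
        const w = canvas.get(call.args.id);
        if (w) w.rect = { ...call.args.rect }; // position/size only; contents untouched
        break;
      }
      case "delete":
        canvas.delete(call.args.id);
        break;
      case "clear":
        canvas.clear();
        break;
      case "create":
        // Register the widget with empty HTML now; a Builder fills it in later.
        canvas.set(call.args.id, { ...call.args, html: "" });
        jobs.push({ kind: "create", id: call.args.id, description: call.args.description });
        break;
      case "edit": {
        const w = canvas.get(call.args.id);
        if (w) jobs.push({ kind: "edit", id: w.id, html: w.html, instructions: call.args.instructions });
        break;
      }
    }
  }
  return jobs;
}
```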

For a `create`, the Builder agent receives just the requested widget’s `description`. For an `edit`, it receives the full HTML of the current widget along with the change instructions.
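A sketch of how the two Builder inputs might differ is below. The post does not publish Bronto's prompts, so the system prompt text, the `ChatMessage` shape, and the `builderMessages` helper are all hypothetical. The point is only that `create` sends the description alone while `edit` sends the existing HTML plus the change request.

```typescript
type BuilderJob =
  | { kind: "create"; description: string }
  | { kind: "edit"; html: string; instructions: string };

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical system prompt; not Bronto's actual wording.
const BUILDER_SYSTEM =
  "You build one complete, self-contained HTML document with inline CSS and JavaScript. Reply with the HTML only.";

// Builds the chat messages for one Builder call.
function builderMessages(job: BuilderJob): ChatMessage[] {
  const user =
    job.kind === "create"
      ? `Build this widget:\n${job.description}`
      : `Here is the current widget HTML:\n${job.html}\n\nApply this change and return the full updated document:\n${job.instructions}`;
  return [
    { role: "system", content: BUILDER_SYSTEM },
    { role: "user", content: user },
  ];
}
```

Returning the full document on `edit`, rather than a diff, matches the post's description of the Builder always producing a complete HTML document. It also means the output can be rendered directly without any patching step.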

The Builder agent then returns a complete, self-contained HTML document, which is subsequently cleaned and rendered in an `iframe`.
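The post doesn't say what "cleaned" involves. One plausible pass, sketched here as an assumption, strips a Markdown code fence the model may wrap around its reply and drops any chatter before the document starts. `cleanBuilderOutput` is a hypothetical name.

```typescript
// Hypothetical cleanup pass for Builder output; the actual cleaning steps are not described in the post.
function cleanBuilderOutput(raw: string): string {
  let html = raw.trim();
  // Strip a leading Markdown code fence (three backticks plus an optional language tag)...
  html = html.replace(/^`{3}[\w-]*\s*\n/, "");
  // ...and the matching closing fence at the end.
  html = html.replace(/\n?`{3}\s*$/, "");
  // Drop any conversational text before the document actually begins.
  const start = html.search(/<!doctype html|<html/i);
  if (start > 0) html = html.slice(start);
  return html.trim();
}
```

The cleaned string could then be assigned to an `iframe`'s `srcdoc` attribute. Adding `sandbox="allow-scripts"` (without `allow-same-origin`) lets each widget's scripts run in an opaque origin, isolated from the canvas page.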

Additional design choices which add to the magic:

- The Conductor agent can make multiple tool calls with a single instruction - this makes it possible to say “delete the piano, make a calculator, put it where the piano was” and it all happens at once.
- `arrange` calls don’t need to wait for the Builder agent to finish building - as soon as the `create` is run, an `id` exists, so widgets which are still populating can be moved around.
- Builder agents run in parallel, so multiple widgets can be made at once.
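The last two points together could be sketched as follows. `create` registers the widget synchronously and only later fills in its HTML, so `arrange` works on a still-pending widget, and several creates are started together with `Promise.all`. This is an illustration, not Bronto's code. The `Canvas` class, `createAll`, and the `BuildFn` stand-in for a Builder call are hypothetical.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Widget { id: string; rect: Rect; html: string | null; } // html stays null until the Builder returns

// Hypothetical stand-in for a call to a Builder agent.
type BuildFn = (description: string) => Promise<string>;

class Canvas {
  readonly widgets = new Map<string, Widget>();

  // Registers the widget immediately, then fills in its HTML when the Builder finishes.
  create(id: string, rect: Rect, description: string, build: BuildFn): Promise<void> {
    this.widgets.set(id, { id, rect, html: null });
    return build(description).then((html) => {
      const w = this.widgets.get(id);
      if (w) w.html = html; // skip if the widget was deleted mid-build
    });
  }

  // Works whether or not the widget's HTML has arrived yet.
  arrange(id: string, rect: Rect): void {
    const w = this.widgets.get(id);
    if (w) w.rect = rect;
  }
}

interface CreateSpec { id: string; rect: Rect; description: string; }

// Starts every Builder call for this turn at once instead of one after another.
function createAll(canvas: Canvas, specs: CreateSpec[], build: BuildFn): Promise<void[]> {
  return Promise.all(specs.map((s) => canvas.create(s.id, s.rect, s.description, build)));
}
```

Because the Builder calls are network-bound, running them concurrently means a turn that asks for three widgets takes roughly as long as the slowest one, not the sum of all three.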

### The Future

There’s lots of room for improvement here.

For one, `gpt-oss-120b` is 9 months old and has just 120B parameters. This means it’s dirt cheap - I used Brontosaurus nonstop for over an hour and spent less than a dollar - but also that the quality ceiling is way higher with a bigger model. If we used a model like `GLM 4.7` from Z.ai (also served by Cerebras), it’d be 4x the cost and 1/3rd the speed, but with 3x the parameters it could likely build far more complex widgets. The question is whether the speed tradeoff would be worth it.
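Taking those ratios at face value (4x cost, 1/3 the speed), the tradeoff can be roughed out as follows. This assumes a hypothetical 1,500-token widget (an illustrative size, not a measured one) and unchanged token usage, and it ignores network latency:

```latex
r_{\text{GLM}} = \tfrac{1}{3} \times 3000 = 1000\ \text{tokens/s}
\qquad
t_{\text{GLM}} = \frac{1500}{1000} = 1.5\ \text{s}
\quad \text{vs.} \quad
t_{\text{gpt-oss}} = \frac{1500}{3000} = 0.5\ \text{s}

C_{\text{GLM}} \approx 4 \times C_{\text{gpt-oss}} < 4 \times \$1 = \$4 \ \text{per hour of use}
```

Under those assumptions, the cost would stay small, but generation would cross the one-second mark the rest of the design is built around.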

On this note, I initially added live search capabilities via Exa AI so that Bronto could pull in things like weather and live stock prices, but in my evals it added a delay of ~0.9s, which stings once you’ve gotten used to the sub-second generation speed.

Finally, the major one - a virtual file system! At the moment, all the widgets Bronto makes are ephemeral, single-use HTML files in `iframe`s. A VFS would allow for surfacing pre-existing documents and previously-built widgets to iterate on. It would also make it possible for Bronto to selectively pull the contents of widgets into its context window, so commands like “I checked off what I already have from the ingredients list, please remove those” would work.
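Since this is future work, the sketch below only illustrates the idea. It is a hypothetical in-memory store that keeps widget HTML under paths and can pull just the selected files into a block of text for an agent's context. `WidgetVFS`, its methods, and the path layout are all made up for illustration.

```typescript
// Hypothetical in-memory virtual file system for widgets; names and paths are illustrative only.
class WidgetVFS {
  private readonly files = new Map<string, string>();

  write(path: string, html: string): void {
    this.files.set(path, html);
  }

  read(path: string): string | undefined {
    return this.files.get(path);
  }

  // Lists stored paths under a prefix, sorted for stable output.
  list(prefix = "/"): string[] {
    return Array.from(this.files.keys()).filter((p) => p.startsWith(prefix)).sort();
  }

  // Selectively pulls widget contents into a text block for an agent's context window.
  toContext(paths: string[]): string {
    return paths
      .filter((p) => this.files.has(p))
      .map((p) => `--- ${p} ---\n${this.files.get(p)}`)
      .join("\n\n");
  }
}
```

For the ingredients-list example, only the list widget's file would be passed to `toContext`. The agent would then see which boxes are checked without every other widget on the canvas being loaded into its context.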

The above are all technical changes, but I know there are a lot of interesting things that can be done with just the current architecture. The 8-row step sequencer (beat maker) at the end of the demo initially came from me saying “I want to make some music”; Bronto produced three widgets, and that was one of them. It totally blew my mind. Which is why I put at the top: *if you’ve got ideas, please send them my way! (hello@thomasdhughes.com)* I’ll try them out and send back a video, I promise.

Goodbye!

-Thomas

*Thank you Thinking Machines and Ink & Switch for inspiring me with your work and writing.*

*Thank you Tristan and Steven for pressure testing early versions of Brontosaurus with requests I hope it never fulfills.*
