{"slug": "deploy-and-inference-any-model-from-huggingface", "title": "Deploy and inference any model from HuggingFace", "summary": "Netflix released void-model on Hugging Face, and a developer used the Goose CLI agent with Together's dedicated containers skill to deploy the model for inference on release day with a single prompt. The agent automatically generated container configuration files and inference server setup, eliminating the typical multi-day lag between discovering a new model and running it. This demonstrates how AI agents can bridge technical knowledge gaps, making model deployment accessible without requiring deep expertise in containerization or inference server configuration.", "body_md": "Something real is shifting in how developers work. Agents open up work that used to be off-limits, not because it was technically impossible, but because it required niche expertise most of us didn't have. Containerization, inference server configs, model-specific environment setup: these are the kinds of tasks that used to demand either deep expertise or hours of self-education before you could even get started. Agents offer an elegant way to bridge those prerequisite knowledge gaps: you describe what you want, and the agent fills in the rest.\n\nThat's the unlock. Not speed. *Access.*\n\n## The day Netflix dropped a new model\n\nNetflix recently released [void-model](https://huggingface.co/netflix/void-model) on Hugging Face. The day it came out, my instinct was the same as always: I want to try this. But wanting to try a new model and actually *running* it are two different things. 
Getting it into a usable environment, handling the inference server setup, figuring out the container configuration, wiring it all up correctly: that's the part that usually introduces a day or two of lag between \"this looks cool\" and \"okay I'm actually using it.\"\n\nThis time, that lag was basically zero.\n\nUsing [Goose](https://goose-docs.ai/), a CLI agent runner, combined with Together's [dedicated containers skill](https://github.com/togethercomputer/skills/tree/main/skills/together-dedicated-containers), I went from \"Netflix just dropped a model\" to \"I have a running container for it\" in a single session. The agent produced all the code needed to deploy void-model on Together's Dedicated Container Inference (DCI) infrastructure, essentially on release day.\n\nThe output lives here: [github.com/blainekasten/together-void-model-container](https://github.com/blainekasten/together-void-model-container)\n\n## Exactly what I did\n\nThe whole process took four steps.\n\n**Step 1: Install the Together dedicated containers skill.**\n\n`npx skills add togethercomputer/skills`\n\nThat pulls in the [together-dedicated-containers skill](https://github.com/togethercomputer/skills/tree/main/skills/together-dedicated-containers), which gives Goose the specific knowledge it needs to work with Together's infrastructure: how to configure the inference server, what the container spec should look like, how to wire everything up for a given model.\n\n**Step 2: Start a Goose session and run one prompt.**\n\n`I want to deploy this model on togethers dedicated containers https://huggingface.co/netflix/void-model`\n\nThat's it. 
One sentence.\n\n**Step 3: Sit back and watch it work.**\n\nFrom there, the agent pulled the model details from Hugging Face, figured out the right inference server configuration for the model architecture, generated the container config files, and produced a complete, runnable setup, all without me having to look anything up or guide it through individual steps.\n\nThe result: [blainekasten/together-void-model-container](https://github.com/blainekasten/together-void-model-container), a clean, working repo anyone can use to run void-model on Together infrastructure.\n\n**Step 4: Use your model!**\n\nAfter the agent deploys your application, you can start running inference against it. The [Together CLI](http://docs.together.ai/reference/cli) has commands to easily test inference.\n\nThis model removes objects from videos along with all interactions they induce on the scene: not just secondary effects like shadows and reflections, but physical interactions like objects falling when a person is removed.\n\nInference calls to this model are asynchronous, so the response to a request returns a payload with an identifier we can poll. The response looks like this:\n\nWhen the inference completes, the output includes a URL to the hosted video. We can download it using cURL and our Together API key:\n\nNote: `-L` is required to follow the HTTP redirect in the storage URL, and `-O` writes the output to a local file.\n\n## Why Together Dedicated Container Inference\n\nThis story only works because Together's Dedicated Container Inference (DCI) is genuinely a great place to run models like this, and it's worth explaining why.\n\nDCI gives you a private, GPU-backed environment running the model of your choice, fully managed by Together. You're not fighting for shared resources, you're not configuring your own cluster, and you're not locked into a fixed menu of available models. 
You bring the model; Together handles the infrastructure.\n\nThis is a big deal for teams that want to move fast. When a new model drops from Netflix, from a research lab, from the open-source community, you can have it running in a production-grade environment almost immediately. No spinning up your own GPU VMs, no wrestling with inference server dependencies, no waiting for someone to add support for it in a managed endpoint. DCI is flexible by design: if the model exists, you can deploy it.\n\nThe cost model also makes it easy to experiment. You're paying for what you use, on a container that's yours, without the overhead of managing the underlying compute. That's the kind of setup that lets you say yes to testing new models instead of filing them away for \"when I have time.\"\n\nIf you're interested in Together's DCI, [reach out to us](https://www.together.ai/contact-sales?dci=true) to get set up.", "url": "https://wpnews.pro/news/deploy-and-inference-any-model-from-huggingface", "canonical_source": "https://www.together.ai/blog/deploy-and-inference-any-model-from-huggingface", "published_at": "2026-05-08 00:00:00+00:00", "updated_at": "2026-05-25 00:21:40.320890+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools", "machine-learning", "large-language-models"], "entities": ["Netflix", "Hugging Face", "Goose", "Together", "void-model"], "alternates": {"html": "https://wpnews.pro/news/deploy-and-inference-any-model-from-huggingface", "markdown": "https://wpnews.pro/news/deploy-and-inference-any-model-from-huggingface.md", "text": "https://wpnews.pro/news/deploy-and-inference-any-model-from-huggingface.txt", "jsonld": "https://wpnews.pro/news/deploy-and-inference-any-model-from-huggingface.jsonld"}}