cd /news/artificial-intelligence/large-language-models-are-overkill-f… · home topics artificial-intelligence article
[ARTICLE · art-38888] src=adexchanger.com ↗ pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Large Language Models Are Overkill For Some Marketing Tasks. Enter The Small Language Model

AI company ZeroGPU launched specialized small language models for ad tech on Thursday, aiming to help companies handle high-volume workflows faster and at lower cost. The SLMs, which run on cheaper CPUs instead of GPUs, have already reduced expenses by 50% for customer Dappier. The move comes as large language models become increasingly expensive for repetitive marketing tasks.

read5 min views1 publishedJun 25, 2026
Large Language Models Are Overkill For Some Marketing Tasks. Enter The Small Language Model
Image: Adexchanger (auto-discovered)

It’s no secret that large language models (LLMs) have gotten exorbitantly expensive.

Companies are starting to limit their employees’ AI usage to save money; OpenAI has even discussed lowering the cost of tokens to retain financially-anxious customers.

But you know what’s cheaper than a large language model? A small language model.

AI company ZeroGPU develops small language models (SLMs) that are trained on a smaller amount of data and designed to perform specialized tasks and use cases.

On Thursday, the company announced a group of specialized SLMs for ad tech, with the goal of helping tech companies handle high-volume workflows more quickly and at a lower cost.

Work smaller, not harder

LLMs have “trillions and trillions of parameters,” and they’re trained on the entirety of the internet, said Maddy Arvapally, founder and CEO of ZeroGPU.

But for a lot of repetitive ad tech tasks, like content classification or document summaries, a much smaller model with fewer than 10 billion parameters is enough to get the job done, she said. Plus, using an SLM is cheaper and faster than having an LLM perform the same task.

Because of the sheer amount of data processing they require, LLMs rely on high-powered graphics processing units (GPUs) to feed them data and constantly revise the model’s parameters. But enterprise-grade GPUs are expensive, so many tech companies rent access to these GPUs from cloud infrastructure providers. Either way, the GPU costs get passed down to the end user.

ZeroGPU’s smaller models, however, run on central processing units, which are cheaper and handle tasks one at a time. They can also run on browsers.

Because SLMs carry lower processing costs, AI monetization company Dappier has seen a 50% decrease in its overall expenses since adopting ZeroGPU’s models, according to Co-Founder and CEO Dan Goikhman.

Dappier provides on- and off-site AI agents for marketers (basically, brand-specific chatbots) that are trained on their brand tone and guidelines. It also licenses publisher data for training AI tools.

So far, Dappier has adopted three of ZeroGPU’s SLMs: one for content classification, one for intent classification and one for moderation (or brand safety).

Keeping up with the times

The marketing-specific agents Dappier creates need to be “super responsive” to customer queries and follow-ups, said Goikhman, including the ability to classify each conversation and extract the user’s “commercial intent.” The intent could be anything from seeking out a particular type of product or wondering how this brand stands out from its competitors.

Say, for instance, a user is reading an article on a parenting website about helping their child with their homework. Dappier can create a chatbot that can generate prompt suggestions to open a conversation with that user about parenting.

But the SLMs continue to analyze the context of the conversation and the user’s intent as the interaction evolves, said Goikhman. And this enables the agent to constantly map both the content the user is engaging with and the conversation they’re having with the chatbot to IAB contextual categories.

The goal is to show publishers and advertisers what the article and the resulting conversation are about, he said. That way, they can understand what conversational prompts to show to keep the user engaged and what sorts of advertisers might be interested in that page and the ensuing chatbot interactions.

The final frontier?

Changing the models that your AI tools run on sounds like a heck of an undertaking. But, ZeroGPU prioritizes easing the transition to SLMs for its clients.

Its models have “OpenAI-compatible endpoints,” said Arvapally, meaning that the only thing a client needs to do is swap out a URL on the backend so it calls ZeroGPU’s API rather than OpenAI’s.

According to Goikhman, the entire process took about five minutes.

Historically, Dappier was using the major frontier models like OpenAI and Claude to generate prompts and understand context. But the SLM’s results are still tailored to the customer’s needs, and the AI doesn’t lean on as many additional resources in its training.

For instance, Dappier tracks all of the conversations that consumers have within its chatbots to determine best practices for prompt suggestions. But its SLMs are trained just on these conversations and practices. Another of Dappier’s main use cases is classifying articles and conversations within IAB categories. The IAB content taxonomy includes more than 1,500 categories, and the SLM is trained on all of them.

But, if you ask the content classification model for sentiment analysis, it won’t be able to provide that, Arvapally said, “because that’s just not what it’s meant for.”

(Although, ZeroGPU has a separate model for sentiment analysis. In that sense, SLMs are kind of the pinnacle of “I know a guy” for any specific use case.)

Meanwhile, using an SLM to develop a tool for a certain task also makes executing that task faster than using a general-purpose LLM.

Dappier’s content classification model can churn out IAB classifications more quickly than a frontier model, which would be ingesting the query at the same time it was trying to process the taxonomies, Arvapally said.

And besides, she added, the frontier models have a high rate of hallucinations in categorization tasks, because many of the IAB categories have similar titles, and a model that isn’t specifically trained on them often conflates them.

But, because the SLM is already trained on the taxonomy, it only needs to ingest the content of the specific article it’s analyzing, which reduces the time between prompt and response.

The result, Arvapally said, is “sub-50 millisecond response times, which is impossible for frontier models.”

── more in #artificial-intelligence 4 stories · sorted by recency
── more on @zerogpu 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/large-language-model…] indexed:0 read:5min 2026-06-25 ·