Source: dev.to

A flat per-call endpoint for summarize / classify / extract in your n8n and Make automations

A developer introduced Modelis, an OpenAI-compatible gateway that charges a flat per-call price for bounded-output tasks like summarization, classification, and extraction, making it suitable for high-volume automations in n8n and Make. The service caps output at ~1024 tokens and auto-routes requests to a fitting model, ensuring predictable costs. The developer also released an open-source adapter for local use.

2 min read · Published Jun 28, 2026

If you run automations that summarize, classify, or pull fields out of text at volume, the LLM step is where per-token pricing turns budgeting into a guessing game: one batch of long inputs and the bill spikes. For these bounded-output jobs, a flat price per call fits better than per-token pricing on a frontier model. Here is how I wire it into n8n and Make, and when not to.

Automation runs are repetitive and high-volume, and the outputs are short by nature: a summary, a label, a few extracted fields. I route them through Modelis, an OpenAI-compatible gateway that auto-routes each request to a fitting model and charges a flat price per call with output capped at ~1024 tokens. Because the output is bounded, each run costs the same and your monthly total stays predictable no matter the input size.
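To make the budgeting argument concrete, here is a small cost comparison. The prices are made-up placeholders (the post does not state Modelis's per-call price, and no real per-token rate is implied), so treat the numbers as illustration only. The point is structural: the flat total depends only on the call count, while the per-token total moves with input size.

```python
# All prices below are hypothetical placeholders, not real Modelis or vendor rates.
FLAT_PRICE_PER_CALL = 0.002          # USD per call (assumed)
PRICE_PER_1K_INPUT_TOKENS = 0.003    # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT_TOKENS = 0.015   # USD per 1,000 output tokens (assumed)


def flat_cost(calls: int) -> float:
    """Monthly cost under flat per-call pricing: only the call count matters."""
    return calls * FLAT_PRICE_PER_CALL


def per_token_cost(input_tokens: list[int], output_tokens: list[int]) -> float:
    """Monthly cost under per-token pricing: every call's input and output size matters."""
    return (
        sum(input_tokens) / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + sum(output_tokens) / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    calls = 10_000
    short_inputs = per_token_cost([500] * calls, [50] * calls)
    long_inputs = per_token_cost([8_000] * calls, [50] * calls)
    print(f"flat price:           ${flat_cost(calls):.2f}")
    print(f"per-token, 500 in:    ${short_inputs:.2f}")
    print(f"per-token, 8,000 in:  ${long_inputs:.2f}")
```

With these placeholder rates, 10,000 calls cost $20.00 flat regardless of input size, while the per-token total goes from $22.50 to $247.50 when inputs grow from 500 to 8,000 tokens.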

The endpoint is a standard OpenAI-compatible POST /v1/chat/completions, so an HTTP Request node in n8n (or the HTTP module in Make) can call it directly:

Method: POST
URL: https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions
Headers:
x-rapidapi-host: modelis-auto-chat.p.rapidapi.com
x-rapidapi-key: YOUR_KEY
content-type: application/json
Body (JSON):
{"model":"modelis-auto","messages":[{"role":"user","content":"Label sentiment (positive/negative/neutral): {{ $json.text }}"}]}
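One caveat with a templated body like this: if the node pastes {{ $json.text }} into the JSON as raw text, an input containing a double quote or a newline produces invalid JSON and the call fails. Whether that happens depends on how your n8n or Make version evaluates the template, so it is worth testing with a messy input. The general fix is to serialize the value instead of splicing it in (in n8n, for example, an expression that applies JSON.stringify to the field in place of the quoted placeholder). The Python sketch below only demonstrates the difference; naive_body, safe_body, and is_valid_json are illustrative names.

```python
import json

PROMPT = "Label sentiment (positive/negative/neutral): "


def naive_body(text: str) -> str:
    """String splicing, like pasting a raw value into a JSON template (breaks on quotes/newlines)."""
    return '{"model":"modelis-auto","messages":[{"role":"user","content":"' + PROMPT + text + '"}]}'


def safe_body(text: str) -> str:
    """Build the payload as data and serialize it, so json.dumps escapes quotes and newlines."""
    payload = {
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": PROMPT + text}],
    }
    return json.dumps(payload)


def is_valid_json(s: str) -> bool:
    """Return True if s parses as JSON."""
    try:
        json.loads(s)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    sample = 'Support said "fixed"\nbut it still crashes'
    print(is_valid_json(naive_body(sample)))  # False
    print(is_valid_json(safe_body(sample)))   # True
```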

The curl equivalent:

curl --request POST \
  --url https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions \
  --header 'content-type: application/json' \
  --header 'x-rapidapi-host: modelis-auto-chat.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_KEY' \
  --data '{"model":"modelis-auto","messages":[{"role":"user","content":"Summarize in 2 sentences: ..."}]}'
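If a step runs as a standalone Python script rather than an HTTP node, the same request can be sketched with only the standard library. This is an illustration, not an official client: reply_text assumes the gateway returns the usual OpenAI chat-completion shape (choices[0].message.content), which "OpenAI-compatible" implies but the post does not show, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ENDPOINT = "https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions"
HOST = "modelis-auto-chat.p.rapidapi.com"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST the curl command makes."""
    body = json.dumps({
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "content-type": "application/json",
            "x-rapidapi-host": HOST,
            "x-rapidapi-key": api_key,
        },
    )


def reply_text(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion (assumed shape)."""
    return response_json["choices"][0]["message"]["content"]


def complete(prompt: str, api_key: str) -> str:
    """Send the request and return the reply text (performs a network call)."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=30) as resp:
        return reply_text(json.load(resp))
```

Calling complete("Summarize in 2 sentences: ...", "YOUR_KEY") returns the reply text. build_request is split out so the request can be inspected or logged without sending it.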

If you would rather use a built-in OpenAI node that expects an Authorization: Bearer key and a custom base URL, run the tiny open-source adapter next to your workflow runner:

npx modelis-openai      # local proxy on 127.0.0.1:8787, MIT, ~120 lines

Then point the node at http://127.0.0.1:8787/v1 with model modelis-auto.
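The post does not show the adapter's internals, so the following is only a guess at the core job any such proxy has to do: accept the OpenAI-style Authorization: Bearer header and re-send the request to the gateway with the RapidAPI headers used in the HTTP Request setup. The function names, and the assumption that the Bearer value is simply your RapidAPI key, are hypothetical.

```python
# Hypothetical sketch of the header mapping an OpenAI-to-RapidAPI proxy needs.
# It is not the modelis-openai source, which the post does not show.
RAPIDAPI_HOST = "modelis-auto-chat.p.rapidapi.com"
UPSTREAM_BASE = "https://modelis-auto-chat.p.rapidapi.com"


def translate_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Turn OpenAI-style auth headers into the RapidAPI headers the gateway expects."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in incoming.items()}
    scheme, _, key = lowered.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not key.strip():
        raise ValueError("expected 'Authorization: Bearer <key>'")
    return {
        "content-type": lowered.get("content-type", "application/json"),
        "x-rapidapi-host": RAPIDAPI_HOST,
        "x-rapidapi-key": key.strip(),  # assumed: the Bearer value is the RapidAPI key
    }


def upstream_url(path: str) -> str:
    """Map a local path such as /v1/chat/completions onto the gateway URL."""
    return UPSTREAM_BASE + path
```

A proxy built this way would forward the request body unchanged to upstream_url(path) with the translated headers, which is why the node only needs the local base URL and a Bearer key.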

Typical prompts for this kind of step:

Summarize in 2 sentences: ...

Label sentiment (positive/negative/neutral): ...

Return JSON with {name, email, company} from: ...

All produce short outputs, so the flat per-call price keeps high-volume runs cheap to reason about.
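For the extraction prompt, it helps to validate the reply before passing it downstream: a short output still has to be well-formed. A minimal sketch, assuming the reply text has already been pulled out of the response and that the model may or may not wrap its JSON in a Markdown code fence (an assumption about model behavior, not something the post states). parse_extraction is a hypothetical helper name.

```python
import json

REQUIRED_FIELDS = ("name", "email", "company")
FENCE = "`" * 3  # a Markdown code fence marker


def parse_extraction(reply: str) -> dict:
    """Parse and check the reply to 'Return JSON with {name, email, company} from: ...'."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    # json.JSONDecodeError subclasses ValueError, so callers can catch ValueError for all failures.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the requested fields so extra keys do not leak downstream.
    return {field: data[field] for field in REQUIRED_FIELDS}
```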

When not to use it: long-form generation (articles, whole files, large code) will hit the ~1024-token cap and get truncated, so keep a high-output model for those. Use this for the short, structured outputs that automations actually need.
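If some inputs only occasionally need more room, one option is to detect truncation and retry on a high-output model. In the standard OpenAI chat-completion format, a reply that stopped at the token limit carries finish_reason "length"; whether Modelis reports it that way at its ~1024-token cap is an assumption to verify. The two call functions passed in are hypothetical stand-ins for whatever clients your workflow uses.

```python
from typing import Callable

# A chat call takes a prompt and returns an OpenAI-style response dict.
ChatCall = Callable[[str], dict]


def is_truncated(response: dict) -> bool:
    """True if the completion stopped because it hit the output-token limit."""
    return response["choices"][0].get("finish_reason") == "length"


def complete_with_fallback(prompt: str, flat_call: ChatCall, long_call: ChatCall) -> dict:
    """Try the flat-priced endpoint first; re-run on a high-output model only if cut off."""
    response = flat_call(prompt)
    if is_truncated(response):
        return long_call(prompt)
    return response
```

The retry means a truncated job is paid for twice, so prompts that are long-form by design are cheaper to send straight to the high-output model.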

I built the adapter. I am most curious which extraction and classification tasks the routing handles well versus badly. If you point an automation at it, I would love to hear how it routed.
