# A flat per-call endpoint for summarize / classify / extract in your n8n and Make automations

> Source: <https://dev.to/chenxiao5580cmd/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make-automations-548h>
> Published: 2026-06-28 13:00:34+00:00

If you run automations that summarize, classify, or pull fields out of text at volume, the LLM step is where per-token pricing turns budgeting into a guessing game: one batch of long inputs and the bill spikes. For these **bounded-output jobs**, a flat price per call fits better than a per-token frontier model. Here is how I wire it into n8n / Make, and when not to.

Automation runs are repetitive and high-volume, and the outputs are short by nature: a summary, a label, a few extracted fields. I route them through [Modelis](https://rapidapi.com/chenxiao5580/api/modelis-auto-chat), an OpenAI-compatible gateway that auto-routes each request to a fitting model and charges a **flat price per call** with output capped at ~1024 tokens. Because the output is bounded, each run costs the same and your monthly total stays predictable no matter the input size.
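To make the budgeting argument concrete, here is a minimal arithmetic sketch. Every price and token count in it is a made-up placeholder, not a Modelis rate or any vendor's real pricing. It only shows the shape of the difference: per-token cost moves with input length, while flat per-call cost depends only on the number of calls.

```python
# All prices and token counts are hypothetical placeholders for illustration.

def per_token_cost(input_tokens, output_tokens, usd_per_1k_in, usd_per_1k_out):
    """Cost of one call when input and output tokens are billed separately."""
    return input_tokens / 1000 * usd_per_1k_in + output_tokens / 1000 * usd_per_1k_out


def flat_cost(calls, usd_per_call):
    """Cost of a batch when every call has the same fixed price."""
    return calls * usd_per_call


# Input sizes for one batch; the third item is an unusually long document.
batch_inputs = [500, 800, 20_000, 600]
OUTPUT_TOKENS = 100  # short, bounded output for every item

per_token_total = sum(
    per_token_cost(n, OUTPUT_TOKENS, usd_per_1k_in=0.01, usd_per_1k_out=0.03)
    for n in batch_inputs
)
flat_total = flat_cost(len(batch_inputs), usd_per_call=0.002)

print(f"per-token total: ${per_token_total:.4f}")
print(f"flat total:      ${flat_total:.4f}")
```

With per-token billing the long third item dominates the total. With a flat price the total is known as soon as you know how many items are in the batch.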

It is a standard OpenAI-compatible `POST /v1/chat/completions` endpoint. Use an HTTP Request node:

- Method: `POST`
- URL: `https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions`
- Headers: `x-rapidapi-host: modelis-auto-chat.p.rapidapi.com`, `x-rapidapi-key: YOUR_KEY`, `content-type: application/json`

Send this JSON body:

```
{"model":"modelis-auto","messages":[{"role":"user","content":"Label sentiment (positive/negative/neutral): {{ $json.text }}"}]}
```

The curl equivalent:

```
curl --request POST \
  --url https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions \
  --header 'content-type: application/json' \
  --header 'x-rapidapi-host: modelis-auto-chat.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_KEY' \
  --data '{"model":"modelis-auto","messages":[{"role":"user","content":"Summarize in 2 sentences: ..."}]}'
```
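If you want to test the endpoint from a script before wiring it into a workflow, the same request can be sketched in Python with only the standard library. The function names `build_request` and `complete` are my own, and reading the reply from `choices[0].message.content` assumes the gateway returns the usual OpenAI chat-completions response shape.

```python
import json
import urllib.request

API_URL = "https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions"
API_HOST = "modelis-auto-chat.p.rapidapi.com"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    payload = {
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        # json.dumps escapes quotes and newlines inside the prompt text.
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-rapidapi-host": API_HOST,
            "x-rapidapi-key": api_key,
        },
        method="POST",
    )


def complete(prompt: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the reply text.

    Assumes an OpenAI-style response body with a `choices` list.
    """
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Building the body with `json.dumps` matters for automation input: text that contains quotes or line breaks would break a hand-assembled JSON string, but it is escaped correctly here.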

If you would rather use a built-in OpenAI node that expects an `Authorization: Bearer` key and a custom base URL, run the tiny open-source adapter next to your workflow runner:

```
npx modelis-openai      # local proxy on 127.0.0.1:8787, MIT, ~120 lines
```

Then point the node at `http://127.0.0.1:8787/v1` with model `modelis-auto`.
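To check the local proxy from a script, the sketch below builds the kind of request an OpenAI-style client sends: it appends `/chat/completions` to the base URL and passes the key as a bearer token. The helper name `openai_style_request` is hypothetical, and the idea that the adapter takes your RapidAPI key as the bearer token is an assumption on my part, so check the adapter's README for what it actually expects.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787/v1"  # local proxy address from the command above


def openai_style_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build the request an OpenAI-compatible client would send (not sent here)."""
    payload = {
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # OpenAI-style clients append the endpoint path to the base URL.
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the adapter accepts the key in this header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = openai_style_request(BASE_URL, "YOUR_KEY", "Summarize in 2 sentences: ...")
print(req.get_method(), req.full_url)
```

This is why the base URL ends in `/v1`: the client adds the rest of the path itself.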

- `Summarize in 2 sentences: ...`
- `Label sentiment (positive/negative/neutral): ...`
- `Return JSON with {name, email, company} from: ...`

All produce short outputs, so the flat per-call price keeps high-volume runs cheap to reason about.
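Short outputs are also easy to validate before the next step of a workflow consumes them. The sketch below is one way to do that in a code step. The helper names are hypothetical, and it assumes the model may wrap its answer in extra words or punctuation, which is common but not something I have measured for this gateway.

```python
import json

ALLOWED_LABELS = {"positive", "negative", "neutral"}
REQUIRED_FIELDS = ("name", "email", "company")


def parse_label(reply: str) -> str:
    """Normalize a sentiment reply and reject anything outside the allowed set."""
    label = reply.strip().strip(".").lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {reply!r}")
    return label


def parse_extraction(reply: str) -> dict:
    """Pull the JSON object out of a reply and check the required fields exist."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers catch one type.
    data = json.loads(reply[start:end + 1])
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {field: data[field] for field in REQUIRED_FIELDS}
```

Raising on an unexpected reply lets the workflow send that item to an error branch instead of passing a malformed value downstream.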

Long-form generation (articles, whole files, large code) will hit the ~1024-token cap and get truncated. Keep a high-output model for those. Use this for the short, structured outputs that automations actually need.
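If you are not sure whether a job fits under the cap, you can check for truncation in the response. In the OpenAI chat-completions format, a choice that stopped because it ran out of output tokens has `finish_reason` set to `"length"`. The sketch below assumes the gateway passes that field through unchanged, which I have not verified, and `is_truncated` is a hypothetical helper name.

```python
def is_truncated(response: dict) -> bool:
    """Return True if any choice stopped because it hit the output-token limit.

    Assumes an OpenAI-style response body with a `choices` list whose items
    carry a `finish_reason` field.
    """
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )


# Hand-written sample responses for illustration, not real gateway output.
complete_reply = {"choices": [{"message": {"content": "neutral"}, "finish_reason": "stop"}]}
cut_off_reply = {"choices": [{"message": {"content": "Chapter 1 ..."}, "finish_reason": "length"}]}

print(is_truncated(complete_reply), is_truncated(cut_off_reply))
```

A workflow can branch on this check and route truncated items to a high-output model.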

I built the adapter. I am most curious which extraction and classification tasks the routing handles well versus badly. If you point an automation at it, I would love to hear how it routed.