A flat per-call endpoint for summarize / classify / extract in your n8n and Make automations

A developer introduced Modelis, an OpenAI-compatible gateway that charges a flat per-call price for bounded-output tasks such as summarization, classification, and extraction, which makes it suitable for high-volume automations in n8n and Make. The service caps output at ~1024 tokens and auto-routes each request to a fitting model, so costs stay predictable. The developer also released an open-source adapter for local use.

If you run automations that summarize, classify, or pull fields out of text at volume, the LLM step is where per-token pricing turns budgeting into a guessing game: one batch of long inputs and the bill spikes. For these bounded-output jobs, a flat price per call fits better than a per-token frontier model. Here is how I wire it into n8n / Make, and when not to.

Automation runs are repetitive and high-volume, and the outputs are short by nature: a summary, a label, a few extracted fields. I route them through Modelis (https://rapidapi.com/chenxiao5580/api/modelis-auto-chat), an OpenAI-compatible gateway that auto-routes each request to a fitting model and charges a flat price per call with output capped at ~1024 tokens. Because the price is per call and the output is bounded, each run costs the same, so your monthly total scales with the number of calls, not with input size.

The API is a standard OpenAI-compatible POST /v1/chat/completions.
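For readers who want to try the call outside a workflow tool first, here is a minimal Python sketch of the same request. The endpoint, header names, and model name come from this post; the helper names (`build_payload`, `build_request`, `send`) are hypothetical, and the sketch assumes the gateway accepts the usual OpenAI-style body in which `messages` is a list.

```python
import json
import urllib.request

# Endpoint and host as given in the post; everything else here is illustrative.
URL = "https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions"
HOST = "modelis-auto-chat.p.rapidapi.com"


def build_payload(prompt: str) -> dict:
    """OpenAI-style chat body: `messages` is a list of role/content objects."""
    return {
        "model": "modelis-auto",
        "messages": [{"role": "user", "content": prompt}],
    }


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "content-type": "application/json",
            "x-rapidapi-host": HOST,
            "x-rapidapi-key": api_key,
        },
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (needs network and a real key)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from `send` makes it easy to inspect the exact body and headers before spending a call.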
Use an HTTP Request node:

    POST https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions
    x-rapidapi-host: modelis-auto-chat.p.rapidapi.com
    x-rapidapi-key: YOUR KEY
    content-type: application/json

    {"model":"modelis-auto","messages":[{"role":"user","content":"Label sentiment (positive/negative/neutral): {{ $json.text }}"}]}

The curl equivalent:

    curl --request POST \
      --url https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions \
      --header 'content-type: application/json' \
      --header 'x-rapidapi-host: modelis-auto-chat.p.rapidapi.com' \
      --header 'x-rapidapi-key: YOUR KEY' \
      --data '{"model":"modelis-auto","messages":[{"role":"user","content":"Summarize in 2 sentences: ..."}]}'

If you would rather use a built-in OpenAI node that expects an Authorization: Bearer key and a custom base URL, run the tiny open-source adapter next to your workflow runner:

    npx modelis-openai   # local proxy on 127.0.0.1:8787, MIT, ~120 lines

Then point the node at http://127.0.0.1:8787/v1 with model modelis-auto.

Prompts that fit well:

    Summarize in 2 sentences: ...
    Label sentiment (positive/negative/neutral): ...
    Return JSON with {name, email, company} from: ...

All three produce short outputs, so the flat per-call price keeps high-volume runs cheap to reason about.

Long-form generation (articles, whole files, large code) will hit the ~1024-token cap and get truncated. Keep a high-output model for those. Use this for the short, structured outputs that automations actually need.

I built the adapter. I am most curious which extraction and classification tasks the routing handles well versus badly. If you point an automation at it, I would love to hear how it routed.
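As a companion to the truncation caveat above, here is a hedged Python sketch of how a downstream step might detect a cut-off reply before trusting extracted fields. It assumes the gateway returns the standard OpenAI chat completion shape, where `choices[0].finish_reason` is `"length"` when the output limit was reached; this post does not confirm that Modelis sets the field that way. The function names and the sample response are hypothetical.

```python
import json
from typing import Tuple


def read_reply(response: dict) -> Tuple[str, bool]:
    """Return (text, truncated) from an OpenAI-style chat completion dict."""
    choice = response["choices"][0]
    text = choice["message"]["content"] or ""
    # Assumption: "length" marks a reply cut off by the output cap.
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


def parse_fields(response: dict, required=("name", "email", "company")) -> dict:
    """Parse an extraction reply as JSON, refusing truncated or incomplete output."""
    text, truncated = read_reply(response)
    if truncated:
        raise ValueError("reply hit the output cap; route this job to a high-output model")
    data = json.loads(text)  # raises ValueError if the reply is not valid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


if __name__ == "__main__":
    # Hand-written sample in the OpenAI response shape, not real gateway output.
    sample = {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": '{"name": "Ada", "email": "ada@example.com", "company": "Acme"}',
                },
                "finish_reason": "stop",
            }
        ]
    }
    print(parse_fields(sample))
```

In n8n or Make the same check can live in an If / Filter step on `choices[0].finish_reason`, so truncated items branch off to a different model instead of flowing on silently.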